namedprocess_namegroup_context_switches_total counter is decreasing

I'm also seeing this issue. I assume it's because the exporter does a straight sum() over all the matching processes at each scrape, without keeping any history.

For example, let's assume we have a process that accepts network connections. The main process spawns 2 subprocesses. Each subprocess handles 1000 requests and then terminates itself, causing the main process to spawn a new subprocess to replace it.

In the beginning you might have 3 PIDs: 10, 20, and 30. At time T0, they all start at 0 context switches.

@ T1
PID 10 - 100 switches
PID 20 -  10 switches
PID 30 -  10 switches
SUM    = 120 switches

@ T2
PID 10 -  150 switches
PID 20 - 1000 switches
PID 30 - 2000 switches
SUM    = 3150 switches
...etc.

Now, what happens when one of the processes dies and is replaced?

@ TN
PID 10 -  160 switches
PID 20 - 1200 switches
PID 40 -    0 switches
SUM    = 1360 switches

Oops...the number of context switches went down!
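
The effect is easy to reproduce. Below is a minimal Python sketch of the summing behaviour I'm assuming; it is not the exporter's actual code, `group_total` is a made-up helper, and the snapshots are just the numbers from the example (160 + 1200 + 0 = 1360 at TN).

```python
# Hypothetical model of a "straight sum" over the processes that are
# alive in a name group at scrape time. Not the exporter's real code.

def group_total(switches_by_pid):
    """Sum the per-process counters of the processes that exist right now."""
    return sum(switches_by_pid.values())

# Snapshots from the example: {pid: context switches so far}
t1 = {10: 100, 20: 10, 30: 10}
t2 = {10: 150, 20: 1000, 30: 2000}
tn = {10: 160, 20: 1200, 40: 0}  # PID 30 exited, PID 40 replaced it

totals = [group_total(snapshot) for snapshot in (t1, t2, tn)]
print(totals)  # [120, 3150, 1360]

# A counter must never decrease, but the summed value does.
decreased = any(later < earlier for earlier, later in zip(totals, totals[1:]))
print(decreased)  # True
```

Everything PID 30 had accumulated disappears from the sum the moment it exits, which is why a metric exposed as a counter goes backwards.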

This has produced an interesting result for us: context switching appears to be constantly accelerating for our long-running processes. PID 10's count keeps increasing, and every time the sum drops, the rate() function in Prometheus treats the drop as a counter reset and counts the whole new value as fresh increase.
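
To show why the rate looks inflated, here is a simplified Python model of counter-reset handling. It only captures the rule that a drop between two samples is treated as a restart from zero; the extrapolation Prometheus does at the window edges is left out, and `increase_with_reset_handling` is a name I made up for the sketch.

```python
def increase_with_reset_handling(samples):
    """Total increase over consecutive samples, treating any drop as a reset.

    When a sample is lower than the previous one, the counter is assumed
    to have restarted from zero, so the entire new value is counted.
    """
    total = 0
    for prev, cur in zip(samples, samples[1:]):
        if cur < prev:
            total += cur         # assumed reset: count everything since zero
        else:
            total += cur - prev  # normal case: count the difference
    return total

# Summed series between T2 and TN from the example.
credited = increase_with_reset_handling([3150, 1360])
print(credited)  # 1360

# What the surviving processes actually added in that interval:
# PID 10 went 150 -> 160, PID 20 went 1000 -> 1200.
survivors_growth = (160 - 150) + (1200 - 1000)
print(survivors_growth)  # 210
```

The totals of the long-lived processes get counted again at every apparent reset, so the longer PID 10 lives, the more each replacement of a subprocess inflates the result.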

I'm not sure how this should be solved, however: adding the PID as a label would generate high cardinality.
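
One direction that might work without a PID label is to keep the last value seen per PID inside the exporter and add only the deltas to a running group total. The Python sketch below is an illustration of that idea under my own assumptions, not a description of how process-exporter works; `GroupCounter` and its methods are hypothetical.

```python
class GroupCounter:
    """Hypothetical monotonic group counter built from per-PID deltas."""

    def __init__(self):
        self.total = 0
        self._last = {}  # pid -> last value seen for that pid

    def update(self, switches_by_pid):
        """Fold one scrape's per-PID values into the running total."""
        for pid, value in switches_by_pid.items():
            prev = self._last.get(pid, 0)
            if value >= prev:
                self.total += value - prev
            else:
                # Lower value for a known PID: assume the PID was reused
                # by a new process and count its value from zero.
                self.total += value
        # Forget PIDs that are gone; what they added stays in the total.
        self._last = dict(switches_by_pid)
        return self.total


counter = GroupCounter()
snapshots = [
    {10: 100, 20: 10, 30: 10},     # T1
    {10: 150, 20: 1000, 30: 2000}, # T2
    {10: 160, 20: 1200, 40: 0},    # TN: PID 30 replaced by PID 40
]
fixed_totals = [counter.update(s) for s in snapshots]
print(fixed_totals)  # [120, 3150, 3360]
```

The PIDs stay internal state and never become labels, so cardinality is unchanged. The trade-offs are that any switches a process makes between its last scrape and its exit are lost, and that comparing values is only a heuristic for detecting PID reuse.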

ncabatoff / process-exporter

namedprocess_namegroup_context_switches_total counter is decreasing #193