This PR should work around the issues caused by using counters whose values do not reset at the same time.
By forcefully resetting all other possible statuses for a given repo/label combination to 0, we reset all of them at the same time, avoiding the delayed resets that cause huge spikes.
The current number of jobs in a given status is job{status=OBSERVED_STATUS} - job{status=NEXT_STATUS}.
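The forced reset can be sketched in Python as a minimal, illustrative model, not the PR's actual code. Several pieces are assumptions: the `completed` status, the hypothetical `JobCounters` class, and the choice to zero the other statuses the first time a repo/label pair is seen after the collector (re)starts.

```python
# Hypothetical status list; the PR itself only names "queued" and "in_progress".
STATUSES = ("queued", "in_progress", "completed")


class JobCounters:
    """Cumulative job counters keyed by (repo, label, status). Illustrative only."""

    def __init__(self):
        self._values = {}

    def observe(self, repo, label, status):
        if status not in STATUSES:
            raise ValueError(f"unknown status: {status!r}")
        # Assumption: the first event for a repo/label pair after a (re)start
        # exports every status at 0, so all of that pair's series reset together
        # instead of each one resetting only when its own next event arrives.
        for s in STATUSES:
            self._values.setdefault((repo, label, s), 0)
        self._values[(repo, label, status)] += 1

    def samples(self):
        """Snapshot of the values a scrape would see."""
        return dict(self._values)


counters = JobCounters()
counters.observe("org/repo", "self-hosted", "in_progress")
print(counters.samples())
```

After a single in_progress event, the queued and completed series for the same (hypothetical) `org/repo` / `self-hosted` pair are exported at 0 alongside in_progress at 1, rather than being absent until their own next event arrives.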
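The subtraction can be sketched as a small Python function. This is a minimal sketch that assumes a lifecycle order of queued → in_progress → completed. The `completed` status, `NEXT_STATUS`, and `current_jobs` are hypothetical names. The sample numbers reuse the 1001/1000 example from the scenario further down.

```python
# Hypothetical lifecycle order; only "queued" and "in_progress" come from the PR.
NEXT_STATUS = {"queued": "in_progress", "in_progress": "completed"}


def current_jobs(counters, status):
    """Jobs currently in `status`: how many entered it minus how many moved on."""
    if status not in NEXT_STATUS:
        raise ValueError(f"no next status defined for {status!r}")
    return counters.get(status, 0) - counters.get(NEXT_STATUS[status], 0)


# T0: both counters were reported by the same collector process.
print(current_jobs({"queued": 1001, "in_progress": 1000}, "queued"))  # → 1
# T1 without the fix: in_progress restarted at 1, queued still shows 1001.
print(current_jobs({"queued": 1001, "in_progress": 1}, "queued"))  # → 1000
```

The second call shows why resets that happen at different times matter. The difference is only meaningful when both counters come from the same collector lifetime.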
Before:

T0: (actual jobs in queued state: 1001 - 1000 = 1)
T1 (after the collector restarts and an in_progress event is received): at this point, Prometheus will have the following state internally: queued still at its last reported value of 1001, and in_progress restarted at 1.
So until another queued event is received and the reset data point is reported, we would erroneously have 1000 (1001 - 1) currently queued jobs.

After:
T0: (actual jobs in queued state: 1001 - 1000 = 1)

T1 (after the collector restarts and an in_progress event is received): at this point, Prometheus will have the following state internally: