Open mkuratczyk opened 4 days ago
I'm not sure if that's because I'm testing a slightly different scenario perhaps these errors are caused by some recent changes but I haven't seen them in the past at all and now I see them very often.
We have seen these before, usually on a freshly booted node when some values are not yet available. ''
is used as undefined
historically by a few metrics that come from the environment, e.g. free disk space, which is not computed instantly the node starts booting.
The Prometheus plugin should filter such values out because if a data point isn't available yet, what else can it do?
I've seen this enough times now to say that it's certainly not about a freshly booted node, but about a queue getting deleted.
@mkuratczyk yup, that can be another case where some metrics no longer exist. During a boot, they do not yet exist, and after queue deletion, they no longer exist.
I am all for making the code more defensive but if some samples are missing… what other than an error can the Prometheus scraping API endpoint return?
Describe the bug
I don't have the exact repro steps. I think this is a race condition when a queue is deleted while the metrics are collected and the queue is found by the collector but then returns empty values (
''
) instead of the expected numbers. I've seen this occur in 3 different places:Observed on the
main
branch on November 26Reproduction steps
Not clear
Expected behavior
No crashes
Additional context
No response