Open charleskorn opened 2 months ago
We should not change this behavior in Prometheus. The error you're encountering when multiple series have identical labels after applying functions like max_over_time is intentional. It serves as a useful alert to potential misconfigurations or labeling issues in your metrics.
In real-world scenarios, scrapes are not perfectly aligned, so such label conflicts are unlikely unless there's an actual problem. By erroring out, Prometheus helps identify and fix issues that could compromise the accuracy of the monitoring data.
Therefore, it's important to let Prometheus continue raising this error, to maintain data integrity and alert users to potential metric labeling problems.
I understand the importance of the error, but it is not being consistently returned.
In the example above, I run a query (`max_over_time({__name__=~"metric_.*"}[5m])`) evaluated at time 0, another at time 6, another at 12, another at 18, and a final one at time 24. I don't get an error for any of these single-timestamp queries.
But, if I run a single range query from 0 to time 24 with a step of 6, which evaluates the same expression at the same timestamps as the individual queries, I do get an error.
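The scenario above can be sketched in promqltest syntax. The series names and values here are illustrative assumptions: two metrics whose samples cover disjoint time ranges, so that only one series is ever present in any single 5m window. (The `eval_fail` / `expected_fail_message` directives depend on the promqltest version in use.)

```
# Two metrics that never have a sample at the same timestamp.
load 6m
  metric_a 1 2 3 _ _
  metric_b _ _ _ 6 5

# A range query covering a single step sees only one output series in
# the 5m window, so it succeeds; the same holds at every step from 0 to 24m.
eval range from 12m to 12m step 6m max_over_time({__name__=~"metric_.*"}[5m])
  {} 3

# A range query covering all the steps at once fails, even though the two
# output series (with the metric name dropped) never overlap in time.
eval_fail range from 0 to 24m step 6m max_over_time({__name__=~"metric_.*"}[5m])
  expected_fail_message vector cannot contain metrics with the same labelset
```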
This is also inconsistent with the behaviour of the `ceil({__name__=~"metric_.*"})` case, which matches my expectations.
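Conceptually, the inconsistency comes down to where the duplicate-labelset check runs. The following is a toy Python model of the two checks (my own sketch, not Prometheus source code), assuming the timestamps and values from the example above:

```python
# Toy model: each output series is (labelset, {timestamp_seconds: value}).
# After max_over_time drops the metric name, both series end up with the
# empty labelset, modelled here as frozenset().
series = [
    (frozenset(), {0: 1.0, 360: 2.0, 720: 3.0}),   # derived from metric_a
    (frozenset(), {1080: 6.0, 1440: 5.0}),          # derived from metric_b
]
steps = [0, 360, 720, 1080, 1440]  # time 0 to 24m, step 6m

# Per-step check, roughly what each individual instant query does:
# only series that actually produce a sample at t can conflict.
for t in steps:
    labels_at_t = [labels for labels, samples in series if t in samples]
    assert len(labels_at_t) == len(set(labels_at_t)), f"conflict at t={t}"

# Whole-range check, roughly what the range query does: duplicate labelsets
# anywhere in the result matrix trigger the error, even though these two
# series never overlap at any single timestamp.
all_labels = [labels for labels, _ in series]
print(len(all_labels) != len(set(all_labels)))  # True -> the range query errors
```

Each per-step check passes because the two series are disjoint in time, while the matrix-wide check sees two series with the same (empty) labelset and rejects the result.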
What did you do?
Running a query like `max_over_time({__name__=~"metric_.*"}[5m])` produces inconsistent results when run at individual steps compared with a single range query that evaluates at the same steps.

I've summarised the issue with a test case in promqltest syntax. (I've used `eval range` throughout, as `eval instant` runs into a legitimate instance of `vector cannot contain metrics with the same labelset` when it runs the range query equivalent of the expression.)

What did you expect to see?
All test cases behave as expected, i.e. results are consistent regardless of the time range queried.
What did you see instead? Under which circumstances?
The `eval range from 0 to 24m step 6m max_over_time({__name__=~"metric_.*"}[5m])` scenario fails with `vector cannot contain metrics with the same labelset`.

System information
No response
Prometheus version
No response
Prometheus configuration file
No response
Alertmanager version
No response
Alertmanager configuration file
No response
Logs
No response