Closed harry671003 closed 2 days ago
Requesting reviews. @fpetkovski @MichaHoffmann could you please take a look?
Just one question from my side, what is the difference between IncrementSamplesAtStep(samples int, step int) and IncrementSamplesAtTimestamp(samples int, t int64)? Do we still need both methods?
@fpetkovski - Thanks for asking this question. It prompted me to look deeper and uncover a bug.
IncrementSamplesAtTimestamp
should be used when we don't know the correct step index. This is always the case in the new engine because of stepsBatch.
See: https://github.com/prometheus/prometheus/blob/71c90c71d41948eea3f33dfd3ee0a5cc581a36f3/util/stats/query_stats.go#L290
We should always use IncrementSamplesAtTimestamp()
in the new engine. In the new engine, we only know the timestamp and not the step index. This is because of the batching using stepsBatch.
This is what would happen if IncrementSamplesAtStep()
is used in the VectorSelector. In the below example range query, there are a total of 61 steps which are executed in 7 batches of 10 steps.
{
name: "vector selector",
load: `load 30s
http_requests_total{pod="nginx-1"} 1+1x1000
http_requests_total{pod="nginx-2"} 1+2x1000`,
query: `http_requests_total{pod="nginx-1"}`,
start: time.Unix(0, 0),
end: time.Unix(1800, 0),
step: time.Second * 30,
},
// Old engine
exp: []int64{1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1}
// New engine
got: []int64{7, 6, 6, 6, 6, 6, 6, 6, 6, 6, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}
Since at the start of each batch, the currStep is reset to 0, the samples will always be counted towards the first 10 steps (assuming stepsBatch=10).
When IncrementSamplesAtTimestamp()
is used instead, it will correctly increment the sample for the timestamp of the step irrespective of the batch.
I will remove IncrementSamplesAtStep()
from the code base and always use IncrementSamplesAtTimestamp()
. I will also extend the tests to cover multiple batches.
The fuzz failure doesn't seem related to this change.
This is an attempt at implementing the samples per step stats
Changes
Note