Open · aitchjoe opened 6 months ago
I have a different use case, but I think my solution can solve your problem too: just modify the time a little bit right after the Aggregator.
"timestamp": parse_timestamp!(from_unix_timestamp!(to_unix_timestamp!(.original_timestamp, unit: "milliseconds") + random_int(1000, 9999), unit: "milliseconds"), "%+")
For Mimir HA deduplication, we need these metrics to be dropped; even though we could change the timestamp to get them accepted, that is not what we want.
A note for the community
Use Cases
Our metric pipeline is: two Prometheus HA replicas → Vector aggregator (Prometheus remote write source and sink) → Mimir.
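A minimal sketch of the aggregator's relevant config, with placeholder component names, address, and endpoint:

```yaml
sources:
  prom_in:
    type: prometheus_remote_write
    address: "0.0.0.0:9090"   # both Prometheus replicas remote write here

sinks:
  mimir_out:
    type: prometheus_remote_write
    inputs: ["prom_in"]
    endpoint: "http://mimir-distributor:8080/api/v1/push"  # placeholder
```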
When Mimir enables high-availability deduplication, there are many `err-mimir-sample-duplicate-timestamp` errors in the mimir-distributor log. But if Prometheus remote writes to Mimir directly, there is no error. After some debugging, we think it is caused by the batch config of the Prometheus remote write sink: because the Vector aggregator receives data from two Prometheus instances, one submitted batch may mix both instances' data when it is sent to Mimir, and in Mimir's distributor.go:
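The HA-handling logic there boils down to something like the following (an illustrative, self-contained sketch, not Mimir's actual code; `__replica__` stands in for the configured HA replica label):

```go
package main

import "fmt"

// Illustrative sketch of why a batch mixing two replicas' data misbehaves:
// the distributor reads the replica label from the FIRST series of a write
// request (req.Timeseries[0].Labels) and applies the accept/drop decision
// to the whole request.

type Series struct {
	Labels map[string]string
}

type WriteRequest struct {
	Timeseries []Series
}

// acceptRequest mimics the per-request HA check: only series[0] is consulted.
func acceptRequest(req WriteRequest, electedReplica string) bool {
	if len(req.Timeseries) == 0 {
		return true
	}
	return req.Timeseries[0].Labels["__replica__"] == electedReplica
}

func main() {
	// A batch mixing samples from replica a and replica b, as the Vector
	// aggregator can produce when both Prometheus instances feed one sink.
	mixed := WriteRequest{Timeseries: []Series{
		{Labels: map[string]string{"__replica__": "prometheus-a"}},
		{Labels: map[string]string{"__replica__": "prometheus-b"}},
	}}
	// Accepted as a whole because series[0] carries the elected replica,
	// so series[1]'s duplicate samples slip through; once the replica label
	// is stripped, they surface as err-mimir-sample-duplicate-timestamp.
	fmt.Println(acceptRequest(mixed, "prometheus-a")) // prints: true
}
```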
That means Mimir only checks the first series (`req.Timeseries[0].Labels`) in a batch to accept or drop all of the data, so after Mimir removes the replica label, `err-mimir-sample-duplicate-timestamp` happens. When we changed the Prometheus remote write sink `batch.max_events` from the default 1000 to 1, the errors were gone, which confirmed our guess.

Attempted Solutions
Set `batch.max_events` to 1, but it is bad for performance.
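For reference, a sketch of that workaround (placeholder names and endpoint):

```yaml
sinks:
  mimir_out:
    type: prometheus_remote_write
    inputs: ["prom_in"]
    endpoint: "http://mimir-distributor:8080/api/v1/push"  # placeholder
    batch:
      max_events: 1   # no mixed-replica batches, but one request per event
```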
Proposal

Add a Prometheus HA config option to the Prometheus remote write sink, so that data from different clusters or replicas is never batched together.
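Until such an option exists, one possible workaround is to split the stream by replica with a `route` transform, so each sink only ever batches one replica's data. This sketch assumes the replica label is `prometheus_replica` and both replica values are known in advance:

```yaml
transforms:
  by_replica:
    type: route
    inputs: ["prom_in"]
    route:
      a: .tags.prometheus_replica == "prometheus-a"   # assumed label/values
      b: .tags.prometheus_replica == "prometheus-b"

sinks:
  mimir_a:
    type: prometheus_remote_write
    inputs: ["by_replica.a"]
    endpoint: "http://mimir-distributor:8080/api/v1/push"  # placeholder
  mimir_b:
    type: prometheus_remote_write
    inputs: ["by_replica.b"]
    endpoint: "http://mimir-distributor:8080/api/v1/push"
```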
References
No response
Version
vector 0.35.0