SahilHakimiUofT opened 1 month ago
I'm having the same issue, using a DaemonSet with clustering. I don't see any errors indicating that any Alloy instance is failing to send metrics, yet I'm still getting out-of-order results. With clustering enabled, I'd expect each target to be handled by a single Alloy instance that never changes, and I'd expect a previously failed write to be retried before a new metric is sent. But like I said, I don't even see any failed writes.
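For context, the clustered pipeline I'm describing is roughly the following. This is a minimal sketch rather than my exact config; the component names and the endpoint are illustrative:

```alloy
// Discover pods and scrape them with clustering enabled, so that the
// cluster's hash ring assigns each target to exactly one Alloy instance.
discovery.kubernetes "pods" {
  role = "pod"
}

prometheus.scrape "pods" {
  targets    = discovery.kubernetes.pods.targets
  forward_to = [prometheus.remote_write.mimir.receiver]

  clustering {
    enabled = true
  }
}

prometheus.remote_write "mimir" {
  endpoint {
    url = "<redacted>"
  }
}
```

With `clustering { enabled = true }` on the scrape component, each instance should only scrape the subset of targets the hash ring assigns to it, which is why I'd expect exactly one writer per series.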
What's wrong?
Steps to reproduce
Here is my Helm configuration (see Configuration below):
System information
No response
Software version
Grafana Alloy v1.3.1
Configuration
```yaml
alloy:
  alloy:
    clustering:
      enabled: true
    configMap:
      content: |-
        logging {
          level  = "info"
          format = "logfmt"
        }
```
Logs
ts=2024-09-04T16:21:21.005501451Z level=error msg="non-recoverable error" component_path=/ component_id=prometheus.remote_write.mimir subcomponent=rw remote_name=1c21e0 url=<redacted> count=500 exemplarCount=0 err="server returned HTTP status 400 Bad Request: failed pushing to ingester: user=anonymous: the sample has been rejected because another sample with a more recent timestamp has already been ingested and out-of-order samples are not allowed (err-mimir-sample-out-of-order). The affected sample has timestamp 2024-09-04T16:21:19.73Z and is from series {__name__=\"response_latency_ms_bucket\", authz_kind=\"default\", authz_name=\"all-unauthenticated\", client_id=\"prometheus.linkerd-viz.serviceaccount.identity.linkerd.cluster.local\", cluster=\"<redacted>\", direction=\"inbound\", instance=\"alloy-cluster\", job=\"prometheus.scrape.pods\", le=\"300\", route_kind=\"default\", route_name=\"default\", srv_kind=\"default\", srv_name=\"all-unauthenticated\", status_code=\"200\", target_addr=\"<redacted>1\", target_ip=\"0.0.0.0\", target_port=\"4191\", tls=\"true\"}"
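One way to narrow this down would be to tag each replica's remote-write stream so the Mimir error reveals which pod produced the conflicting sample. A minimal diagnostic sketch, assuming the standard library's `constants.hostname` is available; the `alloy_pod` label name is made up for illustration:

```alloy
prometheus.remote_write "mimir" {
  endpoint {
    url = "<redacted>"
  }

  // Attach this replica's hostname to every series it writes. If the
  // err-mimir-sample-out-of-order errors stop once series are split per
  // pod, that points at two replicas writing the same series, e.g. a
  // target changing owners during a cluster reshard.
  external_labels = {
    alloy_pod = constants.hostname,
  }
}
```

Note that this changes series identity (and roughly doubles the affected series while two writers overlap), so it's a temporary diagnostic rather than a fix.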