thanos-io / thanos

Highly available Prometheus setup with long term storage capabilities. A CNCF Incubating project.
https://thanos.io
Apache License 2.0

Missing metrics when a receiver is shut down #7845

Open thibautmery opened 8 hours ago

thibautmery commented 8 hours ago

Thanos, Prometheus and Golang version used:

Thanos version: v0.35.1

Thanos is deployed with docker-compose.

Architecture: every Prometheus instance remote-writes to a Thanos receiver load balancer (LB). The LBs forward the requests to multiple Thanos receivers.

The replication factor is set to 1, and there are always 3 identical Thanos receivers.

Object Storage Provider:

What happened:

When I shut down one Thanos receiver cleanly, I lose some metrics when I query them with Thanos Query: [image]

What you expected to happen:

I expect the metrics to be replicated to at least one other Thanos receiver, so that when I query all the Thanos receivers, even if one is missing, I still get all the metrics.

The situation is resolved when the receiver that was shut down comes back.

Anything else we need to know:

This behavior has been present for the last few versions of Thanos.

Thank you :)

dosubot[bot] commented 8 hours ago

Hello @thibautmery! I'm here to help you with any bugs, questions, or contributions you may have.

I found some relevant information regarding your issue with missing metrics when a Thanos receiver is shut down. When the replication factor is set to 1, Thanos does not replicate the data across multiple receivers. This means that if a receiver is shut down, any data that was only stored on that receiver will be temporarily unavailable until the receiver is back online. This is why you're experiencing missing metrics when one of the receivers is down.

To ensure data availability and redundancy, it's recommended to set the replication factor to at least 2 or 3. This way, each time-series is stored on multiple receivers, allowing you to query the data even if one receiver is unavailable [1][2].
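The effect of the replication factor can be illustrated with a small simulation. This is only a sketch under simplifying assumptions: the receiver names and series are hypothetical, and the placement function (hash the series, then take consecutive nodes) is a stand-in for illustration, not the hashing algorithm Thanos actually uses. In Thanos itself the setting is controlled by the `--receive.replication-factor` flag on the receivers.

```python
import hashlib


def replicas_for(series: str, nodes: list[str], replication_factor: int) -> list[str]:
    """Pick `replication_factor` consecutive nodes, starting at a hash-derived index.

    Simplified placement for illustration only; not Thanos's real hashring logic.
    """
    digest = hashlib.sha256(series.encode()).digest()
    start = int.from_bytes(digest[:8], "big") % len(nodes)
    return [nodes[(start + i) % len(nodes)] for i in range(replication_factor)]


def unavailable_series(series_names, nodes, replication_factor, down):
    """Return the series whose every replica lives on a node in `down`."""
    return [
        s
        for s in series_names
        if all(n in down for n in replicas_for(s, nodes, replication_factor))
    ]


# Hypothetical setup mirroring the issue: 3 receivers, one of them shut down.
nodes = ["receiver-0", "receiver-1", "receiver-2"]
series = [f'up{{instance="host-{i}"}}' for i in range(30)]
down = {"receiver-1"}

lost_rf1 = unavailable_series(series, nodes, 1, down)
lost_rf3 = unavailable_series(series, nodes, 3, down)

print(f"RF=1: {len(lost_rf1)} of {len(series)} series unavailable")
print(f"RF=3: {len(lost_rf3)} of {len(series)} series unavailable")
```

In this model, with a replication factor of 1 every series has exactly one owner, so anything owned by the stopped receiver cannot be queried until it returns. With a replication factor of 2 or 3, each series also lives on a different receiver, so a single stopped receiver leaves no series without a reachable copy.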

Additionally, I found a similar open issue discussing problems with Thanos receivers when one is down, which might provide further insights: Thanos receiver issue when 1 receiver is down [3].
