thanos-io / thanos

Highly available Prometheus setup with long term storage capabilities. A CNCF Incubating project.
https://thanos.io
Apache License 2.0

Docs Bug -- Out-of-order Samples Error troubleshooting has an unclear remedy #4651

Closed: robertjsullivan closed this issue 2 years ago

robertjsullivan commented 3 years ago

Thanos, Prometheus and Golang version used: N/A

Object Storage Provider:

What happened: On the docs page https://thanos.io/tip/operating/troubleshooting.md/#out-of-order-samples-error, the resolution for "Remote Prometheus is running in high availability mode" references setting replica_external_label_name. I searched the web and couldn't find where that value can be set in either Thanos or Prometheus. Is setting this value required to send HA pair Prometheus data to the same Thanos Receiver? Or would unique external_labels in Prometheus be enough? It would also be good if the Receiver docs (https://thanos.io/tip/components/receive.md/) mentioned how to handle HA Prometheus pairs, if any special considerations are necessary.

For reference, we are running into the "Error on ingesting out-of-order samples" error when running two Prometheus servers that scrape the same targets but have unique external_labels. Scaling down to a single Prometheus server makes the error go away. I'm hoping to verify that we have the correct configuration for this scenario before troubleshooting further.
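For context, here is roughly what each Prometheus config looks like; the hostnames and label values below are placeholders, not our real setup:

    # prometheus-0.yml; the second replica (prometheus-1) is identical
    # except for replica: "1" (all values here are placeholders)
    global:
      external_labels:
        cluster: example-cluster   # shared by both replicas
        replica: "0"               # unique per replica
    remote_write:
      - url: http://thanos-receive.example.com:19291/api/v1/receive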

What you expected to happen: To be able to find docs on how to set up and troubleshoot the Thanos Receiver with HA Prometheus pairs.

How to reproduce it (as minimally and precisely as possible): N/A

Full logs to relevant components: N/A

Anything else we need to know: Hopefully this is the right place to raise a docs bug. If not, please let me know and I'll post elsewhere. Thanks!

yeya24 commented 3 years ago

replica_external_label_name refers to the name of the external label configured in your Prometheus HA pairs that identifies which Prometheus replica a series came from.

Is setting this value required to send HA pair Prometheus data to the same Thanos Receiver? Or would unique external_labels in Prometheus be enough?

Anyway, different Prometheus replicas need unique external labels.
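For example (just a sketch, assuming the replica-identifying label is called replica; the name is a convention, not fixed), on the query path the deduplication is driven by that label:

    thanos query \
      --http-address=0.0.0.0:9090 \
      --store=thanos-receive.example.com:10901 \
      --query.replica-label=replica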

robertjsullivan commented 3 years ago

Thanks for the clarification. So if I set my Prometheus servers up with:

    global:
      external_labels:
        some-label: some-unique-value

Do I need to tell the Thanos Receiver which label identifies the replica as unique (i.e. some-label)? Or is it enough that the time series coming out of those Prometheus pairs are unique? Does the receiver attempt to do any sort of deduplication?

yeya24 commented 3 years ago


Do I need to tell the Thanos Receiver which label identifies the replica as unique (i.e. some-label)? Or is it enough that the time series coming out of those Prometheus pairs are unique?

No config is needed in the receiver. Series from Prometheus instances with unique external labels are sufficient.
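For illustration, a minimal receiver invocation (all paths and addresses are placeholders); note that nothing in it refers to the replica label:

    thanos receive \
      --tsdb.path=/var/thanos/receive \
      --grpc-address=0.0.0.0:10901 \
      --http-address=0.0.0.0:10902 \
      --remote-write.address=0.0.0.0:19291 \
      --objstore.config-file=bucket.yml \
      --label='receive_cluster="example"'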

Does the receiver attempt to do any sort of deduplication?

Deduplication on the receiver side is not supported right now. But it is doable if you run the compactor in penalty-based deduplication mode.
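Roughly like this (a sketch only; vertical compaction and the penalty deduplication function are experimental, so please check the compactor docs for your Thanos version, and adjust the replica label name to match yours):

    thanos compact \
      --data-dir=/var/thanos/compact \
      --objstore.config-file=bucket.yml \
      --compact.enable-vertical-compaction \
      --deduplication.replica-label=replica \
      --deduplication.func=penalty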

stale[bot] commented 2 years ago

Hello 👋 Looks like there was no activity on this issue for the last two months. Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this issue or push a commit. Thanks! 🤗 If there is no activity in the next two weeks, this issue will be closed (we can always reopen an issue if we need to!). Alternatively, use the remind command if you wish to be reminded at some point in the future.

stale[bot] commented 2 years ago

Closing for now as promised, let us know if you need this to be reopened! 🤗