thanos-io / thanos

Highly available Prometheus setup with long term storage capabilities. A CNCF Incubating project.
https://thanos.io
Apache License 2.0

thanos_receive_write_timeseries histogram is missing in Thanos Receive v0.35.1 #7573

Open pvlltvk opened 3 months ago

pvlltvk commented 3 months ago

Thanos, Prometheus and Golang version used:

$ thanos --version
thanos, version 0.35.1 (branch: HEAD, revision: 086a698b2195adb6f3463ebbd032e780f39d2050)
  build user: root@be0f036fd8fa
  build date: 20240528-13:54:20
  go version: go1.21.10
  platform: linux/amd64
  tags: netgo

What happened:

After upgrading from 0.34.1 to 0.35.1, I noticed that the thanos_receive_write_timeseries histogram is gone.

What you expected to happen:

The thanos_receive_write_timeseries histogram is present.

How to reproduce it (as minimally and precisely as possible):

Set up Thanos Receive in router/ingestor mode and start collecting metrics from the router pods.

Full logs to relevant components:

I don't think they are relevant.

Anything else we need to know:

No

yeya24 commented 3 months ago

https://github.com/thanos-io/thanos/blob/main/pkg/receive/handler.go#L192 The histogram metric should still be there. Can you please provide more information, such as a screenshot showing that the metric is missing?

> Setup Thanos Receive in router/ingestor mode and start to collect metrics from router pods

Maybe it is relevant since the metric was missing in Router pods? I personally don't use it so I am not sure if Router will still use this metric. Maybe @saswatamcode @douglascamata can help answer this?

cincinnat commented 3 months ago

I have the same issue when Thanos Receive is in RouterOnly mode. It seems this is caused by https://github.com/thanos-io/thanos/commit/66841fbb1e758bdcf06bf6a2771f5a09ba951c55 (or so git bisect says), since, if I am not missing something, in RouterOnly mode this condition is never true: https://github.com/thanos-io/thanos/blob/66841fbb1e758bdcf06bf6a2771f5a09ba951c55/pkg/receive/handler.go#L846
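To illustrate why a condition that is never true would make the metric vanish rather than just stay at zero, here is a toy model in Python. It is not the Thanos code or the Prometheus client API: `LazyHistogramVec`, `handle_write`, and the `writes_locally` flag are hypothetical names. It assumes the histogram is a labeled vector whose series only come into existence on the first observation, which is how labeled metrics commonly behave in Prometheus client libraries.

```python
from collections import defaultdict


class LazyHistogramVec:
    """Toy stand-in for a labeled histogram (hypothetical, not a real client API)."""

    def __init__(self, name):
        self.name = name
        # Maps a label string to the values observed for it.
        # No entry exists for a label set until the first observe() call.
        self._children = defaultdict(list)

    def observe(self, labels, value):
        self._children[labels].append(value)

    def expose(self):
        """Return the lines a scrape would see for this metric."""
        lines = []
        for labels, values in self._children.items():
            lines.append(f"{self.name}_count{{{labels}}} {len(values)}")
            lines.append(f"{self.name}_sum{{{labels}}} {sum(values)}")
        return lines


def handle_write(hist, series_count, writes_locally):
    # Hypothetical guard: the observation sits behind a condition that
    # only holds when the request is written locally.
    if writes_locally:
        hist.observe('tenant="default-tenant"', series_count)


router = LazyHistogramVec("thanos_receive_write_timeseries")
ingestor = LazyHistogramVec("thanos_receive_write_timeseries")

handle_write(router, 10, writes_locally=False)   # guard never passes
handle_write(ingestor, 10, writes_locally=True)  # guard passes

print(router.expose())    # empty list: the metric is absent from the scrape
print(ingestor.expose())
```

If the real handler behaves like this model, a router-only instance would never create any series for the histogram, which would match the symptom reported here.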

anarcher commented 3 months ago

In version 0.35, the metrics thanos_receive_write_timeseries and thanos_receive_write_samples were being generated. However, after upgrading to version 0.36.1, these metrics are no longer appearing. I'm operating the router and ingestor separately, and there are no significant issues with metric collection. Which areas should I investigate to easily identify the cause of this problem?

yeya24 commented 2 months ago

Umm, unfortunately I don't really use Thanos Receiver so I cannot comment. Maybe @saswatamcode @fpetkovski can chime in if you observe the same issue

RainbowHerbicides commented 2 months ago

We also use Receive + Receive Distributor, and after updating to 0.36.1 the following metrics disappeared:

'{__name__="thanos_receive_write_samples_bucket"}'
'{__name__="thanos_receive_write_samples_count"}'
'{__name__="thanos_receive_write_samples_sum"}'
'{__name__="thanos_receive_write_timeseries_bucket"}'
'{__name__="thanos_receive_write_timeseries_count"}'
'{__name__="thanos_receive_write_timeseries_sum"}'

Since our upgrade path went from 0.34.X straight to 0.36.1, we cannot confirm or deny that the metrics were present in between (however, other reporters confirm that they were present in 0.35).
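For anyone who wants to check their own router and ingestor pods, a small sketch like the following can report which of the six series above are absent from a scrape. It only parses text in the Prometheus exposition format; how you obtain that text (for example by fetching the `/metrics` endpoint of each pod) is left out, and the `missing_series` helper and the sample payload are made up for illustration.

```python
EXPECTED = (
    "thanos_receive_write_samples_bucket",
    "thanos_receive_write_samples_count",
    "thanos_receive_write_samples_sum",
    "thanos_receive_write_timeseries_bucket",
    "thanos_receive_write_timeseries_count",
    "thanos_receive_write_timeseries_sum",
)


def missing_series(exposition_text, expected=EXPECTED):
    """Return the expected series names that do not appear in the scrape text."""
    present = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # The series name ends at the first "{" (labels) or space (value).
        name = line.split("{", 1)[0].split(" ", 1)[0]
        present.add(name)
    return [name for name in expected if name not in present]


# Made-up sample payload: only the samples histogram is exposed.
sample = """\
# HELP thanos_receive_write_samples The number of samples in incoming write requests.
# TYPE thanos_receive_write_samples histogram
thanos_receive_write_samples_bucket{code="200",tenant="default-tenant",le="10"} 3
thanos_receive_write_samples_sum{code="200",tenant="default-tenant"} 21
thanos_receive_write_samples_count{code="200",tenant="default-tenant"} 3
"""

for name in missing_series(sample):
    print("missing:", name)
```

Running it against the output of a router pod and an ingestor pod separately should show whether the series are gone everywhere or only on the routers.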