thanos-io / thanos

Highly available Prometheus setup with long term storage capabilities. A CNCF Incubating project.
https://thanos.io
Apache License 2.0

Thanos Query doesn't query all metrics on a receiver #5949

Open thibautmery opened 1 year ago

thibautmery commented 1 year ago

Thanos, Prometheus and Golang version used:

Thanos v0.29.0

Object Storage Provider:

S3 ceph

What happened:

My Thanos Query is connected to a Thanos Receive and a Thanos Store. Queries for data newer than 15d are served by the receiver, and queries for data older than 15d are served by the Thanos store.
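For reference, a time-partitioned setup like this is wired up roughly with the flags below (hostnames, paths and the 15d split are illustrative, not my exact config):

    # Thanos Receive: keep ~15d locally, upload completed blocks to the S3 (Ceph) bucket
    thanos receive \
      --tsdb.retention=15d \
      --objstore.config-file=/etc/thanos/s3.yaml

    # Thanos Store: serve only data older than ~15d from the bucket
    thanos store \
      --objstore.config-file=/etc/thanos/s3.yaml \
      --max-time=-15d

    # Thanos Query: fan out to both
    thanos query \
      --endpoint=thanos-receive:10901 \
      --endpoint=thanos-store:10901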

All metrics on the Thanos receiver are uploaded to the Ceph bucket, which is queried by the Thanos store.

For some metrics, when I query them, the Thanos store returns more series than the receiver (even though the receiver is the source of all the metrics).

[Screenshot from 2022-12-08 09:46:15]

When looking at the picture, ignore the gap without metrics (that is expected). Here I query over 30d, so the first half comes from the Thanos store and the second half from the receiver. You can see that there are fewer metrics on the receiver than on the store.

One more thing: when I query Prometheus directly, all the metrics are present, and all metrics are remote-written to the receiver (.*).
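On the Prometheus side the remote write configuration is essentially the following (URL and port are illustrative, not my exact values; the .* is the catch-all relabel rule mentioned above):

    # prometheus.yml (sketch)
    remote_write:
      - url: http://thanos-receive:19291/api/v1/receive
        write_relabel_configs:
          - source_labels: [__name__]
            regex: ".*"        # forward every series
            action: keep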

What you expected to happen:

Both components serve the same source of metrics, so a query should return the same metrics from the Thanos store and the Thanos receiver.

Full logs to relevant components:

{"caller":"multitsdb.go:373","component":"multi-tsdb","level":"debug","msg":"uploading block for tenant","tenant":"node-exporter","ts":"2022-12-08T08:49:42.951410013Z"}
{"caller":"multitsdb.go:373","component":"multi-tsdb","level":"debug","msg":"uploading block for tenant","tenant":"default-tenant","ts":"2022-12-08T08:49:42.951430097Z"}
{"caller":"receive.go:634","component":"uploader","elapsed":"2.677099ms","level":"debug","msg":"upload phase done","ts":"2022-12-08T08:49:42.954085336Z","uploaded":0}
{"caller":"handler.go:485","component":"receive-handler","level":"debug","msg":"only metadata from client; metadata ingestion not supported; skipping","tenant":"node-exporter","ts":"2022-12-08T08:50:00.21855362Z"}
{"caller":"receive.go:626","component":"uploader","level":"debug","msg":"upload phase starting","ts":"2022-12-08T08:50:12.951026083Z"}
{"caller":"multitsdb.go:373","component":"multi-tsdb","level":"debug","msg":"uploading block for tenant","tenant":"default-tenant","ts":"2022-12-08T08:50:12.951132287Z"}
{"caller":"multitsdb.go:373","component":"multi-tsdb","level":"debug","msg":"uploading block for tenant","tenant":"node-exporter","ts":"2022-12-08T08:50:12.95115459Z"}
{"caller":"receive.go:634","component":"uploader","elapsed":"2.817303ms","level":"debug","msg":"upload phase done","ts":"2022-12-08T08:50:12.953948425Z","uploaded":0}

These are logs from the Thanos receiver (no errors appear).

Anything else we need to know:

We used to remote-write all metrics from another tool (not Prometheus) that should be compatible. Some metrics were missing there as well; the problem looks similar.

thibautmery commented 1 year ago

Ok finally found ...

I have 2 receivers and the replication factor set to 1.

If anything goes wrong during forwarding/replication, the data ends up on only one Thanos receiver and not on both receivers.
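For context, the replication factor and hashring are set on each receiver roughly like this (endpoint names and paths are illustrative, not my exact values). With --receive.replication-factor=1 every series lives on exactly one receiver, so any forwarding error means that receiver simply never gets those samples:

    thanos receive \
      --receive.replication-factor=1 \
      --receive.hashrings-file=/etc/thanos/hashring.json \
      --receive.local-endpoint=thanos-receive-0:10901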

[Screenshot from 2022-12-08 13:25:32]

In the picture above I see a small percentage of errors during write forwarding. I guess that is the replication.

Do you have any idea why there is a small number of errors? (It's the same network, so I don't think it's a network issue.)

yeya24 commented 1 year ago

So what's the exact error on the receiver side? Do you have the error logs?

mdraijer commented 5 months ago

We had a similar problem: the querier connecting to the receiver at the service level retrieved only part of the metrics. Even more puzzling: when you got, say, metrics for the periods 9:00-10:00 and 12:00-now, then after a restart of the querier you would get metrics only up to 9:00 and from 10:00-12:00...

We found that the querier only gets metrics from 1 receiver pod in the ring. Which makes sense, obviously, because when talking to a service, you're in fact talking to one of the pods behind that service. And prometheus is sending a specific series of metrics to one receiver pod (via the service).

We tried the replication_factor; that seemed to make sense, but then we got the error "insufficient nodes; have 1, want 2", both in the receiver logs and in Prometheus. We had 1 Prometheus and 3 receiver pods, with replication_factor set to 2. We tried scaling Prometheus to 2, but that didn't help: same error. (Presumably the "have 1" refers to the number of endpoints in the hashring, not the number of receiver pods.)

Also: it would not be logical to have to set the replication_factor to exactly the number of pods in the receiver ring: that doesn't really scale, since every pod would handle every sample. And setting it to less than the size of the ring doesn't help either, because the querier might still connect to a pod that does not have the metric you are looking for.

We finally arrived at the solution of adding a headless service and configuring the querier so that it queries every pod in the receiver ring individually, instead of going through the service. Like this:

        - --endpoint=thanos-receive-0.thanos-receive-headless.[namespace].svc.cluster.local:10901
        - --endpoint=thanos-receive-1.thanos-receive-headless.[namespace].svc.cluster.local:10901
        - --endpoint=thanos-receive-2.thanos-receive-headless.[namespace].svc.cluster.local:10901
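
The headless service those per-pod endpoints resolve through is just a normal Kubernetes Service with clusterIP set to None; roughly like this sketch (name, namespace and selector are placeholders matching the endpoints above, not our exact manifest):

    apiVersion: v1
    kind: Service
    metadata:
      name: thanos-receive-headless
      namespace: [namespace]
    spec:
      clusterIP: None            # headless: DNS returns the individual pod IPs
      selector:
        app: thanos-receive      # must match the receiver pods' labels
      ports:
        - name: grpc
          port: 10901
          targetPort: 10901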

We also changed hashring.json from its default value of:

    [
      {
        "endpoints": [
            "127.0.0.1:10901"
        ]
      }
    ]

but instead to:

    [
      {
        "endpoints": [
          "thanos-receive-0.thanos-receive-headless.[namespace].svc.cluster.local:10901",
          "thanos-receive-1.thanos-receive-headless.[namespace].svc.cluster.local:10901",
          "thanos-receive-2.thanos-receive-headless.[namespace].svc.cluster.local:10901"
        ]
      }
    ]

Not sure if the hashring change was actually required, in hindsight, but we left it like that.

Now we are finally getting all the metrics all the time. But when we scale the receiver ring, we must change things in 3 places...
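
One thing we have not tried yet, but which might reduce that: Thanos Query supports DNS service discovery for endpoints, so a dns+ prefix pointing at the headless service should resolve all receiver pods at query time instead of listing each pod explicitly. This is a sketch based on the docs, not something we have verified:

        - --endpoint=dns+thanos-receive-headless.[namespace].svc.cluster.local:10901

As far as I know the hashring.json still needs explicit endpoints, so this would only remove one of the three places to update when scaling.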

Questions: Is this the right way to do it? Why is it necessary to do something special to get all metrics from the receiver ring?