jnyi closed this issue 1 week ago
I found a similar open issue that might be related to your problem: "Thanos query don't query all metrics on a receiver". In that issue, the solution was to make the querier query every pod in the receiver ring through a headless service, specifying each pod's endpoint in the configuration rather than the service endpoint. They also adjusted the hashring.json configuration to include all receiver endpoints. This change ensured that all metrics were consistently queried, although they noted that scaling the receiver ring then required changes in multiple places.
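As a rough illustration of the per-pod setup described above, the following sketch generates a hashring.json body and matching querier flags from a StatefulSet behind a headless service. The names `thanos-receive`, `thanos-receive-headless`, and `monitoring`, the replica count, and port 10901 are assumptions for the example and do not come from the linked issue; adjust them to your deployment.

```python
import json


def pod_endpoints(statefulset, service, namespace, replicas, port=10901):
    """Return one stable DNS endpoint per receiver pod behind a headless service."""
    return [
        f"{statefulset}-{i}.{service}.{namespace}.svc.cluster.local:{port}"
        for i in range(replicas)
    ]


def hashring_config(endpoints, name="default", tenants=None):
    """Build a single-hashring config; an empty tenants list is meant to match any tenant."""
    return [{"hashring": name, "tenants": tenants or [], "endpoints": endpoints}]


# Hypothetical deployment values, for illustration only.
endpoints = pod_endpoints("thanos-receive", "thanos-receive-headless", "monitoring", 3)

# Body of hashring.json listing every receiver pod.
hashring_json = json.dumps(hashring_config(endpoints), indent=2)
print(hashring_json)

# The same per-pod addresses passed to the querier, one flag per pod.
query_flags = [f"--endpoint={e}" for e in endpoints]
print("\n".join(query_flags))
```

Generating both outputs from one list keeps the hashring and the querier in sync, which addresses the "changes in multiple places" concern when the ring is scaled.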
For your specific issue, it seems like the new tenant is not being dynamically registered for querying without a restart. The Thanos receiver component should dynamically detect and register new tenants without requiring a restart by provisioning an independent TSDB for each new tenant detected via the tenant HTTP header [1]. However, if this is not happening, it might be a regression or a configuration issue in the latest main branch.
You might want to check whether the configuration flags related to tenant registration and queryability are set correctly, such as --receive.tenant-header and any relevant querier flags [2][1]. If the issue persists, it could be worth reporting this as a potential regression in the main branch, with the details and reproduction steps you've outlined.
Tested on the latest main, this behavior didn't happen:
ts=2024-11-13T06:33:34.121916792Z caller=multitsdb.go:697 level=info name=pantheon-db component=receive component=multi-tsdb tenant=random-tenant123 msg="opening TSDB"
ts=2024-11-13T06:33:34.128508813Z caller=multitsdb.go:743 level=info name=pantheon-db component=receive component=multi-tsdb tenant=random-tenant123 msg="TSDB is now ready"
ts=2024-11-13T06:33:50.252309547Z caller=shipper.go:259 level=warn name=pantheon-db component=receive component=multi-tsdb tenant=random-tenant123 msg="reading meta file failed, will override it" err="failed to read /var/thanos/data/random-tenant123/thanos.shipper.json: open /var/thanos/data/random-tenant123/thanos.shipper.json: no such file or directory"
We are testing the latest Thanos main branch and found a regression that did not exist in v0.36 and earlier.
On a running Thanos receiver cluster, we start writing to a new tenant called "eng-host-networking". The TSDB head metric for that tenant starts to appear, but none of the tenant's metrics are queryable until the receiver cluster is restarted.
How to repro:
prometheus_tsdb_head_series{tenant="<new tenant>"}
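To make the check repeatable, here is a minimal sketch that builds an instant-query URL for the expression above and tests whether a response contains any series. It assumes the querier serves the Prometheus-compatible `/api/v1/query` endpoint at `http://localhost:10902`; that address and the helper names are illustrative, not taken from this report.

```python
import json
from urllib.parse import urlencode


def instant_query_url(base_url, tenant):
    """Build an instant-query URL for the head-series metric of one tenant."""
    promql = f'prometheus_tsdb_head_series{{tenant="{tenant}"}}'
    return f"{base_url}/api/v1/query?" + urlencode({"query": promql})


def has_series(response_body):
    """Return True if a Prometheus-style query response holds at least one series."""
    payload = json.loads(response_body)
    result = payload.get("data", {}).get("result", [])
    return payload.get("status") == "success" and len(result) > 0


# Hypothetical querier address; replace with your own.
url = instant_query_url("http://localhost:10902", "eng-host-networking")
print(url)

# Canned response bodies standing in for what the querier might return.
empty = '{"status": "success", "data": {"resultType": "vector", "result": []}}'
found = ('{"status": "success", "data": {"resultType": "vector", "result": '
         '[{"metric": {"tenant": "eng-host-networking"}, "value": [1731479614, "42"]}]}}')
print(has_series(empty), has_series(found))
```

Fetching `url` (for example with `urllib.request.urlopen`) before and after a receiver restart and passing the body to `has_series` would show whether the new tenant only becomes queryable after the restart.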
Thanos, Prometheus and Golang version used: Thanos: v0.37.0-dev Golang: v1.23
Object Storage Provider:
What happened:
What you expected to happen:
How to reproduce it (as minimally and precisely as possible):
Full logs to relevant components:
Anything else we need to know: