thanos-io / thanos

Highly available Prometheus setup with long term storage capabilities. A CNCF Incubating project.
https://thanos.io
Apache License 2.0

Enable exploration of Thanos Store Gateway by (external) label pairs #5399

Open pedro-stanaka opened 2 years ago

pedro-stanaka commented 2 years ago

Is your proposal related to a problem?

I help manage a Thanos deployment with the most commonly used components (Store GW + Compactor + Querier + Query FE). We handle blocks coming from several clusters (hundreds of them) in a single "observability" cluster. When debugging memory usage for a particular dashboard or query, it is quite hard to figure out which Store Gateway holds the blocks for a specific label set (e.g. `cluster_name="app-cluster-foo"`), because you have to go into each Store GW and check its blocks in the UI. For more context: we are "hashmodding" the blocks from different clusters so that load is evenly distributed, which is one of the reasons we can't predict where a block will land.

Describe the solution you'd like

I would like an option to deploy a single binary that talks to the Thanos Stores and builds an overview of how blocks are distributed across them, along with summary information such as the time span covered by each Store GW.

Describe alternatives you've considered

After trying to find a tool online and in the Thanos project repos, I ended up writing a tool of my own (just a draft, but already in a working state), because there was nothing available that would satisfy these requirements.

Additional context

The tool I mentioned is a simple Golang app that connects to each Store GW via its HTTP API, hits /api/v1/blocks?loaded=true, and generates summaries of the blocks in memory. A SPA written in React then displays this data.
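To make the approach concrete, here is a minimal sketch of the fetch-and-summarize step, not the actual tool. It assumes the endpoint returns a JSON envelope with the blocks under `data.blocks`, and that each block carries `minTime`, `maxTime` and its external labels under `thanos.labels`, as in a block's meta.json. The endpoint path is the one quoted above; the type names, function names and the `blockmap` command name are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"os"
	"sort"
	"time"
)

// blockMeta is the subset of block metadata the summary needs.
// Field names are assumed from the meta.json format.
type blockMeta struct {
	ULID    string `json:"ulid"`
	MinTime int64  `json:"minTime"` // milliseconds since the Unix epoch
	MaxTime int64  `json:"maxTime"`
	Thanos  struct {
		Labels map[string]string `json:"labels"`
	} `json:"thanos"`
}

// blocksResponse is the assumed response envelope of the blocks endpoint.
type blocksResponse struct {
	Data struct {
		Blocks []blockMeta `json:"blocks"`
	} `json:"data"`
}

// summary describes the blocks one store holds for one label value.
type summary struct {
	Store   string
	Blocks  int
	MinTime int64
	MaxTime int64
}

// fetchBlocks asks a single store gateway for its loaded blocks.
func fetchBlocks(client *http.Client, baseURL string) ([]blockMeta, error) {
	resp, err := client.Get(baseURL + "/api/v1/blocks?loaded=true")
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("%s: unexpected status %s", baseURL, resp.Status)
	}
	var br blocksResponse
	if err := json.NewDecoder(resp.Body).Decode(&br); err != nil {
		return nil, fmt.Errorf("%s: decoding response: %w", baseURL, err)
	}
	return br.Data.Blocks, nil
}

// summarize groups one store's blocks by the value of the given external
// label and appends one summary per value to out.
func summarize(store, label string, blocks []blockMeta, out map[string][]summary) {
	perValue := map[string]*summary{}
	for _, b := range blocks {
		v, ok := b.Thanos.Labels[label]
		if !ok {
			continue // block does not carry this external label
		}
		s := perValue[v]
		if s == nil {
			s = &summary{Store: store, MinTime: b.MinTime, MaxTime: b.MaxTime}
			perValue[v] = s
		}
		s.Blocks++
		if b.MinTime < s.MinTime {
			s.MinTime = b.MinTime
		}
		if b.MaxTime > s.MaxTime {
			s.MaxTime = b.MaxTime
		}
	}
	for v, s := range perValue {
		out[v] = append(out[v], *s)
	}
}

func main() {
	if len(os.Args) < 3 {
		fmt.Fprintln(os.Stderr, "usage: blockmap <label-name> <store-url>...")
		return
	}
	label, stores := os.Args[1], os.Args[2:]
	client := &http.Client{Timeout: 10 * time.Second}
	out := map[string][]summary{}
	for _, store := range stores {
		blocks, err := fetchBlocks(client, store)
		if err != nil {
			fmt.Fprintln(os.Stderr, "skipping store:", err)
			continue
		}
		summarize(store, label, blocks, out)
	}
	values := make([]string, 0, len(out))
	for v := range out {
		values = append(values, v)
	}
	sort.Strings(values)
	for _, v := range values {
		for _, s := range out[v] {
			fmt.Printf("%s=%q store=%s blocks=%d from=%s to=%s\n", label, v, s.Store, s.Blocks,
				time.UnixMilli(s.MinTime).UTC().Format(time.RFC3339),
				time.UnixMilli(s.MaxTime).UTC().Format(time.RFC3339))
		}
	}
}
```

Run with a label name followed by the store base URLs, for example `blockmap cluster_name http://store-0:10902 http://store-1:10902` (hosts and ports are placeholders). It prints one line per label value and store, showing the block count and the covered time span.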

yeya24 commented 2 years ago

I love this idea in general. The question now is which interface should expose this: a UI, an API, or a CLI tool. From my perspective, a CLI tool that fetches /api/v1/blocks?loaded=true from multiple store gateways is good enough.

pedro-stanaka commented 2 years ago

I could write a CLI on top of the data I already have on my "server", which communicates with all the stores; I will try to come up with something. The only problem is that you always have to kubectl exec into some Thanos pod to use a CLI tool, whereas a web UI could be deployed as part of the whole deployment.

fpetkovski commented 2 years ago

I think extending the existing web tooling would be great. For example, adding a new Stores web command here: https://thanos.io/tip/components/tools.md/

stale[bot] commented 2 years ago

Hello 👋 Looks like there was no activity on this issue for the last two months. Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗 If there will be no activity in the next two weeks, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.

fpetkovski commented 2 years ago

Still relevant

stale[bot] commented 1 year ago

Hello 👋 Looks like there was no activity on this issue for the last two months. Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗 If there will be no activity in the next two weeks, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.

pedro-stanaka commented 1 year ago

/remind me 28/11/2022

I will work on this later this month.

reminders[bot] commented 1 year ago

@pedro-stanaka set a reminder for Nov 28th 2022