thanos-io / thanos

Highly available Prometheus setup with long term storage capabilities. A CNCF Incubating project.
https://thanos.io
Apache License 2.0

Have a way to disable endpointSet API auto discovery #4683

Open yeya24 opened 3 years ago

yeya24 commented 3 years ago

Is your proposal related to a problem?

The new EndpointSet change was added in the latest release, and Thanos Querier can now automatically discover the APIs supported by stores.

However, this doesn't work well in edge cases. As I mentioned in https://github.com/thanos-io/thanos/pull/4282#issuecomment-872449754, we have multiple Thanos sidecars deployed in different data centers across the world and only one central Thanos Query instance. With auto-discovery, queries fan out to all of those sidecars, which is undesirable and unnecessary in our case.

I'd like to have a way to disable the endpointSet discovery. The old behavior is good enough for us (specifying exemplars/rules API endpoints if needed).
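
To make the fanout concern concrete, here is a minimal Python sketch (not Thanos code) comparing the two behaviors. The `Endpoint` type, the `targets_for` helper, and all addresses are hypothetical. The sketch only assumes that each endpoint advertises a set of APIs and that the querier sends a request to every endpoint it believes supports the requested API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Endpoint:
    """Hypothetical model of a store endpoint known to the querier."""
    address: str
    advertised: frozenset = field(default_factory=frozenset)


def targets_for(api, endpoints, explicit=None):
    """Return the addresses a request for `api` would be sent to.

    With explicit=None the querier trusts what each endpoint advertises
    (auto-discovery). Otherwise `explicit` maps an API name to the
    addresses the operator listed by hand (the old behavior).
    """
    if explicit is None:
        return [e.address for e in endpoints if api in e.advertised]
    return list(explicit.get(api, []))


# Hypothetical fleet: one local sidecar plus sidecars in five remote data centers.
apis = frozenset({"store", "rules", "metadata", "exemplars"})
fleet = [Endpoint("sidecar-local:10901", apis)] + [
    Endpoint(f"sidecar-dc{i}:10901", apis) for i in range(1, 6)
]

auto = targets_for("exemplars", fleet)
manual = targets_for("exemplars", fleet, explicit={"exemplars": ["sidecar-local:10901"]})
print(len(auto), len(manual))  # 6 targets with auto-discovery, 1 when listed by hand
```

In this toy model every sidecar advertises every API, so auto-discovery sends an exemplars request to the whole fleet, while the hand-written list keeps it local.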

Describe the solution you'd like

(Describe your proposed solution here.)

bwplotka commented 3 years ago

Thanks for reporting!

Currently, there is no way to do this, and I pointed this out during the proposal phase. We assumed we could add some filtering later on, plus anyone can put their own proxy in front to block certain gRPC / HTTP/2 paths.

In your case, it looks like you simply don't want to use metadata across the whole fleet, so you can just not use the metadata HTTP API at all and you are good to go. The problem arises when you want Querier to use metadata for a small group of APIs, then store for the rest. Is this what you are trying to accomplish?

cc @hitanshu-mehta @GiedriusS

If this use case is strong, we have a couple of options, but I believe subgrouping of APIs is a very niche use case, so let's wait for @yeya24's response.

yeya24 commented 3 years ago

Yes, I want to use subgrouping of APIs. I only want to enable the metadata and exemplars APIs on the sidecar that is in the same cluster as our queriers. Federating the metadata API across lots of clusters is unnecessary, as metadata is usually the same for all clusters. For exemplars, we don't enable global trace collection because it is expensive, so a federated exemplars API doesn't make sense either.

We cannot simply avoid using some APIs, because Grafana calls the exemplars and metadata APIs automatically on the Explore page.

bwplotka commented 3 years ago

Yeah, OK, then it's a must-have.

In this case we have a couple of options:

  1. Have --endpoint for auto-discovery and leave --store and the others for manual specification.
  2. Have --endpoint for auto-discovery, ask users to use --endpoint.config for more complex use cases, and add filtering there (endpoint.config is what @Namanl2001 is working on as part of the TLS work).

I think I like 2 the most: fewer flags, and it focuses on the easy cases first.
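
As a rough illustration of what per-endpoint filtering under option 2 might look like, here is a small Python sketch. The thread does not define a schema for --endpoint.config, so the `address` and `disabled_apis` keys, the helper names, and the addresses are all assumptions made for this example, not the actual file format.

```python
# Hypothetical shape for entries in an endpoint config file. The thread does
# not define a schema, so every key below is an assumption.
ENDPOINT_CONFIG = [
    {"address": "sidecar-local:10901"},  # no filter: use everything discovered
    {"address": "sidecar-remote:10901", "disabled_apis": ["metadata", "exemplars"]},
]


def effective_apis(discovered, entry):
    """APIs the querier would use for one endpoint after filtering."""
    return set(discovered) - set(entry.get("disabled_apis", []))


def plan(config, discovered_by_address):
    """Map each configured address to the sorted APIs left after filtering."""
    return {
        entry["address"]: sorted(
            effective_apis(discovered_by_address[entry["address"]], entry)
        )
        for entry in config
    }


# What auto-discovery reported for each endpoint in this toy example.
discovered = {
    "sidecar-local:10901": {"store", "metadata", "exemplars"},
    "sidecar-remote:10901": {"store", "metadata", "exemplars"},
}
result = plan(ENDPOINT_CONFIG, discovered)
print(result)
```

The point of the sketch is that discovery still runs for every endpoint, and the filter only narrows what the querier is allowed to use, so the simple --endpoint case needs no extra configuration.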

onprem commented 3 years ago

+1 for option 2. Another option is to allow disabling certain APIs at the component level, so that, for example, Ben can disable the Metadata API for sidecars running outside the cluster. In practice, this could be implemented with a flag like --disable-metadata-api on Sidecar.

yeya24 commented 3 years ago

Love this idea. This should be easier for us.
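
The component-level idea could be sketched as follows in Python. This is not Sidecar code: --disable-metadata-api is only the flag name suggested in this thread, --disable-exemplars-api is a hypothetical sibling added to show the pattern, and the API list is limited to the APIs named in this discussion.

```python
import argparse

# APIs named in this discussion; a real component may expose others.
ALL_APIS = ("store", "rules", "metadata", "exemplars")


def build_parser():
    parser = argparse.ArgumentParser(prog="sidecar-sketch")
    # Flag name taken from the suggestion in the thread; it is a proposal,
    # not an existing option.
    parser.add_argument("--disable-metadata-api", action="store_true")
    # Hypothetical sibling flag, included only to show the pattern generalizes.
    parser.add_argument("--disable-exemplars-api", action="store_true")
    return parser


def advertised_apis(args):
    """APIs this component would advertise to the querier's discovery."""
    disabled = set()
    if args.disable_metadata_api:
        disabled.add("metadata")
    if args.disable_exemplars_api:
        disabled.add("exemplars")
    return [api for api in ALL_APIS if api not in disabled]


# A sidecar outside the querier's cluster turns both APIs off.
remote = advertised_apis(
    build_parser().parse_args(["--disable-metadata-api", "--disable-exemplars-api"])
)
# A sidecar in the same cluster keeps the defaults.
local = advertised_apis(build_parser().parse_args([]))
print(remote)
print(local)
```

With this approach the querier needs no filtering logic at all: a sidecar that does not advertise an API is never selected for it during discovery.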

stale[bot] commented 2 years ago

Hello 👋 Looks like there was no activity on this issue for the last two months. Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗 If there will be no activity in the next two weeks, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.

Harshitavkb26 commented 2 years ago

Hello there, I'm interested in the problem. Could you assign it to me?

bwplotka commented 2 years ago

Just go for it, we don't manage assignment via any specific tool (:

To fix this, it sounds like someone needs to finish up this PR: https://github.com/thanos-io/thanos/pull/4785

Feel free to create another PR from that branch if you want to help there (:

Harshitavkb26 commented 2 years ago

sure (:
