thanos-io / thanos

Highly available Prometheus setup with long term storage capabilities. A CNCF Incubating project.
https://thanos.io
Apache License 2.0

Global deployment model #163

Closed (fabxc closed this issue 4 years ago)

fabxc commented 6 years ago

Current test deployments within a single cluster/region work very well. Time to give global deployment options some thought.

Generally speaking, a full setup in every region is probably desirable for reliability and data locality in almost all deployments. It also provides trivial means of sharding, which isolates failures and increases scalability.

Currently there seem to be two options:

A) Expand the gossip network globally. All store nodes, Prometheus servers, and queriers are interconnected.

Pros:

Cons:

B) Keep Thanos clusters regional or even more fine-grained, with an additional global federation layer across the smaller Thanos clusters.

Query nodes would be made aware of each other through a regular service discovery mechanism and act as federation proxies for the Store API (see the sketch after the pros/cons below).

Pros:

Cons:
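To make option B more concrete, here is a minimal Go sketch of such a federation proxy. It is an illustration only: the `StoreClient` interface and `Series` type below are hypothetical, simplified stand-ins for the real Store API (which is a gRPC service streaming `storepb` messages). The idea is simply to fan a query out to the regional stores and concatenate the results.

```go
package federation

import (
	"context"
	"fmt"
	"sync"
)

// Series is a hypothetical, simplified stand-in for the series data returned
// by the Store API.
type Series struct {
	Labels  map[string]string
	Samples []float64
}

// StoreClient is a hypothetical, minimal subset of the Store API, used here
// only to illustrate the fan-out idea of option B.
type StoreClient interface {
	Series(ctx context.Context, matchers map[string]string) ([]Series, error)
}

// Querier acts as a federation proxy: it serves the same Series call by
// fanning out to the regional stores it knows about and merging their results.
type Querier struct {
	Regions map[string]StoreClient // e.g. "eu-west" -> regional querier/store
}

func (q *Querier) Series(ctx context.Context, matchers map[string]string) ([]Series, error) {
	var (
		mu  sync.Mutex
		wg  sync.WaitGroup
		out []Series
		err error
	)
	for name, store := range q.Regions {
		wg.Add(1)
		go func(name string, store StoreClient) {
			defer wg.Done()
			res, e := store.Series(ctx, matchers)
			mu.Lock()
			defer mu.Unlock()
			if e != nil {
				// Keep only the first error; a real proxy needs a policy
				// for partial responses across regions.
				if err == nil {
					err = fmt.Errorf("region %s: %w", name, e)
				}
				return
			}
			out = append(out, res...)
		}(name, store)
	}
	wg.Wait()
	return out, err
}
```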

fabxc commented 6 years ago

I felt like it would be good to specify the federation approach in a bit more detail to make a decision here: https://docs.google.com/document/d/1-hXTQ3dSFA1yNiUrWCMFqkW84k6PicZK-9tMhTssKN0/edit

bwplotka commented 6 years ago

As discussed, there is also: B2) Thanos clusters are still regional or even more fine-grained. On top of the local queriers there is an additional global query layer across all the smaller Thanos clusters.

Local query nodes are NOT aware of other clusters or of the global node; each one simply exposes the Store API in addition. This model is closer to hierarchical Prometheus federation.

Pros:

Cons:

peterbourgon commented 6 years ago

I was asked to give some thoughts here, sorry for the delay! To me it seems both easiest and cleanest to have all nodes join the same gossip network (option A), without any sense of hierarchy or federation. Scoping queries to e.g. sites or regions is, to my mind, a query-time operation, and so can be like any other decision made by the query handlers, based on per-node metadata it's already received and cached via gossip.
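A minimal sketch of that query-time scoping idea, assuming hypothetical types for the per-node metadata a querier would already hold from gossip:

```go
package scoping

// StoreMeta is a hypothetical representation of the per-node metadata a
// querier caches from gossip (in practice, the labels each peer advertises).
type StoreMeta struct {
	Addr   string
	Labels map[string]string // e.g. {"region": "eu-west", "replica": "a"}
}

// scopeToRegion selects the subset of known stores whose metadata matches the
// requested region; an empty region means "query everything".
func scopeToRegion(all []StoreMeta, region string) []StoreMeta {
	var scoped []StoreMeta
	for _, s := range all {
		if region == "" || s.Labels["region"] == region {
			scoped = append(scoped, s)
		}
	}
	return scoped
}
```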

To me, a federation/hierarchy feels like a sort of performance optimization. Without concrete justification that it's necessary, I'd try to avoid the complexity it would introduce.

This is based on several assumptions:

bwplotka commented 6 years ago

Thank you for your input @peterbourgon! I totally agree that, with these assumptions, option A is perfect. However, at this point we are looking at federation because we are not really sure whether the two assumptions below, which you mentioned, are always true for all potential users of Thanos:

  • That it's possible and reliable to configure e.g. memberlist to work at this scale

  • Operational details like firewalled port ranges etc. aren't problematic enough to optimize for

Especially when we think of clusters in totally different geographical regions (like EU vs Asia), or of proxy-based cross-cluster communication like Istio or kedge, where cross-cluster communication requires a bit more configuration (and RTT).

mattbostock commented 6 years ago

There's maybe an option (C): use the Prometheus service discovery library to discover peers instead of using the memberlist library.

Pros:

Cons:

fabxc commented 6 years ago

Thanks @mattbostock. We have considered using Prometheus's SD package, but you are pretty much spot-on with the cons you listed.

Prometheus's SD is only really useful if there's meaningful metadata you need to extract from your service discovery information (e.g. building target labels in Prometheus). In Thanos we do have labels for Prometheus instances – but those are directly extracted from their external_labels config section. Arguably moving this critical information to a loosely connected SD would be asking for trouble for lots of users in the end.

For basic discovery, DNS is mostly fine and is in practice provided on top of any more sophisticated discovery mechanism anyway. DNS is basically allowed now through static flag-based configuration of additional data sources in the querier.

Right now Thanos is dead simple to configure, which is largely thanks to a lack of config files. Adding those with relabeling rules would change that immediately.

mattbostock commented 6 years ago

@fabxc: I agree, I think using the Prometheus service discovery library would add unnecessary complexity.

Echoing @peterbourgon's comment above, I think we should avoid an additional 'federation' layer in the interest of keeping things simple.

Store instances can be configured statically, which I think resolves most of this issue? The deployment model I'm thinking of is:

In this scenario, the store instances can be configured 'statically' (e.g. using confd) as part of the command-line options for the query nodes. Additionally, we should support cross-WAN cluster communication. I suggest opening a separate issue to track that.
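For illustration, that static configuration could look roughly like this, assuming the repeatable `--store` flag on the querier discussed later in this thread; the addresses are placeholders and the flag list would be rendered by confd (or a similar tool):

```sh
# Store addresses rendered 'statically' into the query node's flags.
thanos query \
  --store=sidecar-0.monitoring.svc:10901 \
  --store=sidecar-1.monitoring.svc:10901 \
  --store=store-gateway.monitoring.svc:10901
```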

> However, at this point we are looking at federation because we are not really sure whether the two assumptions below, which you mentioned, are always true for all potential users of Thanos.

We can always add this later when/if the use case arises.

swsnider commented 6 years ago

@fabxc The only reason I'd be pro re-using the Prometheus SD library is that we'd get things like DNS-SD discovery (i.e. SRV records) for free -- it's my understanding that the existing binary can only do A record queries ATM for DNS-based SD?

bwplotka commented 6 years ago

Yeah, broadly speaking, the setup we ended up with is similar to what you described, @mattbostock.

Each environment:

The monitoring cluster has Thanos components connected via gossip. Queriers in this cluster have statically configured scrapers from the remote clusters, which are connected through a proxy (kEdge), since there is no other connection (VPN) between them. That's why I said there are some cases where gossip is not possible to use. This configuration is fine for now, because we don't really need "automated cluster discovery", so no SD is needed.

> Additionally, we should support cross-WAN cluster communication. I suggest opening a separate issue to track that.

What do you mean by that? What exactly would you like to have for it? I have cross-WAN communication by using an external proxy service, so no Thanos change was required.

However, we do want some federated global query layer on top of all environments to allow a global view across envs. This can be done using the static --stores query flag plus my proxy in my case, and that seems to be good enough for now.

bwplotka commented 6 years ago

@swsnider For peers, the gossip flow needs an initial list of members: either raw IP:port or domain:port. In the case of the latter, a DNS lookup for all IPs is done: https://github.com/improbable-eng/thanos/blob/master/pkg/cluster/cluster.go#L355
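A simplified sketch of that resolution step (not the actual linked implementation): for each `domain:port` seed, all IPs are looked up and turned back into `ip:port` peers.

```go
package cluster

import "net"

// resolvePeers is a simplified illustration: raw IP:port seeds are kept
// as-is, while domain:port seeds are expanded into one peer per resolved IP.
func resolvePeers(seeds []string) ([]string, error) {
	var peers []string
	for _, seed := range seeds {
		host, port, err := net.SplitHostPort(seed)
		if err != nil {
			return nil, err
		}
		if ip := net.ParseIP(host); ip != nil {
			peers = append(peers, seed)
			continue
		}
		ips, err := net.LookupIP(host)
		if err != nil {
			return nil, err
		}
		for _, ip := range ips {
			peers = append(peers, net.JoinHostPort(ip.String(), port))
		}
	}
	return peers, nil
}
```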

We had SRV lookup there as well, but we found it too complicated for the value it actually provided. This has proven to be sufficient for all the use cases we had.

fabxc commented 6 years ago

Yeah, I think adding DNS-SRV would actually still be reasonable if there's a strong use case for it. It would just be a few lines rather than pulling in the massive Prometheus SD framework and all its deps.
This is only helpful for the initial discovery of peers though, much like in Prometheus Alertmanager. Arguably we are already providing a better experience for that than AM did in the past. Generally, one can always start up with a small script that pulls the initial peer info before starting the Thanos component.
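For illustration, the "few lines" could look roughly like this, using the standard library's SRV lookup (the record name passed in would be a placeholder such as `_cluster._tcp.thanos.example.internal`):

```go
package cluster

import (
	"net"
	"strconv"
)

// srvPeers resolves an SRV record and builds host:port peers from the answers,
// sketching what DNS-SRV based initial peer discovery could look like.
func srvPeers(service, proto, name string) ([]string, error) {
	_, srvs, err := net.LookupSRV(service, proto, name)
	if err != nil {
		return nil, err
	}
	var peers []string
	for _, s := range srvs {
		peers = append(peers, net.JoinHostPort(s.Target, strconv.Itoa(int(s.Port))))
	}
	return peers, nil
}
```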

mattbostock commented 6 years ago

> Additionally, we should support cross-WAN cluster communication. I suggest opening a separate issue to track that.

> What do you mean by that? What exactly would you like to have for it?

@Bplotka: By cross-WAN cluster communication, I mean the ability for cluster peers to communicate and discover each other across a WAN, using appropriate timeouts such as: https://godoc.org/github.com/hashicorp/memberlist#DefaultWANConfig

This would be an alternative to using the static --store flag when your stores are located on the other side of a WAN. I don't yet have thoughts on how that would be implemented.
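For reference, a minimal sketch of what using memberlist's WAN-tuned defaults looks like; the node name and seed address are placeholders, and this is not how Thanos wires up gossip today:

```go
package main

import (
	"log"

	"github.com/hashicorp/memberlist"
)

func main() {
	// DefaultWANConfig uses timeouts/intervals tuned for high-latency links.
	cfg := memberlist.DefaultWANConfig()
	cfg.Name = "querier-eu-west-1" // placeholder node name

	ml, err := memberlist.Create(cfg)
	if err != nil {
		log.Fatalf("create memberlist: %v", err)
	}
	// Join via a seed peer in another region (placeholder address).
	if _, err := ml.Join([]string{"thanos-peer.us-east.example.internal:7946"}); err != nil {
		log.Fatalf("join: %v", err)
	}
}
```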

> However, we do want some federated global query layer on top of all environments to allow a global view across envs. This can be done using the static --stores query flag plus my proxy in my case, and that seems to be good enough for now.

By 'we' do you mean the Thanos project, or Improbable?

bwplotka commented 6 years ago

> @Bplotka: By cross-WAN cluster communication, I mean the ability for cluster peers to communicate and discover each other across a WAN, using appropriate timeouts such as (...)

Ah, yeah, I don't see any problem with changing Thanos to allow setting WAN defaults for gossip if you wish to set up WAN gossip. Good point.

> By 'we' do you mean the Thanos project, or Improbable?

Sorry for the confusion (: I meant only the Improbable use case here.

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.