Open SeanKilleen opened 1 week ago
If using a StatefulSet, should I specify one PVC so that both replicas are using the PVC? (or would that break things?)

Absolutely not. To be honest, this shouldn't even be a question, because it is basic Kubernetes and has nothing to do with Prometheus. StatefulSet replicas do not share volumes; each replica getting its own volume is what makes it a StatefulSet.
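As a minimal sketch of why this holds (the names `prometheus`, `prometheus-headless`, and `storage`, and the 8Gi size, are placeholders and are not taken from this chart), a StatefulSet declares a claim template rather than a single claim, and Kubernetes creates one PVC per pod from it:

```yaml
# Illustrative StatefulSet fragment; names, image, and size are assumptions.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: prometheus
spec:
  serviceName: prometheus-headless
  replicas: 2
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      containers:
        - name: prometheus
          image: prom/prometheus
          volumeMounts:
            - name: storage          # refers to the claim template below
              mountPath: /prometheus
  # One PVC is created per replica from this template; nothing is shared.
  volumeClaimTemplates:
    - metadata:
        name: storage
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 8Gi
```

With two replicas this yields two separate claims, `storage-prometheus-0` and `storage-prometheus-1`, one bound to each pod.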
Does this naturally create a cluster where those Prometheus nodes are communicating? Or are they running independently of one another?

They run independently of one another but share the same config, so each replica scrapes the same targets on its own. This is very important to understand.
Can I still set e.g. my Grafana dashboard to pull from the service in front of these nodes (or from one node specifically)? Or do I need to modify this approach to consider an additional data source?

You can point Grafana at the service in front of these nodes. The service load-balances requests across the individual Prometheus instances, which should contain the same time series during normal operation. However, the data may not be accurate during upgrades or incidents: an instance that was down or upgrading will not have scraped anything for that period, while the other instances will have the data. This can lead to misleading graphs that change wildly on each refresh or query execution.
To combat this, you should use an aggregator/deduplicator such as Thanos Querier.
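A rough sketch of such a setup, assuming each Prometheus replica runs alongside a Thanos sidecar and carries a distinguishing external label. The label name `replica`, its value, the image tag, and the sidecar address are all placeholders, not something this chart is known to configure for you:

```yaml
# 1) prometheus.yml fragment: give each replica a unique external label.
#    The value must differ per replica (for example prometheus-0, prometheus-1).
global:
  external_labels:
    replica: prometheus-0
---
# 2) Thanos Query container fragment: merge series that differ only by that label.
#    The endpoint address is hypothetical; point it at your sidecars' gRPC service.
containers:
  - name: thanos-query
    image: quay.io/thanos/thanos:<version>   # placeholder tag
    args:
      - query
      - --query.replica-label=replica
      - --endpoint=dnssrv+_grpc._tcp.thanos-sidecar.monitoring.svc.cluster.local
```

Grafana would then use the Thanos Query HTTP endpoint as its Prometheus data source instead of the load-balanced service, so a gap in one replica can be filled from the other.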
I'm using a blank issue because this doesn't rise to the level of a bug, but it is something a PR could fix, and I'd be happy to contribute one once I understand the approach.
Challenge
It is unclear to an end user (or at least to one less familiar with Prometheus) what the implications are of running this chart with 2+ replicas as a StatefulSet.
Each replica spins up its own PVC of equal size unless a pre-created PVC is specified.
Questions naturally arising from this:
Suggestion
Once these implications are understood here, I can update the values.yaml etc. to reflect the shared understanding via a PR and close this out.
Impact
I think this will make it easier for folks newer to Prometheus who want to build a little resiliency and are looking to understand the default behavior.