Open tiithansen opened 1 week ago
Answering the specific request:
Mimir supports `metric_relabel_configs`, which the distributor applies after the HA tracker. From the history, it was originally implemented for https://github.com/cortexproject/cortex/issues/1507, but it has been a niche experimental feature since then. There are some details on how to use it in https://github.com/grafana/mimir/issues/1809.
Note that the config flag comes with a warning: in most situations, it is more effective to use metrics relabeling directly in the Prometheus server, e.g. `remote_write.write_relabel_configs`.
> We have three labels in total: `cluster`, which is used in queries; `__prometheus_type__`, which indicates the tier a Prometheus belongs to; and `__replica__`, which indicates the replica number in the tier.
I cannot say I fully understand this setup. Do Prometheuses with different `__prometheus_type__` values scrape the same set of metrics or not? If yes, then wouldn't removing the `__prometheus_type__` label break it, no matter whether this happens before or after the HA tracker? It seems that the distributor would end up injecting a set of duplicate metrics within one `cluster` label (provided `__prometheus_type__` and `__replica__` were removed as per your HA tracking rule).
One thing I forgot to mention is that the cluster label in the HA tracker is configured to `__prometheus_type__`, but we also add the regular `cluster` label when we remote write to Mimir.
Prometheuses with different `__prometheus_type__` values scrape different metrics from different services. For example, `__prometheus_type__="system"` scrapes only metrics from Kubernetes components, node exporters ... and `__prometheus_type__="business"` scrapes only metrics from applications developed by our developers.
This way, if some business app explodes in cardinality, we will still receive all system metrics and the metrics from the other shards of the business Prometheuses.
Describe the feature request
We have a tiered Prometheus setup where each tier has its own responsibility. Because of this we track HA labels differently. We have three labels in total: `cluster`, which is used in queries; `__prometheus_type__`, which indicates the tier a Prometheus belongs to; and `__replica__`, which indicates the replica number in the tier. Because Mimir only drops the `__replica__` label, we are left with the `__prometheus_type__` label, but we would like to get rid of it. The reason for this setup is that if one tier becomes unstable, the others will be unaffected.
For example:
Describe the solution you'd like
Allow specifying in the config which additional labels the distributor should drop from received time series.
Configured labels could be easily dropped here
Alternatives
I have tried `drop_labels`, but it seems to run before the HA tracker, and it breaks ingestion.
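To illustrate why the ordering matters, here is a minimal toy model in Python. It is not Mimir's code: `ToyHATracker`, `ingest`, and the "first replica seen wins" election are hypothetical simplifications, and the rule that a sample missing either HA label skips deduplication is an assumption made for the sketch. The label names come from the setup described above.

```python
from typing import Dict, List, Optional

Labels = Dict[str, str]

CLUSTER_LABEL = "__prometheus_type__"  # HA cluster label, as configured in this issue
REPLICA_LABEL = "__replica__"          # HA replica label, as configured in this issue


def drop_labels(series: Labels, names: List[str]) -> Labels:
    """Return a copy of the series without the given label names."""
    return {k: v for k, v in series.items() if k not in names}


class ToyHATracker:
    """Toy model: the first replica seen per cluster is elected, others are rejected."""

    def __init__(self) -> None:
        self.elected: Dict[str, str] = {}

    def accept(self, series: Labels) -> Optional[Labels]:
        cluster = series.get(CLUSTER_LABEL)
        replica = series.get(REPLICA_LABEL)
        if cluster is None or replica is None:
            # Assumption: without both HA labels no deduplication takes place.
            return dict(series)
        if self.elected.setdefault(cluster, replica) != replica:
            return None  # sample from a non-elected replica is discarded
        return drop_labels(series, [REPLICA_LABEL])


def ingest(samples: List[Labels], extra_drop: List[str],
           drop_before_tracker: bool) -> List[Labels]:
    """Run samples through the toy pipeline, dropping extra labels before or after HA tracking."""
    tracker = ToyHATracker()
    out: List[Labels] = []
    for sample in samples:
        if drop_before_tracker:
            sample = drop_labels(sample, extra_drop)
        accepted = tracker.accept(sample)
        if accepted is None:
            continue
        if not drop_before_tracker:
            accepted = drop_labels(accepted, extra_drop)
        out.append(accepted)
    return out


samples = [
    {"__name__": "up", "cluster": "prod", CLUSTER_LABEL: "system", REPLICA_LABEL: "0"},
    {"__name__": "up", "cluster": "prod", CLUSTER_LABEL: "system", REPLICA_LABEL: "1"},
]

# Dropping after the tracker: one deduplicated series, no HA labels left.
after = ingest(samples, [CLUSTER_LABEL], drop_before_tracker=False)
# Dropping before the tracker: the tracker never sees the cluster label.
before = ingest(samples, [CLUSTER_LABEL], drop_before_tracker=True)
```

In this model, dropping `__prometheus_type__` after the tracker yields a single series carrying only `cluster`, while dropping it first leaves both replicas' samples in place, each still carrying `__replica__`. That matches the shape of the problem reported with `drop_labels`, though the real failure mode in Mimir may differ.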