suxess-it / kubriX

https://kubrix.io

[monitoring] high prometheus metrics cardinality #380

Closed jkleinlercher closed 3 months ago

jkleinlercher commented 3 months ago

One of our Prometheus instances gets an error with remoteWrite to Mimir:

ts=2024-08-01T09:51:39.852Z caller=dedupe.go:112 component=remote level=error remote_name=a65cf4 url=https://metrics-monitoring.lab.suxessit.k8s.cloud.uibk.ac.at/api/v1/push msg="non-recoverable error" count=447 exemplarCount=0 err="server returned HTTP status 400 Bad Request: send data to ingesters: failed pushing to ingester sx-mimir-ingester-zone-a-0: user=anonymous: per-user series limit of 150000 exceeded (err-mimir-max-series-per-user). To adjust the related per-tenant limit, configure -ingester.max-global-series-per-user, or contact your service administrator. (sampled 1/10)"
jkleinlercher commented 3 months ago

We definitely have too many metric series on the Prometheus instance of our sx-cnp-oss cluster.


see http://localhost:9090/api/v1/status/tsdb (seriesCountByMetricName and seriesCountByLabelValuePair):

    "seriesCountByMetricName": [
      {
        "name": "apiserver_request_duration_seconds_bucket",
        "value": 14076
      },
      {
        "name": "etcd_request_duration_seconds_bucket",
        "value": 14064
      },
      {
        "name": "apiserver_request_sli_duration_seconds_bucket",
        "value": 12276
      },
      {
        "name": "apiserver_request_slo_duration_seconds_bucket",
        "value": 12276
      },
      {
        "name": "apiserver_response_sizes_bucket",
        "value": 3120
      },
      {
        "name": "thanos_objstore_bucket_operation_duration_seconds_bucket",
        "value": 2205
      },
      {
        "name": "workqueue_work_duration_seconds_bucket",
        "value": 2002
      },
      {
        "name": "workqueue_queue_duration_seconds_bucket",
        "value": 2002
      },
      {
        "name": "scheduler_plugin_execution_duration_seconds_bucket",
        "value": 1806
      },
      {
        "name": "grpc_server_handled_total",
        "value": 1581
      }
    ],
    "seriesCountByLabelValuePair": [
      {
        "name": "job=kubelet",
        "value": 57365
      },
      {
        "name": "endpoint=https-metrics",
        "value": 57364
      },
      {
        "name": "service=sx-kube-prometheus-stack-kubelet",
        "value": 54620
      },
      {
        "name": "namespace=kube-system",
        "value": 50614
      },
      {
        "name": "metrics_path=/metrics",
        "value": 49018
      },
      {
        "name": "node=k3d-cnp-local-demo-server-0",
        "value": 47514
      },
      {
        "name": "component=apiserver",
        "value": 46640
      },
      {
        "name": "instance=172.25.0.3:10250",
        "value": 45114
      },
      {
        "name": "namespace=default",
        "value": 40495
      },
      {
        "name": "endpoint=https",
        "value": 40373
      }
    ]
jkleinlercher commented 3 months ago

interesting guides:

https://last9.io/blog/how-to-manage-high-cardinality-metrics-in-prometheus/
https://grafana.com/blog/2022/10/20/how-to-manage-high-cardinality-metrics-in-prometheus-and-kubernetes/

kubectl port-forward svc/sx-kube-prometheus-stack-prometheus -n monitoring 9090:9090

then

check the Prometheus TSDB status page: http://localhost:9090/tsdb-status

Queries:

topk(100, count by (__name__, job)({__name__=~".+"}))
topk(100, count by (__name__, instance)({__name__=~".+"}))
jkleinlercher commented 3 months ago

In the querier deployment I set the '-querier.cardinality-analysis-enabled=true' arg:

kubectl edit deployment sx-mimir-querier -n mimir

[...]
    spec:
      containers:
      - args:
        - -target=querier
        - -config.expand-env=true
        - -config.file=/etc/mimir/mimir.yaml
        - -querier.cardinality-analysis-enabled=true

according to https://grafana.com/blog/2022/10/20/how-to-manage-high-cardinality-metrics-in-prometheus-and-kubernetes/, but nothing interesting showed up here.
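To make that flag survive chart upgrades instead of patching the Deployment by hand, something like this in the mimir-distributed values should work (just a sketch: it assumes the chart's mimir.structuredConfig passthrough and that -querier.cardinality-analysis-enabled maps to the limits block of the Mimir config):

# Sketch only, not applied yet: enable the cardinality analysis API via config
# instead of kubectl edit. Assumes the mimir-distributed structuredConfig passthrough.
mimir:
  structuredConfig:
    limits:
      cardinality_analysis_enabled: true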

jkleinlercher commented 3 months ago

maybe these dashboards help: https://github.com/cerndb/grafana-mimir-cardinality-dashboards/tree/main

jkleinlercher commented 3 months ago

next steps:

jkleinlercher commented 3 months ago

Local environment KIND_OBSERVABILITY is set up; next step see https://github.com/suxess-it/sx-cnp-oss/issues/380#issuecomment-2264659275

jkleinlercher commented 3 months ago

The local installation of our observability stack shows the same high series counts:

curl http://localhost:9090/api/v1/status/tsdb |jq

{
  "status": "success",
  "data": {
    "headStats": {
      "numSeries": 133146,
      "numLabelPairs": 8184,
      "chunkCount": 266703,
      "minTime": 1722580538141,
      "maxTime": 1722586972676
    },
    "seriesCountByMetricName": [
      {
        "name": "etcd_request_duration_seconds_bucket",
        "value": 14064
      },
      {
        "name": "apiserver_request_duration_seconds_bucket",
        "value": 13968
      },
      {
        "name": "apiserver_request_sli_duration_seconds_bucket",
        "value": 12144
      },
      {
        "name": "apiserver_request_slo_duration_seconds_bucket",
        "value": 12144
      },
      {
        "name": "apiserver_response_sizes_bucket",
        "value": 3104
      },

Also, the Grafana Mimir dashboards show the same high counts.


Next step: check whether removing some scrape config according to https://github.com/suxess-it/sx-cnp-oss/issues/353#issuecomment-2263946662 helps.
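For reference, "removing some scrape config" in kube-prometheus-stack could look roughly like this (a sketch assuming the chart's standard values layout; which components to disable is just an example):

# Sketch: disable scrape jobs for control-plane components we don't need
# (or that aren't reachable in a k3d cluster anyway).
kubeEtcd:
  enabled: false
kubeScheduler:
  enabled: false
kubeControllerManager:
  enabled: false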

jkleinlercher commented 3 months ago

Also good documents:
https://medium.com/@dotdc/prometheus-performance-and-cardinality-in-practice-74d5d9cd6230
https://medium.com/@dotdc/how-to-find-unused-prometheus-metrics-using-mimirtool-a44560173543

jkleinlercher commented 3 months ago

So our next steps will be:

Maybe https://grafana.com/docs/grafana-cloud/monitor-infrastructure/kubernetes-monitoring/configuration/config-other-methods/helm-operator-migration/reduce_usage/ also helps

The picture in https://victoriametrics.com/blog/cardinality-explorer/ is quite nice.

jkleinlercher commented 3 months ago

Unused metrics with high series count (> 1000 series count):

apiserver_request_duration_seconds_bucket
etcd_request_duration_seconds_bucket
apiserver_request_slo_duration_seconds_bucket
apiserver_response_sizes_bucket
workqueue_work_duration_seconds_bucket
scheduler_plugin_execution_duration_seconds_bucket
apiserver_watch_events_sizes_bucket

jkleinlercher commented 3 months ago

One important thing I learned in my local k3d environment: you need to find out which Prometheus job scrapes these metrics; then you know in which part of the kube-prometheus-stack values you need to set the metricRelabelings.


I guessed (and maybe that is also true for non-k3d clusters) that the metric 'etcd_request_duration_seconds_bucket' should be dropped in the kubeEtcd section. However, for whatever reason the metrics get scraped by the apiserver and kubelet jobs, so I needed to drop them in the "kubelet" and "kubeApiServer" sections.
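A minimal sketch of what that looks like in the kube-prometheus-stack values (the regex only covers two of the metrics from the list above and would need to be extended):

# Sketch: drop the high-cardinality histogram buckets at scrape time, in the
# jobs that actually expose them (kubelet and kubeApiServer, not kubeEtcd).
kubelet:
  serviceMonitor:
    metricRelabelings:
      - sourceLabels: [__name__]
        regex: "etcd_request_duration_seconds_bucket|apiserver_request_duration_seconds_bucket"
        action: drop
kubeApiServer:
  serviceMonitor:
    metricRelabelings:
      - sourceLabels: [__name__]
        regex: "etcd_request_duration_seconds_bucket|apiserver_request_duration_seconds_bucket"
        action: drop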

jkleinlercher commented 3 months ago

While working on this I realized that it takes a lot of time to drop metrics, and on uibklab some of the metrics which mimirtool reported as unused are now used. Also, some of the metrics come not just from kube-prometheus-stack but from other applications like kyverno or argocd, ... So now I wonder whether the approach in https://grafana.com/docs/grafana-cloud/monitor-infrastructure/kubernetes-monitoring/configuration/config-other-methods/helm-operator-migration/reduce_usage/ would be better for us: it defines an allow_list before writing to mimir.
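The allow_list idea from that doc would roughly translate to a "keep" writeRelabelConfig on the remoteWrite in our kube-prometheus-stack values, something like this sketch (the regex is a made-up placeholder; the real list would come from the mimirtool "used metrics" output):

# Sketch: only forward an allow-list of series to Mimir instead of dropping
# unused metrics one by one. The regex below is a hypothetical placeholder.
prometheus:
  prometheusSpec:
    remoteWrite:
      - url: https://metrics-monitoring.lab.suxessit.k8s.cloud.uibk.ac.at/api/v1/push
        writeRelabelConfigs:
          - sourceLabels: [__name__]
            regex: "up|kube_.*|node_.*|container_cpu_.*|container_memory_.*"
            action: keep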

However, for now I think the easiest and fastest solution is to increase max_global_series_per_user like others did, e.g. in https://github.com/grafana/helm-charts/issues/1320, and to improve on the metrics and capacity planning afterwards.
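Raising the limit would roughly look like this in the mimir-distributed values (a sketch assuming the chart's structuredConfig passthrough; 300000 is just an example value):

# Sketch: raise the per-tenant series limit behind the
# err-mimir-max-series-per-user 400 above (Mimir default is 150000).
mimir:
  structuredConfig:
    limits:
      max_global_series_per_user: 300000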

jkleinlercher commented 3 months ago

With https://github.com/suxess-it/sx-cnp-oss/issues/390 we changed from kube-prometheus-stack to the k8s-monitoring Helm chart and are now at about 60k metric series, so this issue is solved for now.