Prometheus Operator not filtering kubelet metrics to given namespace list

johnswarbrick-napier commented 4 months ago

Is there an existing issue for this?

[X] I have searched the existing issues

What happened?

I am deploying Prometheus Operator into a shared Kubernetes cluster with a large number of namespaces.

However, I only want to discover resources and receive alerts for a small number of explicitly listed namespaces.

I have configured the Prometheus Operator to only discover resources in a given namespace list:

  prometheusOperator:
    namespaces:
      releaseNamespace: true
      additional:
      - namespace1
      - namespace2
      - argo
      - ingress-nginx

This works fine for limiting the discovery of ServiceMonitors.

However, I am receiving Prometheus defaultRules alerts for other namespaces that are not on the namespace list, for example CPUThrottlingHigh, which uses this default Prometheus rule:

sum by (cluster, container, pod, namespace) (increase(container_cpu_cfs_throttled_periods_total{container!=""}[5m])) / sum by (cluster, container, pod, namespace) (increase(container_cpu_cfs_periods_total[5m])) > (25 / 100)

The defaultRules alerts being fired all seem to be related to metrics obtained by the Prometheus Operator from Kubelet.

I think the problem is that Kubelet is deployed and managed by Prometheus Operator, but the metrics received from Kubelet are not filtered to the explicit list of namespaces that I provided to the Prometheus Operator.

How can I restrict the Kubelet metrics so that they are only scraped or stored for the specific namespaces that I listed in the Prometheus Operator configuration?

Prometheus Operator Version

Name:                   monitoring-kube-prometheus-operator
Namespace:              monitoring
CreationTimestamp:      Fri, 01 Mar 2024 15:21:36 +0000
Labels:                 app=kube-prometheus-stack-operator
                        app.kubernetes.io/component=prometheus-operator
                        app.kubernetes.io/instance=monitoring
                        app.kubernetes.io/managed-by=Helm
                        app.kubernetes.io/name=kube-prometheus-stack-prometheus-operator
                        app.kubernetes.io/part-of=kube-prometheus-stack
                        app.kubernetes.io/version=58.2.2
                        argocd.argoproj.io/instance=monitoring-devx-prod
                        chart=kube-prometheus-stack-58.2.2
                        heritage=Helm
                        release=monitoring
Annotations:            deployment.kubernetes.io/revision: 4
Selector:               app=kube-prometheus-stack-operator,release=monitoring
Replicas:               1 desired | 1 updated | 1 total | 1 available | 0 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  25% max unavailable, 25% max surge
Pod Template:
  Labels:           app=kube-prometheus-stack-operator
                    app.kubernetes.io/component=prometheus-operator
                    app.kubernetes.io/instance=monitoring
                    app.kubernetes.io/managed-by=Helm
                    app.kubernetes.io/name=kube-prometheus-stack-prometheus-operator
                    app.kubernetes.io/part-of=kube-prometheus-stack
                    app.kubernetes.io/version=58.2.2
                    chart=kube-prometheus-stack-58.2.2
                    heritage=Helm
                    release=monitoring
  Service Account:  monitoring-kube-prometheus-operator
  Containers:
   kube-prometheus-stack:
    Image:      napier.azurecr.io/prometheus-operator:v0.73.2
    Port:       10250/TCP
    Host Port:  0/TCP
    Args:
      --kubelet-service=kube-system/monitoring-kube-prometheus-kubelet
      --log-level=warn
      --localhost=127.0.0.1
      --prometheus-config-reloader=napier.azurecr.io/prometheus-config-reloader:v0.73.2
      --config-reloader-cpu-request=0
      --config-reloader-cpu-limit=0
      --config-reloader-memory-request=0
      --config-reloader-memory-limit=0
      --thanos-default-base-image=napier.azurecr.io/thanos/thanos:v0.34.1
      --secret-field-selector=type!=kubernetes.io/dockercfg,type!=kubernetes.io/service-account-token,type!=helm.sh/release.v1
      --web.enable-tls=true
      --web.cert-file=/cert/tls.crt
      --web.key-file=/cert/tls.key
      --web.listen-address=:10250
      --web.tls-min-version=VersionTLS13
    Environment:
      GOGC:  30
    Mounts:
      /cert from tls-secret (ro)
  Volumes:
   tls-secret:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  monitoring-kube-prometheus-admission
    Optional:    false
Conditions:
  Type           Status  Reason
  ----           ------  ------
  Progressing    True    NewReplicaSetAvailable
  Available      True    MinimumReplicasAvailable
OldReplicaSets:  monitoring-kube-prometheus-operator-5674dd7487 (0/0 replicas created), monitoring-kube-prometheus-operator-688b9c658d (0/0 replicas created)
NewReplicaSet:   monitoring-kube-prometheus-operator-578c9c4d56 (1/1 replicas created)

Kubernetes Version

clientVersion:
  buildDate: "2023-05-17T14:20:07Z"
  compiler: gc
  gitCommit: 7f6f68fdabc4df88cfea2dcf9a19b2b830f1e647
  gitTreeState: clean
  gitVersion: v1.27.2
  goVersion: go1.20.4
  major: "1"
  minor: "27"
  platform: linux/amd64
kustomizeVersion: v5.0.1
serverVersion:
  buildDate: "2024-04-17T00:10:39Z"
  compiler: gc
  gitCommit: 587f5fe8a69b0d15b578eaf478f009247d1c5d47
  gitTreeState: clean
  gitVersion: v1.28.9
  goVersion: go1.21.9
  major: "1"
  minor: "28"
  platform: linux/amd64

Kubernetes Cluster Type

AKS

How did you deploy Prometheus-Operator?

helm chart:prometheus-community/kube-prometheus-stack

Manifests

No response

prometheus-operator log output

Nothing relevant to this issue

Anything else?

No response

simonpasquier commented 4 months ago

You'd need to use metricRelabelings in the kubelet service monitor to filter time series by their namespace label. See https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/user-guides/running-exporters.md#metric-relabeling for an example.
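
For a kube-prometheus-stack install like the one in this issue, the relabeling could probably be set through the chart values instead of editing the ServiceMonitor by hand. The fragment below is a sketch under assumptions, not a tested configuration. It assumes the chart exposes `kubelet.serviceMonitor.cAdvisorMetricRelabelings` (the cAdvisor endpoint is where the `container_cpu_cfs_*` metrics come from) and that `monitoring` is the release namespace. The remaining namespaces are taken from the list in the original report. Check the `values.yaml` of your chart version before relying on the key name.

```yaml
kubelet:
  serviceMonitor:
    # Applied to samples scraped from the kubelet cAdvisor endpoint
    # (assumed value name; verify against your chart version).
    cAdvisorMetricRelabelings:
      # Keep only series whose namespace label matches the allow-list.
      # Series without a namespace label (for example node-level cgroup
      # series) are dropped as well, because an empty value does not match.
      - sourceLabels: [namespace]
        regex: monitoring|namespace1|namespace2|argo|ingress-nginx
        action: keep
```

Helm replaces list values wholesale rather than merging them, so any default entries the chart ships under this key would need to be copied in alongside the new rule. Metric relabeling runs after the scrape, so the kubelet is still scraped in full and only the matching series are stored.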

github-actions[bot] commented 2 months ago

This issue has been automatically marked as stale because it has not had any activity in the last 60 days. Thank you for your contributions.
