prometheus-operator / prometheus-operator

Prometheus Operator creates/configures/manages Prometheus clusters atop Kubernetes
https://prometheus-operator.dev
Apache License 2.0

Label-based PodMonitor and ServiceMonitor namespace selectors #6812

Open ringerc opened 1 month ago

ringerc commented 1 month ago

Component(s)

ServiceMonitor, PodMonitor

What is missing? Please describe.

A way to configure Prometheus and the operator to discover pods and services to scrape only in namespaces matching a label selector, so that Prometheus won't try to enumerate workloads in non-matching namespaces where it may lack the RBAC permissions to list them.


PodMonitor and ServiceMonitor's spec.namespaceSelector is a list of namespace names rather than an actual label selector, so it cannot choose the namespaces to search by label.
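For reference, a minimal sketch of what the existing spec.namespaceSelector accepts today: an explicit list of names (`matchNames`) or the `any: true` switch. The `app: pg-cluster` pod label and the `metrics` port name are hypothetical placeholders, not taken from a real deployment.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: pg-cluster
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: pg-cluster          # hypothetical pod label
  podMetricsEndpoints:
    - port: metrics            # hypothetical port name
  namespaceSelector:
    # Namespaces must be listed by name; labels cannot be used here.
    matchNames:
      - prom-operator-6812
    # Alternatively, search every namespace (needs cluster-wide list/watch RBAC):
    # any: true
```

Neither form lets the monitor follow namespaces as they gain or lose a label, which is the gap this request is about.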

Where a Prometheus instance is deployed into an environment with strong RBAC (like a default OpenShift cluster), it will not have permission to enumerate all pods in all namespaces. It can be given per-namespace permissions by deploying a suitable Role and RoleBinding into the namespace(s). But currently there is no way to tell it to look for resources only in namespaces matching a label. If the namespace(s) containing the workloads targeted by the PodMonitor or ServiceMonitor are not known in advance, there is no way to deploy a single PodMonitor or ServiceMonitor to scrape them.

The Prometheus object's spec.podMonitorNamespaceSelector and spec.serviceMonitorNamespaceSelector can be used to tell the operator to look for PodMonitors and ServiceMonitors in namespaces matching a label selector. But in this case the PodMonitor(s) or ServiceMonitor(s) must be duplicated into each namespace.

Instead, it would be helpful to be able to set a label selector for the namespaces searched for workloads by a PodMonitor or ServiceMonitor, so the monitor can discover workloads only in namespaces it's instructed to search. Ideally the prometheus-operator could also assist with Prometheus RBAC, by injecting the Role and RoleBinding to allow Prometheus to enumerate scrape-able workloads in suitably annotated namespaces.

Ideally something like this (nonexistent) config:

apiVersion: v1
kind: Namespace
metadata:
  labels:
    openshift.io/cluster-monitoring: "true"
  name: prom-operator-6812

and, in the monitoring namespace:

apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: pg-cluster
  namespace: monitoring
spec:
  podMetricsEndpoints:
    # ...
  # this `namespaceLabelSelector` does not exist in the PodMonitor CR
  namespaceLabelSelector:              # <---------- this is the feature request
    matchLabels:
      openshift.io/cluster-monitoring: "true"

A Role and RoleBinding are also required in each target namespace:


apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: prometheus-k8s
  namespace: prom-operator-6812
rules:
 # same as the Role/prometheus-k8s in the default namespace
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: prometheus-k8s
  namespace: prom-operator-6812
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: prometheus-k8s
subjects:
- kind: ServiceAccount
  name: prometheus-k8s
  namespace: openshift-monitoring
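The `rules` in the Role above are elided. As a rough sketch only, a namespaced Role that lets Prometheus discover scrape targets typically grants read access to pods, services and endpoints. The rule set below is an assumption modelled on common per-namespace Prometheus roles, not a copy of the actual Role/prometheus-k8s in any particular cluster, so compare it with the Role deployed in your environment before using it.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: prometheus-k8s
  namespace: prom-operator-6812
rules:
  # Assumed rule set: read-only discovery of scrape targets in this namespace.
  - apiGroups: [""]
    resources: ["services", "endpoints", "pods"]
    verbs: ["get", "list", "watch"]
  # Only needed if EndpointSlice-based discovery is in use.
  - apiGroups: ["discovery.k8s.io"]
    resources: ["endpointslices"]
    verbs: ["get", "list", "watch"]
```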

Describe alternatives you've considered.

Use per-namespace PodMonitors and ServiceMonitors. This works, but has the drawbacks noted after the example below.

To do it:

Copy the PodMonitors and ServiceMonitors to each namespace along with the Role and RoleBinding needed to permit Prometheus to discover target workloads. Add a label to the namespaces to indicate that they're enabled for prometheus monitoring. Leave the pod and service monitors' .namespaceSelector blank so only the current namespace is checked.

Use the Prometheus object's spec.podMonitorNamespaceSelector and spec.serviceMonitorNamespaceSelector to match namespaces that have the needed role and rolebinding.

e.g.:

apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: k8s
  namespace: monitoring
spec:
  # ...
  podMonitorNamespaceSelector:
    matchLabels:
      openshift.io/cluster-monitoring: "true"
  # ...
---
apiVersion: v1
kind: Namespace
metadata:
  labels:
    openshift.io/cluster-monitoring: "true"
  name: prom-operator-6812
---
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: pg-cluster
  namespace: prom-operator-6812
spec:
  podMetricsEndpoints:
    # ...

This works, but requires copying a lot of resources around, and it creates new targets per namespace.

Environment Information.

Environment

Kubernetes Version: v1.29.2 (kind), v1.28.10 (openshift)
Prometheus-Operator Version: v0.75.1

simonpasquier commented 1 month ago

I feel that this request would be better addressed by the ScrapeConfig CRD. Also, updating the RBAC permissions of the Prometheus service account isn't under the responsibility of the operator right now, and I wouldn't change this.