sustainable-computing-io / kepler-doc

Kepler uses eBPF to probe energy-related system stats and exports them as Prometheus metrics
https://sustainable-computing.io/
Apache License 2.0

Installation of Prometheus Operator on OpenShift can break Cluster Monitoring #58

Open · sthaha opened 1 year ago

sthaha commented 1 year ago

The Deploy section of the Kepler Doc recommends installing Prometheus Operator. This would result in two Prometheus Operator instances running; if not properly configured, the new operator can reconcile the prometheus-k8s instance in the openshift-monitoring namespace and render the cluster's in-platform monitoring unusable.

marceloamaral commented 1 year ago

Did you install Prometheus Operator in the openshift-monitoring or monitoring namespace?

sthaha commented 1 year ago

> Did you install Prometheus Operator in the openshift-monitoring or monitoring namespace?

Neither, but if I were to do it, I wouldn't touch the openshift-monitoring namespace.

I did not install the Prometheus Operator (PO). Based on previous experience, and from looking at the deployment YAML, I was fairly sure that the PO in kube-prometheus does not limit the resources it watches; see: https://github.com/prometheus-operator/kube-prometheus/blob/dc0ad5e2162110c31c0c08d097f688145ce8e229/manifests/prometheusOperator-deployment.yaml#L29

      containers:
      - args:
        - --kubelet-service=kube-system/kubelet
        - --prometheus-config-reloader=quay.io/prometheus-operator/prometheus-config-reloader:v0.65.2
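        # note: no --*-instance-namespaces flags, so this operator watches and
        # reconciles Prometheus instances in every namespace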
        image: quay.io/prometheus-operator/prometheus-operator:v0.65.2

Unlike the in-cluster Prometheus Operator, which does restrict what it reconciles: https://github.com/openshift/cluster-monitoring-operator/blob/076da3ba2d27edb00765cff6a51b0b7a2785ce03/assets/prometheus-operator/deployment.yaml#LL34C8-L38C66

      containers:
      - args:
        - --kubelet-service=kube-system/kubelet
        - --prometheus-config-reloader=quay.io/prometheus-operator/prometheus-config-reloader:v0.65.1
        - --prometheus-instance-namespaces=openshift-monitoring
        - --thanos-ruler-instance-namespaces=openshift-monitoring
        - --alertmanager-instance-namespaces=openshift-monitoring
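        # scoped to openshift-monitoring, so it leaves Prometheus, Thanos Ruler,
        # and Alertmanager instances in other namespaces alone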

This would lead to the new PO reconciling all Prometheus instances, including the in-cluster one, which renders the in-cluster monitoring Prometheus unstable as both operators keep updating it.
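If a second operator must run, one mitigation (a sketch, not taken from the Kepler docs; it reuses the instance-namespaces flags shown above and assumes the extra operator is deployed in a hypothetical `monitoring` namespace) is to scope it the same way the in-cluster operator is scoped:

      containers:
      - args:
        - --kubelet-service=kube-system/kubelet
        # confine reconciliation to the operator's own (example) namespace so it
        # never touches prometheus-k8s in openshift-monitoring
        - --prometheus-instance-namespaces=monitoring
        - --thanos-ruler-instance-namespaces=monitoring
        - --alertmanager-instance-namespaces=monitoring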

I think it may be easier to use user workload monitoring on OpenShift: https://docs.openshift.com/container-platform/4.13/monitoring/configuring-the-monitoring-stack.html
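For reference, enabling user workload monitoring comes down to a single ConfigMap (a sketch based on the linked docs; verify the exact keys against your OpenShift version):

      apiVersion: v1
      kind: ConfigMap
      metadata:
        name: cluster-monitoring-config
        namespace: openshift-monitoring
      data:
        config.yaml: |
          enableUserWorkload: true

With that enabled, a ServiceMonitor for Kepler in its own namespace should be picked up by the user-workload Prometheus, and no second Prometheus Operator needs to be installed at all.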