keptn-contrib / prometheus-service

Keptn service for utilizing Prometheus monitoring and alerting in keptn
Apache License 2.0

Support prometheus deployed via prometheus-operator #193

Open jaylevin opened 2 years ago

jaylevin commented 2 years ago

This issue is to address the incompatibility between the Keptn prometheus-service and Prometheus deployed via the Prometheus Operator.

Currently, the Keptn prometheus-service depends on reading and writing both the Prometheus and Alertmanager ConfigMaps that are deployed as part of the Prometheus Community Helm chart. However, when Prometheus is deployed on K8s via the prometheus-operator, these ConfigMaps do not exist.

Instead, (from my very limited understanding) the prometheus-operator watches for ServiceMonitor CRs in order to configure new scrape jobs. The prometheus-service keptn integration should ideally be able to handle the deployment of these CRs in order to create new scrape jobs for each service/project/stage that is configured to be monitored.
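To make the idea concrete, here is a minimal sketch of what generating one ServiceMonitor per service/project/stage could look like. It is not the prometheus-service implementation. The function name, the `<service>-<project>-<stage>` resource name, the `app` selector label, the `http` port name, and the `release` label value are all assumptions for illustration. It also assumes the workload runs in a `<project>-<stage>` namespace. The `apiVersion`, `kind`, and `spec` fields follow the Prometheus Operator's ServiceMonitor CRD.

```python
import json


def build_service_monitor(project, stage, service,
                          monitoring_namespace="monitoring",
                          release_label="prometheus-operator",
                          port_name="http"):
    """Return a ServiceMonitor manifest (as a dict) for one monitored service.

    All naming conventions here are hypothetical; adjust them to the cluster.
    """
    return {
        "apiVersion": "monitoring.coreos.com/v1",
        "kind": "ServiceMonitor",
        "metadata": {
            # Hypothetical naming scheme: one CR per service/project/stage.
            "name": f"{service}-{project}-{stage}",
            "namespace": monitoring_namespace,
            # Assumption: the Prometheus CR selects ServiceMonitors by this label.
            "labels": {"release": release_label},
        },
        "spec": {
            # Assumption: the service is deployed in the <project>-<stage> namespace.
            "namespaceSelector": {"matchNames": [f"{project}-{stage}"]},
            # Assumption: the Kubernetes Service carries an app=<service> label.
            "selector": {"matchLabels": {"app": service}},
            # Scrape the named Service port at /metrics every 30 seconds.
            "endpoints": [
                {"port": port_name, "path": "/metrics", "interval": "30s"}
            ],
        },
    }


if __name__ == "__main__":
    manifest = build_service_monitor("sockshop", "staging", "carts")
    print(json.dumps(manifest, indent=2))
```

Since kubectl accepts JSON manifests, the printed output could be piped to `kubectl apply -f -` on a cluster where the ServiceMonitor CRD is installed. Whether the operator picks the resource up depends on the `serviceMonitorSelector` of the Prometheus CR, which is why the label is left configurable.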

christian-kreuzberger-dtx commented 2 years ago

Right now our recommendation is to install prometheus via the official helm chart:

kubectl create namespace monitoring
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install prometheus prometheus-community/prometheus --namespace monitoring

Here the ConfigMap already exists and just needs to be overwritten, though we're having some issues with that (see #240).

Do you think the operator is better suited for this? But would that mean we can no longer support "classical" prometheus on Kubernetes installations?

christian-kreuzberger-dtx commented 2 years ago

The following slack discussion https://keptn.slack.com/archives/CNRCGFU3U/p1643028340100100 reveals that we are not compatible with the Prometheus operator.

It seems that the names of services and pods/deployments have changed:

$ kubectl -n monitoring get all
NAME                                                          READY   STATUS              RESTARTS   AGE
pod/alertmanager-prometheus-operator-alertmanager-0           0/2     ContainerCreating   0          10d
pod/metrics-server-85496d4f7c-djzjj                           1/1     Running             0          10d
pod/prometheus-operator-grafana-588d549949-x2tg8              2/2     Running             2          59d
pod/prometheus-operator-grafana-test                          0/1     Completed           0          59d
pod/prometheus-operator-kube-state-metrics-64d56fc9df-wp8wc   1/1     Running             0          10d
pod/prometheus-operator-operator-7fb8c9f85c-2nvgh             2/2     Running             0          10d
pod/prometheus-operator-prometheus-node-exporter-4flc8        1/1     Running             1          111d
pod/prometheus-operator-prometheus-node-exporter-5lw7d        1/1     Running             4          111d
pod/prometheus-operator-prometheus-node-exporter-72w6s        1/1     Running             1          111d
pod/prometheus-operator-prometheus-node-exporter-7b6p2        1/1     Running             2          111d
pod/prometheus-operator-prometheus-node-exporter-9rx2g        1/1     Running             3          111d
pod/prometheus-operator-prometheus-node-exporter-q66cl        1/1     Running             2          89d
pod/prometheus-operator-prometheus-node-exporter-rmd9j        1/1     Running             6          89d
pod/prometheus-operator-prometheus-node-exporter-s4f5d        1/1     Running             1          111d
pod/prometheus-operator-prometheus-node-exporter-v6vbz        1/1     Running             1          111d
pod/prometheus-operator-prometheus-node-exporter-vbtlh        1/1     Running             1          111d
pod/prometheus-operator-prometheus-node-exporter-x6vhm        1/1     Running             2          111d
pod/prometheus-operator-prometheus-node-exporter-xccx9        1/1     Running             1          111d
pod/prometheus-prometheus-operator-prometheus-0               3/3     Running             3          58d
pod/telegraf-daemonset-5f9cv                                  2/2     Running             2          111d
pod/telegraf-daemonset-7c2xn                                  2/2     Running             2          111d
pod/telegraf-daemonset-7gxcb                                  2/2     Running             2          111d
pod/telegraf-daemonset-7nfwl                                  2/2     Running             2          111d
pod/telegraf-daemonset-9225k                                  2/2     Running             4          111d
pod/telegraf-daemonset-cqjd5                                  2/2     Running             2          111d
pod/telegraf-daemonset-dp2hb                                  2/2     Running             4          111d
pod/telegraf-daemonset-fhc9m                                  2/2     Running             6          111d
pod/telegraf-daemonset-hjqmj                                  2/2     Running             12         89d
pod/telegraf-daemonset-ljsxf                                  2/2     Running             4          111d
pod/telegraf-daemonset-njsxx                                  2/2     Running             4          89d
pod/telegraf-daemonset-vhhl8                                  2/2     Running             2          111d
pod/telegraf-deployment-6448f95b55-gn4ph                      1/1     Running             0          10d

NAME                                                   TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                      AGE
service/alertmanager-operated                          ClusterIP   None            <none>        9093/TCP,9094/TCP,9094/UDP   111d
service/metrics-server                                 ClusterIP   10.233.32.242   <none>        443/TCP                      105d
service/prometheus-operated                            ClusterIP   None            <none>        9090/TCP                     111d
service/prometheus-operator-alertmanager               ClusterIP   10.233.19.107   <none>        9093/TCP                     111d
service/prometheus-operator-grafana                    ClusterIP   10.233.42.175   <none>        80/TCP                       111d
service/prometheus-operator-kube-state-metrics         ClusterIP   10.233.34.37    <none>        8080/TCP                     111d
service/prometheus-operator-operator                   ClusterIP   10.233.57.170   <none>        8080/TCP,443/TCP             111d
service/prometheus-operator-prometheus                 ClusterIP   10.233.35.123   <none>        9090/TCP                     111d
service/prometheus-operator-prometheus-node-exporter   ClusterIP   10.233.36.118   <none>        9100/TCP                     111d
service/telegraf-deployment                            ClusterIP   10.233.37.51    <none>        9273/TCP                     111d

NAME                                                          DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
daemonset.apps/prometheus-operator-prometheus-node-exporter   12        12        12      12           12          <none>          111d
daemonset.apps/telegraf-daemonset                             12        12        12      12           12          <none>          111d

NAME                                                     READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/metrics-server                           1/1     1            1           105d
deployment.apps/prometheus-operator-grafana              1/1     1            1           111d
deployment.apps/prometheus-operator-kube-state-metrics   1/1     1            1           111d
deployment.apps/prometheus-operator-operator             1/1     1            1           111d
deployment.apps/telegraf-deployment                      1/1     1            1           111d

NAME                                                                DESIRED   CURRENT   READY   AGE
replicaset.apps/metrics-server-85496d4f7c                           1         1         1       105d
replicaset.apps/prometheus-operator-grafana-588d549949              1         1         1       59d
replicaset.apps/prometheus-operator-grafana-5c86cf65f9              0         0         0       59d
replicaset.apps/prometheus-operator-grafana-7857784dcd              0         0         0       111d
replicaset.apps/prometheus-operator-kube-state-metrics-64d56fc9df   1         1         1       111d
replicaset.apps/prometheus-operator-operator-7fb8c9f85c             1         1         1       111d
replicaset.apps/telegraf-deployment-6448f95b55                      1         1         1       111d

NAME                                                             READY   AGE
statefulset.apps/alertmanager-prometheus-operator-alertmanager   0/1     111d
statefulset.apps/prometheus-prometheus-operator-prometheus       1/1     111d

NAME                                             COMPLETIONS   DURATION   AGE
job.batch/prometheus-operator-admission-create   1/1           5s         111d
job.batch/prometheus-operator-admission-patch    1/1           94s        111d

We are looking for

service/prometheus-server               ClusterIP   10.24.45.75    <none>        80/TCP     37d
service/prometheus-alertmanager         ClusterIP   10.24.32.99    <none>        80/TCP     37d

in prometheus-service, but those are not available.
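One possible way to tolerate both naming schemes is to probe a list of known service names instead of hard-coding a single one. The sketch below only illustrates that lookup and is not how prometheus-service currently behaves. The function and constant names are hypothetical, the service names and ports are taken from the listings above, and the `svc.cluster.local` suffix assumes the default cluster domain. Note that `prometheus-operator-prometheus` depends on the Helm release name used in that cluster.

```python
# (service name, port) pairs in order of preference.
KNOWN_PROMETHEUS_SERVICES = [
    ("prometheus-server", 80),                 # community Helm chart
    ("prometheus-operator-prometheus", 9090),  # operator release shown above
    ("prometheus-operated", 9090),             # headless service from the operator
]


def resolve_prometheus_url(existing_services, namespace="monitoring"):
    """Return the in-cluster URL of the first known Prometheus service.

    existing_services is an iterable of service names found in the namespace,
    for example gathered from `kubectl -n monitoring get services`.
    Returns None when none of the known names is present.
    """
    existing = set(existing_services)
    for name, port in KNOWN_PROMETHEUS_SERVICES:
        if name in existing:
            # Assumption: default cluster domain "cluster.local".
            return f"http://{name}.{namespace}.svc.cluster.local:{port}"
    return None


if __name__ == "__main__":
    operator_services = [
        "alertmanager-operated",
        "prometheus-operated",
        "prometheus-operator-prometheus",
    ]
    print(resolve_prometheus_url(operator_services))
```

A configurable endpoint (for example an environment variable) would be an alternative design that avoids guessing names at all.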

bradmccoydev commented 2 years ago

@christian-kreuzberger-dtx FYI I am currently doing analysis on this, as I would like to use the operator also.

christian-kreuzberger-dtx commented 2 years ago

Sure! Please post your findings here! Looping in @thisthat and @oleg-nenashev on this change.

oleg-nenashev commented 2 years ago

+1. I will add it to my watch list for Keptn LTS

jheyduk commented 1 year ago

Is anybody working on this? I would give it a try.

bradmccoydev commented 1 year ago

My recommendation is that folks using the operator bring their own (BYO) configuration and don't use Keptn's configure monitoring. The get-sli step will still work. I can present it at the developer meeting.

ranyhb commented 1 year ago

> My recommendation is that folks using the operator bring their own (BYO) configuration and don't use Keptn's configure monitoring. The get-sli step will still work. I can present it at the developer meeting.

So what do you suggest doing?