litmuschaos / litmus-helm

Helm Charts for the Litmus Chaos Operator & CRDs
Apache License 2.0

ServiceMonitor misconfiguration #357

Open VLZZZ opened 7 months ago

VLZZZ commented 7 months ago

Hi! I've just found that the litmus ServiceMonitor for metrics collection seems to be misconfigured.
I dug into it a bit, and the ServiceMonitor has this selector:

spec:
  endpoints:
  - path: /metrics
    port: http
  namespaceSelector:
    matchNames:
    - litmus
  selector:
    matchLabels:
      app: litmus
      app.kubernetes.io/instance: litmus-chaos
      app.kubernetes.io/name: litmus

This selects the litmus-monitor Service (the Kubernetes Service object),
which in turn has this Pod selector:

  ports:
  - name: http
    port: 8080
    protocol: TCP
    targetPort: http
  selector:
    app: litmus
    app.kubernetes.io/instance: litmus-chaos
    app.kubernetes.io/name: litmus

However, this selector covers both the litmus and litmus-monitor pods.

Only the litmus-monitor-* pod has a port named http at 8080 and serves Prometheus metrics, while the litmus-* pod has http at 80 with no Prometheus metrics. (The litmus pods can serve the default Go metrics at 8080, but that's a different story and I don't think we need it.)

    name: chaos-operator
    ports:
    - containerPort: 80
      name: http
      protocol: TCP

I believe we need to narrow the selector to match litmus-monitor only,

or allow additionalLabels for the selector.
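For illustration, a minimal sketch of what a narrowed Service selector might look like. The `app.kubernetes.io/component` label and its value are assumptions made up for this example and are not taken from the chart; the litmus-monitor pods would need to carry the same label for the selector to match.

```yaml
# Sketch only: Service pod selector narrowed to the metrics exporter.
# The component label below is hypothetical, not an existing chart label.
ports:
- name: http
  port: 8080
  protocol: TCP
  targetPort: http
selector:
  app: litmus
  app.kubernetes.io/instance: litmus-chaos
  app.kubernetes.io/name: litmus
  app.kubernetes.io/component: litmus-monitor  # hypothetical extra label
```

With an extra label like this, the Service would only back litmus-monitor-* pods, so the ServiceMonitor would no longer try to scrape /metrics on the chaos-operator pod's port 80.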

  1. https://github.com/litmuschaos/litmus-helm/blob/31a0a30a370c64de6227bfb0ff10035c4b[…]dcd8c/charts/litmus-core/templates/exporter-servicemonitor.yaml
  2. https://github.com/litmuschaos/litmus-helm/blob/31a0a30a370c64de6227bfb0ff10035c4b2dcd8c/charts/litmus-core/templates/_helpers.tpl#L52

Calvinaud commented 6 months ago

Hello,

I created a PR for this. In litmus-agent, the ServiceMonitor and Service only select the exporter pods, so it makes sense for litmus-core to do the same. If we need to retrieve the operator's metrics, we should probably add a PodMonitor instead.
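As a rough sketch of the PodMonitor idea, something along these lines could scrape the operator pods directly. The resource name, the `app.kubernetes.io/component: operator` label, and the `metrics` port name are all assumptions for illustration; the operator container would need to expose a named port for its Go metrics on 8080 for this to work.

```yaml
# Sketch only: names, labels, and the port name are hypothetical.
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: litmus-operator            # hypothetical name
  namespace: litmus
spec:
  namespaceSelector:
    matchNames:
    - litmus
  selector:
    matchLabels:
      app.kubernetes.io/instance: litmus-chaos
      app.kubernetes.io/component: operator   # hypothetical label
  podMetricsEndpoints:
  - port: metrics                  # assumed named container port for 8080
    path: /metrics
```

A PodMonitor selects pods by label directly, so it avoids relying on a Service whose selector also matches the exporter pods.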