headlamp-k8s / headlamp

A Kubernetes web UI that is fully-featured, user-friendly and extensible
https://headlamp.dev
Apache License 2.0

Headlamp cluster metrics are not showing the proper values #2043

Open: mariogkds opened this issue 1 month ago

mariogkds commented 1 month ago

Hello, I am a new user, and I really like the project.

I am having some problems with the cluster-wide metrics shown on the dashboard:

[Screenshot: cluster-wide metrics on the Headlamp dashboard]

I am using kube-prometheus-stack to manage Prometheus and Grafana, and prometheus-adapter to serve the metrics API.

To get Headlamp to show anything at all, I had to add a few settings to the charts' values:

kube-prometheus-stack:

    kubelet:
      serviceMonitor:
        metricRelabelings:
          - action: replace
            sourceLabels:
              - node
            targetLabel: instance
    prometheus-node-exporter:
      prometheus:
        monitor:
          attachMetadata:
            node: true
          relabelings:
            - sourceLabels:
                - __meta_kubernetes_endpoint_node_name
              targetLabel: node
              action: replace
              regex: (.+)
              replacement: ${1}
          metricRelabelings:
            - action: replace
              regex: (.*)
              replacement: $1
              sourceLabels:
                - __meta_kubernetes_pod_node_name
              targetLabel: kubernetes_node
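
To reason about what the relabeling rules above do to a sample's labels, here is a minimal Python sketch of Prometheus's `replace` action, based on its documented semantics: the values of `sourceLabels` are joined with `;`, the regex must match the whole joined string, and `$1` / `${1}` in `replacement` refer to capture groups. `relabel_replace` is a hypothetical helper for illustration only, and it uses Python's `re` instead of Prometheus's RE2 engine (they behave the same for simple patterns like these).

```python
import re

def _group(match: re.Match, index: int) -> str:
    # Missing or unmatched groups expand to an empty string.
    try:
        return match.group(index) or ""
    except IndexError:
        return ""

def expand(template: str, match: re.Match) -> str:
    # Expand "$1" and "${1}" style references (numeric groups only; a simplification).
    return re.sub(
        r"\$\{(\d+)\}|\$(\d+)",
        lambda ref: _group(match, int(ref.group(1) or ref.group(2))),
        template,
    )

def relabel_replace(labels: dict, source_labels: list, target_label: str,
                    regex: str = "(.*)", replacement: str = "$1",
                    separator: str = ";") -> dict:
    """Hypothetical sketch of the Prometheus `replace` relabel action."""
    out = dict(labels)
    # Absent source labels contribute an empty string.
    value = separator.join(labels.get(name, "") for name in source_labels)
    match = re.fullmatch(regex, value)  # the regex is anchored at both ends
    if match is None:
        return out  # no match: labels are left unchanged
    result = expand(replacement, match)
    if result:
        out[target_label] = result
    else:
        out.pop(target_label, None)  # an empty result removes the target label
    return out

# Mirrors the kubelet rule: copy `node` into `instance`.
sample = {"__name__": "container_cpu_usage_seconds_total",
          "node": "worker-1", "instance": "10.0.0.5:10250"}
print(relabel_replace(sample, ["node"], "instance"))

# Mirrors the node-exporter metricRelabelings rule on a sample without __meta_* labels.
print(relabel_replace({"instance": "10.0.0.6:9100"},
                      ["__meta_kubernetes_pod_node_name"], "kubernetes_node"))
```

The second call is worth a look: the Prometheus documentation states that labels starting with `__` are removed after target relabeling, so a `metricRelabelings` rule reading a `__meta_*` label most likely sees an empty value and, under this sketch's semantics, sets nothing.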

prometheus-adapter (the standard configuration for serving the resource metrics API):

    rules:
      resource:
        cpu:
          containerQuery: |
            sum by (<<.GroupBy>>) (
              rate(container_cpu_usage_seconds_total{container!="",<<.LabelMatchers>>}[3m])
            )
          nodeQuery: |
            sum by (<<.GroupBy>>) (
              rate(node_cpu_seconds_total{mode!="idle",mode!="iowait",mode!="steal",<<.LabelMatchers>>}[3m])
            )
          resources:
            overrides:
              node:
                resource: node
              namespace:
                resource: namespace
              pod:
                resource: pod
          containerLabel: container
        memory:
          containerQuery: |
            sum by (<<.GroupBy>>) (
              avg_over_time(container_memory_working_set_bytes{container!="",<<.LabelMatchers>>}[3m])
            )
          nodeQuery: |
            sum by (<<.GroupBy>>) (
              avg_over_time(node_memory_MemTotal_bytes{<<.LabelMatchers>>}[3m])
              -
              avg_over_time(node_memory_MemAvailable_bytes{<<.LabelMatchers>>}[3m])
            )
          resources:
            overrides:
              node:
                resource: node
              namespace:
                resource: namespace
              pod:
                resource: pod
          containerLabel: container
        window: 3m
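
The `<<.GroupBy>>` and `<<.LabelMatchers>>` placeholders above are filled in by prometheus-adapter before it queries Prometheus. The following minimal Python sketch shows roughly what the memory `nodeQuery` might look like once rendered. `render_query` is a hypothetical helper, and the matcher format (an equality matcher for one node, a `=~` regex joining several names) is an assumption about the adapter's behaviour, not copied from its source.

```python
def render_query(template: str, group_by: list, label: str, names: list) -> str:
    """Hypothetical sketch of filling prometheus-adapter's << >> placeholders."""
    if len(names) == 1:
        matcher = f'{label}="{names[0]}"'
    else:
        alternatives = "|".join(names)
        matcher = f'{label}=~"{alternatives}"'
    return (template
            .replace("<<.GroupBy>>", ",".join(group_by))
            .replace("<<.LabelMatchers>>", matcher))

# The memory nodeQuery from the values above.
MEMORY_NODE_QUERY = """sum by (<<.GroupBy>>) (
  avg_over_time(node_memory_MemTotal_bytes{<<.LabelMatchers>>}[3m])
  -
  avg_over_time(node_memory_MemAvailable_bytes{<<.LabelMatchers>>}[3m])
)"""

print(render_query(MEMORY_NODE_QUERY, ["node"], "node", ["worker-1", "worker-2"]))
```

Whatever the exact matcher, the rendered query returns used memory per node in bytes (MemTotal minus MemAvailable), which is useful to keep in mind when comparing it with the unit Headlamp displays.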

The individual nodes' CPU values are correct. The memory values are correct as well, but they are shown in a different unit:

[Screenshots: per-node CPU and memory values]
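
Since the memory value is right but its unit is not, one plausible cause is how the suffix of a Kubernetes resource quantity is interpreted: the same amount of memory can be written as, for example, `8048564Ki`, `8241729536`, or in decimal units such as `8G`, and `Ki` (1024) is not the same as `k` (1000). The minimal Python sketch below normalizes a quantity string to a plain number and formats bytes in binary units. It is not Headlamp's code; `parse_quantity` and `format_bytes` are hypothetical helpers, and the example quantity is made up.

```python
import re
from decimal import Decimal

# Binary (power-of-two) and decimal SI suffixes used by Kubernetes quantities.
_BINARY = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40, "Pi": 2**50, "Ei": 2**60}
_DECIMAL = {"n": Decimal("1e-9"), "u": Decimal("1e-6"), "m": Decimal("1e-3"), "": Decimal(1),
            "k": Decimal(10**3), "M": Decimal(10**6), "G": Decimal(10**9),
            "T": Decimal(10**12), "P": Decimal(10**15), "E": Decimal(10**18)}

_QUANTITY = re.compile(r"([+-]?(?:\d+\.?\d*|\.\d+))([eE][+-]?\d+|[a-zA-Z]*)")
_EXPONENT = re.compile(r"[eE]([+-]?\d+)")

def parse_quantity(text: str) -> Decimal:
    """Hypothetical sketch: convert a quantity like '250m' or '1.5Gi' to a plain number."""
    match = _QUANTITY.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a quantity: {text!r}")
    number, suffix = Decimal(match.group(1)), match.group(2)
    if suffix in _BINARY:
        return number * _BINARY[suffix]
    if suffix in _DECIMAL:
        return number * _DECIMAL[suffix]
    exponent = _EXPONENT.fullmatch(suffix)
    if exponent is not None:
        return number.scaleb(int(exponent.group(1)))  # e.g. "1e3" -> 1000
    raise ValueError(f"unknown suffix {suffix!r} in {text!r}")

def format_bytes(value: Decimal) -> str:
    """Format a byte count using binary units (KiB, MiB, ...)."""
    units = ["B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"]
    value = Decimal(value)
    index = 0
    while abs(value) >= 1024 and index < len(units) - 1:
        value /= 1024
        index += 1
    return f"{value:.2f} {units[index]}"

print(format_bytes(parse_quantity("8048564Ki")))
print(parse_quantity("250m"))  # CPU in cores
```

Normalizing to a base unit first, and only then choosing a display unit, keeps the result independent of which suffix the metrics backend happened to use.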

Is this a Headlamp problem, or is it a Prometheus problem (i.e., my configuration)?

Thanks for the help and for the project. Have a nice day!

joaquimrocha commented 1 month ago

Hi @mariogkds. Thanks for the report. This looks like a unit conversion issue; we will take a look.