grafana / alloy

OpenTelemetry Collector distribution with programmable pipelines
https://grafana.com/oss/alloy
Apache License 2.0

`prometheus.exporter.cadvisor` component (flow mode) not working as expected #277

Open guillermotti opened 7 months ago

guillermotti commented 7 months ago

What's wrong?

The River configuration for prometheus.exporter.cadvisor is not able to get container metrics such as container_memory_working_set_bytes.

Slack thread: https://grafana.slack.com/archives/C01050C3D8F/p1704886998073399

Steps to reproduce

On EKS v1.25.16-eks-8cb36c9 with containerd and Grafana Agent v0.38.0, using the following River config:

prometheus.exporter.cadvisor "cadvisor" {}

prometheus.scrape "cadvisor" {
  targets    = prometheus.exporter.cadvisor.cadvisor.targets
  forward_to = [prometheus.remote_write.mimir.receiver]
}

The data coming from the container_memory_working_set_bytes metric covers only the Kubernetes nodes themselves; it is not related to any container running inside the cluster.

But after changing the River config to:

prometheus.operator.servicemonitors "prometheus" {
  forward_to = [prometheus.remote_write.mimir.receiver]
  namespaces = ["monitoring"]

  selector {
    match_expression {
      key      = "app"
      operator = "In"
      values   = ["prometheus-operator-kubelet"]
    }
  }
}

And with a ServiceMonitor that contains the following:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  annotations:
    meta.helm.sh/release-name: prometheus-operator
    meta.helm.sh/release-namespace: monitoring
  creationTimestamp: "2020-03-06T12:30:48Z"
  generation: 1
  labels:
    app: prometheus-operator-kubelet
    app.kubernetes.io/managed-by: Helm
    chart: prometheus-operator-8.12.14
    heritage: Helm
    prometheus: kube-prometheus
    release: prometheus-operator
  name: prometheus-operator-kubelet
  namespace: monitoring
  resourceVersion: "469438187"
  uid: 4f2594bd-5fa6-11ea-b24b-0ac4da84e0e4
spec:
  endpoints:
  - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    honorLabels: true
    port: https-metrics
    relabelings:
    - sourceLabels:
      - __metrics_path__
      targetLabel: metrics_path
    scheme: https
    tlsConfig:
      caFile: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      insecureSkipVerify: true
  - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    honorLabels: true
    path: /metrics/cadvisor
    port: https-metrics
    relabelings:
    - sourceLabels:
      - __metrics_path__
      targetLabel: metrics_path
    scheme: https
    tlsConfig:
      caFile: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      insecureSkipVerify: true
  jobLabel: k8s-app
  namespaceSelector:
    matchNames:
    - kube-system
  selector:
    matchLabels:
      k8s-app: kubelet

All the container metrics come through successfully.

I expect to get the same behavior using only the prometheus.exporter.cadvisor River component, without having to rely on a ServiceMonitor to scrape container metrics from cAdvisor.
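
For what it's worth, the closest I can imagine getting with the exporter alone is something like the sketch below. This is untested on my side and based on my reading of the component reference: it assumes the containerd_host and containerd_namespace arguments are available on prometheus.exporter.cadvisor, and that the agent pod has the node's containerd socket (plus the host /sys and /rootfs) mounted in, which the default deployment may not do.

// Untested sketch: point the embedded cAdvisor at the node's containerd socket.
// The socket path is an assumption and must match whatever hostPath is actually
// mounted into the agent pod.
prometheus.exporter.cadvisor "cadvisor" {
  containerd_host      = "/run/containerd/containerd.sock"
  containerd_namespace = "k8s.io"
}

prometheus.scrape "cadvisor" {
  targets    = prometheus.exporter.cadvisor.cadvisor.targets
  forward_to = [prometheus.remote_write.mimir.receiver]
}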

Software version

Grafana Agent v0.38.0

Configuration

prometheus.exporter.cadvisor "cadvisor" {}

prometheus.scrape "cadvisor" {
  targets    = prometheus.exporter.cadvisor.cadvisor.targets
  forward_to = [prometheus.remote_write.mimir.receiver]
}
github-actions[bot] commented 6 months ago

This issue has not had any activity in the past 30 days, so the needs-attention label has been added to it. If the opened issue is a bug, check to see if a newer release fixed your issue. If it is no longer relevant, please feel free to close this issue. The needs-attention label signals to maintainers that something has fallen through the cracks. No action is needed by you; your issue will be kept open and you do not have to respond to this comment. The label will be removed the next time this job runs if there is new activity. Thank you for your contributions!

rfratto commented 4 months ago

Hi there :wave:

On April 9, 2024, Grafana Labs announced Grafana Alloy, the spiritual successor to Grafana Agent and the final form of Grafana Agent flow mode. As a result, Grafana Agent has been deprecated and will only be receiving bug and security fixes until its end-of-life around November 1, 2025.

To make things easier for maintainers, we're in the process of migrating all issues tagged variant/flow to the Grafana Alloy repository to have a single home for tracking issues. This issue is likely something we'll want to address in both Grafana Alloy and Grafana Agent, so just because it's being moved doesn't mean we won't address the issue in Grafana Agent :)

R-Studio commented 4 months ago

@guillermotti any news? I have the same issue with RKE2 (Rancher), and I am not able to use a ServiceMonitor because the kubelet is not a pod.

WhiteDiamondz commented 2 months ago

This makes the prometheus.exporter.cadvisor component way less useful unfortunately :/

In case it helps we were able to reproduce this behavior on an EKS cluster.

oba11 commented 2 months ago

I managed to find a workaround to retrieve cAdvisor metrics such as container_memory_working_set_bytes by using the configuration below in Alloy:

// Discover all Kubernetes nodes through the API server.
discovery.kubernetes "node" {
  role = "node"
}

discovery.relabel "k8s_node_cadvisor" {
  targets = discovery.kubernetes.node.targets

  // Copy the Kubernetes node labels onto the target.
  rule {
    action = "labelmap"
    regex  = "__meta_kubernetes_node_label_(.+)"
  }

  // Scrape through the API server instead of talking to each kubelet directly.
  rule {
    action       = "replace"
    target_label = "__address__"
    replacement  = "kubernetes.default.svc:443"
  }

  // Rewrite the metrics path to the kubelet's cAdvisor endpoint, proxied per node.
  rule {
    action        = "replace"
    source_labels = ["__meta_kubernetes_node_name"]
    regex         = "(.+)"
    target_label  = "__metrics_path__"
    replacement   = "/api/v1/nodes/${1}/proxy/metrics/cadvisor"
  }
}

prometheus.scrape "k8s_node_cadvisor" {
  targets      = discovery.relabel.k8s_node_cadvisor.output
  honor_labels = true

  // Authenticate against the API server with the pod's service account.
  scheme            = "https"
  bearer_token_file = "/var/run/secrets/kubernetes.io/serviceaccount/token"
  tls_config {
    ca_file = "/var/run/secrets/kubernetes.io/serviceaccount/ca.crt"
  }

  forward_to = [prometheus.remote_write.mimir.receiver]
}
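
For anyone copying this: since the scrape goes through /api/v1/nodes/<node>/proxy/metrics/cadvisor on the API server, I believe the service account Alloy runs under also needs RBAC access to the nodes/proxy subresource (get), in addition to the node list/watch permissions that discovery.kubernetes already needs. Worth double-checking in your cluster if you see 403s.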