grafana / alloy

OpenTelemetry Collector distribution with programmable pipelines
https://grafana.com/oss/alloy
Apache License 2.0

Profiling issue for Java pods with auto-instrumentation method #2193

Open Vaibhav-1995 opened 4 days ago

Vaibhav-1995 commented 4 days ago

What's wrong?

I am using Java profiling with Alloy (auto-instrumentation method) to enable profiling on Java pods within the cluster. I deployed Pyroscope and Alloy separately using their Helm charts and added the configuration below to the Alloy ConfigMap for Java profiling, as provided at the following link:

https://github.com/grafana/pyroscope/tree/main/examples/grafana-agent-auto-instrumentation/java/kubernetes

However, profiling starts on only a few random Java pods, not on all of them. I am unable to identify why profiling is not enabled on all Java pods.

Steps to reproduce

As per the documentation at the link below, all prerequisites are in place on the Alloy side in the Helm chart, but profiling has still started only in some pods (refer to the images below).

https://grafana.com/docs/alloy/latest/reference/components/pyroscope/pyroscope.java/
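
For reference, the relevant prerequisites in my Helm values look roughly like this (a sketch; I am assuming the standard grafana/alloy chart keys here, so exact names may differ by chart version):

      alloy:
        securityContext:
          privileged: true   # pyroscope.java attaches async-profiler to other processes
          runAsUser: 0       # root is needed to read /proc and attach to JVMs
      controller:
        hostPID: true        # lets Alloy see processes from other pods on the node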

System information

Linux 5.10.223-212.873.amzn2.aarch64

Software version

Grafana Alloy v1.4.2

Configuration

content: |

      logging {
        level  = "debug"
        format = "logfmt"
      }

      // Discovers all kubernetes pods.
      // Relies on serviceAccountName=grafana-alloy in the pod spec for permissions.
      discovery.kubernetes "pods" {
        role = "pod"
      }

      // Discovers all processes running on the node.
      // Relies on a security context with elevated permissions for the alloy container (running as root).
      // Relies on hostPID=true on the pod spec, to be able to see processes from other pods.
      discovery.process "all" {
        // Merges kubernetes and process data (using container_id), to attach kubernetes labels to discovered processes.
        join = discovery.kubernetes.pods.targets
      }
      // Drops non-java processes and adjusts labels.    
      discovery.relabel "java" {
        targets = discovery.process.all.targets
        // Drops non-java processes.
        rule {
          source_labels = ["__meta_process_exe"]
          action = "keep"
          regex = ".*/java$"
        }
        // Sets up the service_name using the namespace and container names.
        rule {
          source_labels = ["__meta_kubernetes_namespace", "__meta_kubernetes_pod_container_name"]
          target_label = "service_name"
          separator = "/"
        }
        // Sets up kubernetes labels (labels with the __ prefix are ultimately dropped).
        rule {
          action = "replace"
          source_labels = ["__meta_kubernetes_pod_node_name"]
          target_label = "node"
        }
        rule {
          action = "replace"
          source_labels = ["__meta_kubernetes_namespace"]
          target_label = "namespace"
        }
        rule {
          action = "replace"
          source_labels = ["__meta_kubernetes_pod_name"]
          target_label = "pod"
        }
        rule {
          action = "replace"
          source_labels = ["__meta_kubernetes_pod_container_name"]
          target_label = "container"
        }
        // Sets up the cluster label.
        // Relies on a pod-level annotation with the "cluster_name" name.
        // Alternatively it can be set up using external_labels in pyroscope.write. 
        rule {
          action = "replace"
          source_labels = ["__meta_kubernetes_pod_annotation_cluster_name"]
          target_label = "cluster"
        }
      }

      // Attaches the Pyroscope profiler to the processes returned by the discovery.relabel component.
      // Relies on a security context with elevated permissions for the alloy container (running as root).
      // Relies on hostPID=true on the pod spec, to be able to access processes from other pods.
      pyroscope.java "java" {
        profiling_config {
          interval = "15s"    // how often collected profiles are reported
          alloc = "512k"      // allocation profiling threshold (sample size)
          cpu = true          // enable CPU profiling
          lock = "10ms"       // lock profiling threshold (contention duration)
          sample_rate = 100   // CPU sampling frequency, in Hz
        }
        forward_to = [pyroscope.write.local.receiver]
        targets = discovery.relabel.java.output
      }

      pyroscope.write "local" {
        // Send metrics to the locally running Pyroscope instance.
        endpoint {
          url = "http://xxx-xxx-pyroscope-distributor.observability-pyroscope-dev.svc.cluster.local:4040"
        }
        external_labels = {
          "static_label" = "static_label_value",
        }
      }

Logs

ts=2024-12-02T04:58:02.01712108Z level=error component_path=/ component_id=pyroscope.java.java pid=716979 err="failed to start: asprof failed to run: asprof failed to run /tmp/alloy-asprof-glibc-ed25bbf0083bff602254601eb6c4a927823d988f/bin/asprof: exit status 255 Target JVM failed to load /tmp/alloy-asprof-glibc-ed25bbf0083bff602254601eb6c4a927823d988f/bin/../lib/libasyncProfiler.so\n"

Vaibhav-1995 commented 1 day ago

Hi team, any update on the above issue?

simonswine commented 23 hours ago

I don't think we have enough information to really suggest anything yet. Just a few pointers which might help us:

In general, Alloy needs to run on every Kubernetes node that you want to profile (using a DaemonSet). Maybe you can share the full values.yaml from Helm?
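
With the grafana/alloy Helm chart that would look roughly like this (a sketch; assuming the chart's controller.type key):

      controller:
        type: daemonset   # one Alloy pod per node, so JVMs on every node are discovered

If Alloy runs as a deployment or statefulset with fewer replicas than nodes, discovery.process only sees processes on the nodes that happen to host an Alloy pod, which would look like "random" pods being profiled.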