grafana / alloy

OpenTelemetry Collector distribution with programmable pipelines
https://grafana.com/oss/alloy
Apache License 2.0

"neither musl nor glibc found" error with java pyroscope on bottlerocket #1638

Open eric-engberg opened 2 months ago

eric-engberg commented 2 months ago

What's wrong?

I'm getting the following error when trying to configure alloy to profile java processes. I have checked and glibc is installed on the bottlerocket host as well as in the java container. The java binary is also linked to glibc.

2024-09-09T21:51:22.173548542Z stderr F ts=2024-09-09T21:51:22.17339853Z level=error component_path=/ component_id=pyroscope.java.java pid=1651708 err="failed to select dist for pid 1651708: failed to select dist for pid 1651708: neither musl nor glibc found"

Steps to reproduce

  1. Have a bottlerocket node with a java process for profiling.
  2. Install alloy as daemonset with hostPID true and the following security context per https://github.com/grafana/alloy/issues/1616#issuecomment-2337725701
    securityContext:
      runAsUser: 0
      runAsNonRoot: false
      capabilities:
        add:
          - PERFMON
          - SYS_PTRACE
          - SYS_RESOURCE
          - SYS_ADMIN

System information

Bottlerocket Linux ip-10-30-49-246.ec2.internal 6.1.97 #1 SMP PREEMPT_DYNAMIC Fri Jul 26 23:04:30 UTC 2024 x86_64 GNU/Linux

Software version

1.3.1

Configuration

discovery.process "all" {
  refresh_interval = "60s"

  discover_config {
    cwd          = true
    exe          = true
    commandline  = true
    username     = true
    uid          = true
    container_id = true
  }
}

discovery.relabel "java" {
  targets = discovery.process.all.targets

  rule {
    action        = "keep"
    regex         = ".*/java$"
    source_labels = ["__meta_process_exe"]
  }
}

pyroscope.java "java" {
  targets    = discovery.relabel.java.output
  forward_to = [pyroscope.write.pyroscope.receiver]

  profiling_config {
    interval    = "60s"
    alloc       = "512k"
    cpu         = true
    sample_rate = 100
    lock        = "10ms"
    event       = "wall"
    per_thread  = true
  }
}

Logs

2024-09-09T21:51:22.173548542Z stderr F ts=2024-09-09T21:51:22.17339853Z level=error component_path=/ component_id=pyroscope.java.java pid=1651708 err="failed to select dist for pid 1651708: failed to select dist for pid 1651708: neither musl nor glibc found"

ptodev commented 1 month ago

Hello! It seems like the Alloy component goes through the /proc/[pid]/maps file and the /proc/[pid]/root/lib directory. Do you see any mention of glibc or musl in those locations? Maybe we could add extra locations in the code.

@korniltsev I wonder if the code could use more substantial refactoring. I'm not sure how reliable it is to go through hardcoded locations.
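To check by hand what such a lookup would find for a given pid, the two locations mentioned above can be scanned with a short script. This is only a sketch: the regular expressions and the function names (`mapped_paths`, `libc_from_paths`, `detect_libc`) are assumptions made for illustration and are not taken from the Alloy source, whose matching rules may differ.

```python
import re
from pathlib import Path
from typing import Iterable, Iterator, Optional

# Hypothetical patterns for illustration; Alloy's real matching rules may differ.
MUSL_RE = re.compile(r"/(ld-musl-|libc\.musl-)")
GLIBC_RE = re.compile(r"/libc[-.][^/]*so")


def mapped_paths(maps_text: str) -> Iterator[str]:
    """Yield the pathname column of each /proc/<pid>/maps line that has one."""
    for line in maps_text.splitlines():
        fields = line.strip().split(None, 5)
        if len(fields) == 6:
            yield fields[5]


def libc_from_paths(paths: Iterable[str]) -> Optional[str]:
    """Return "musl", "glibc", or None for the first path that looks like a libc."""
    for path in paths:
        # musl is checked first because libc.musl-*.so would also match GLIBC_RE.
        if MUSL_RE.search(path):
            return "musl"
        if GLIBC_RE.search(path):
            return "glibc"
    return None


def detect_libc(pid: int) -> Optional[str]:
    """Scan /proc/<pid>/maps, then fall back to listing /proc/<pid>/root/lib."""
    maps_text = Path(f"/proc/{pid}/maps").read_text()
    found = libc_from_paths(mapped_paths(maps_text))
    if found:
        return found
    lib_dir = Path(f"/proc/{pid}/root/lib")
    if lib_dir.is_dir():
        return libc_from_paths(str(p) for p in lib_dir.iterdir())
    return None
```

Running `detect_libc(1651708)` as root on the node (or from a pod with hostPID) would show whether either location exposes a recognizable libc for the failing process. A `None` result would be consistent with the error in the log, although it would not prove that Alloy looks at the same file names.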

eric-engberg commented 1 month ago

I do not see it in either the container or the bottlerocket host.

ptodev commented 1 month ago

Could it be that your Java application uses neither glibc nor musl? I'm not sure how to find out what it is using though.
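One way to find out, independent of Alloy, is to read the dynamic loader path that the java binary requests in its ELF `PT_INTERP` program header: glibc builds normally name a loader such as `ld-linux-x86-64.so.2`, while musl builds name `ld-musl-<arch>.so.1`. The sketch below is an assumption-laden illustration, not something Alloy does: it handles only 64-bit little-endian ELF files, and the helper names `elf_interpreter` and `libc_from_interpreter` are hypothetical.

```python
import struct
from typing import Optional

PT_INTERP = 3  # program header type that holds the dynamic loader path


def elf_interpreter(data: bytes) -> Optional[str]:
    """Return the PT_INTERP path of a 64-bit little-endian ELF image, or None."""
    if len(data) < 64 or data[:4] != b"\x7fELF":
        return None
    if data[4] != 2 or data[5] != 1:  # ELFCLASS64 and ELFDATA2LSB only
        return None
    (e_phoff,) = struct.unpack_from("<Q", data, 0x20)
    e_phentsize, e_phnum = struct.unpack_from("<HH", data, 0x36)
    for i in range(e_phnum):
        off = e_phoff + i * e_phentsize
        if off + 56 > len(data):
            break
        # p_type, p_flags, p_offset, p_vaddr, p_paddr, p_filesz
        p_type, _, p_offset, _, _, p_filesz = struct.unpack_from("<IIQQQQ", data, off)
        if p_type == PT_INTERP:
            raw = data[p_offset:p_offset + p_filesz]
            return raw.split(b"\0", 1)[0].decode()
    return None  # statically linked binaries have no PT_INTERP


def libc_from_interpreter(interp: Optional[str]) -> Optional[str]:
    """Guess the libc family from the loader file name (heuristic)."""
    if not interp:
        return None
    if "ld-musl" in interp:
        return "musl"
    if "ld-linux" in interp:
        return "glibc"
    return None


def libc_for_pid(pid: int) -> Optional[str]:
    """Read /proc/<pid>/exe (needs sufficient privileges) and classify its loader."""
    with open(f"/proc/{pid}/exe", "rb") as f:
        return libc_from_interpreter(elf_interpreter(f.read()))
```

If `libc_for_pid` reports `glibc` but neither `/proc/<pid>/maps` nor `/proc/<pid>/root/lib` shows a matching file, the library probably lives somewhere the lookup does not check.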