open-telemetry / opentelemetry-go-instrumentation

OpenTelemetry Auto Instrumentation using eBPF
https://opentelemetry.io
Apache License 2.0

Running otel-go-instrumentation in the same container as the app errors with "process not found yet, trying again soon" (ARM host, x86_64 container) #1141

Open mjneth opened 1 week ago

mjneth commented 1 week ago

Describe the bug

When otel-go-instrumentation runs in the same container as the app, it fails with "process not found yet, trying again soon".

The sidecar approach has to run as privileged, which is a concern for us, so I wanted to see whether otel-go-instrumentation can work when it runs in the same container as the app being instrumented.

Environment

To Reproduce

Dockerfile:

FROM golang:1.23.1-bullseye AS base

RUN apt-get update && apt-get install -y curl vim net-tools procps wget clang gcc llvm make libbpf-dev

RUN wget https://github.com/open-telemetry/opentelemetry-go-instrumentation/archive/refs/tags/v0.14.0-alpha.tar.gz && \
  tar zxf v0.14.0-alpha.tar.gz && \
  cd opentelemetry-go-instrumentation-0.14.0-alpha/ && \
  make build && \
  cp otel-go-instrumentation /usr/local/bin

docker-compose.yaml:

version: '3.4'

services:
  app:
    platform: linux/amd64
    build:
      context: .
    #### See if these can be removed if it's running in the same container
    privileged: true
    pid: "host"
    cap_add:
      - SYS_PTRACE
    ####
    environment:
      OTEL_GO_AUTO_TARGET_EXE: /app/oteltest
      OTEL_SERVICE_NAME: otelgoautotest
      OTEL_EXPORTER_OTLP_ENDPOINT: http://localhost:4318
      OTEL_PROPAGATORS: tracecontext,baggage
    expose:
      - 8080
    ports:
      - 8080:8080
    volumes:
      - .:/app
    working_dir: /app

Test app to be instrumented:

package main

import (
  "fmt"
  "net/http"
)

func handler(w http.ResponseWriter, r *http.Request) {
  fmt.Fprintf(w, "received request")
}

func main() {
  http.HandleFunc("/otelgotest", handler)

  fmt.Println("Starting...")
  http.ListenAndServe(":8080", nil)
}

Run the app and otel-go-instrumentation in the same container:

  1. Run and exec into the container: docker-compose run app
  2. Build the test app: go build .
  3. Run the app in the background: /app/oteltest &
  4. Run otel-go-instrumentation: otel-go-instrumentation --log-level=debug

It fails to find the app process with the error:

{"level":"info","ts":1727709418.1301615,"logger":"go.opentelemetry.io/auto","caller":"cli/main.go:86","msg":"building OpenTelemetry Go instrumentation ...","globalImpl":false}
{"level":"debug","ts":1727709420.1934721,"logger":"Instrumentation.Analyzer","caller":"process/discover.go:71","msg":"process not found yet, trying again soon","exe_path":"/app/oteltest"}

However, the process is visible in the ps -ef output: root 2394 2102 0 15:15 pts/0 00:00:00 /usr/bin/qemu-x86_64 /app/oteltest /app/oteltest

I tried setting OTEL_GO_AUTO_TARGET_EXE to the command as it appears in ps -ef, and to some variations of it, in case the problem comes from running an x86_64 container on an ARM Mac host (the ps output shows qemu-x86_64). I also tried running /app/oteltest from one terminal and exec'ing into the same container from a second terminal to run otel-go-instrumentation, and got the same error.

Expected behavior

The otel-go-instrumentation binary is able to find the test app process, instrument it, and send traces to OTEL_EXPORTER_OTLP_ENDPOINT.

Additional context

I'm not sure whether this is a supported flow, but the sidecar approach requiring a privileged container or additional capabilities is a concern. Is the stance here that you either use the privileged sidecar or instrument your Go app in the app code, and that there is no intended way to auto-instrument without a privileged container?

RonFed commented 1 week ago

@mjneth Thank you for opening this issue. Can you try using an ARM image to see if the problem relates to qemu?

mjneth commented 1 week ago

> @mjneth Thank you for opening this issue. Can you try using an ARM image to see if the problem relates to qemu?

It does work with an ARM image, but our production images are x86_64, so we want to use the same architecture in the docker-compose setups for local dev. Is it feasible for this to work with qemu?

RonFed commented 1 week ago

@mjneth From the ps output you attached, it seems that the process actually running is qemu, which emulates the Go binary. If that is the case, our current implementation won't support it, since we are looking for a running Go executable.