open-telemetry / opentelemetry-go-instrumentation

OpenTelemetry Auto Instrumentation using eBPF
https://opentelemetry.io
Apache License 2.0

CreateHeaderFields probe fails with permission denied in v0.2.2-alpha #237

Open pl4nty opened 1 year ago

pl4nty commented 1 year ago

Describe the bug

Running registry.gitlab.com/gitlab-org/gitlab-runner:alpine-v16.2.0 with go autoinstrumentation causes the instrumentation container to crashloop. This issue doesn't occur with v0.2.1-alpha. Another workload on the same node (ghcr.io/toboshii/hajimari:v0.3.1) does not have this issue.

Environment

To Reproduce

Steps to reproduce the behavior:

  1. Deploy OTel Operator v0.81.0 with autoinstrumentation-go:v0.2.2-alpha
  2. Add autoinstrumentation annotations to target deployment
  3. Observe the opentelemetry-auto-instrumentation container crashlooping with the error shown below these steps
  4. Revert to autoinstrumentation-go:v0.2.1-alpha and observe working instrumentation
{
  "level":"error",
  "ts":1690190562.3462915,
  "caller":"instrumentors/runner.go:88",
  "msg":"error while loading instrumentors, cleaning up",
  "name":"google.golang.org/grpc",
  "error":"field UprobeHttp2ClientCreateHeaderFields: program uprobe_Http2Client_CreateHeaderFields: load program: permission denied: ; u32 random = bpf_get_prandom_u32();: 892: ( (truncated, 1151 line(s) omitted)",
  "stacktrace":"go.opentelemetry.io/auto/pkg/instrumentors.(*Manager).load\n\t/app/pkg/instrumentors/runner.go:88\ngo.opentelemetry.io/auto/pkg/instrumentors.(*Manager).Run\n\t/app/pkg/instrumentors/runner.go:36\nmain.main\n\t/app/cli/main.go:86\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:250"
}

Expected behavior

The app runs and is instrumented, as with v0.2.1-alpha

Additional context

Similar to #78

pl4nty commented 1 year ago

I've replicated this with gitlab-runner:ubuntu-v16.2.0, but can't test on my amd64 nodes (they've been crashlooping with "unknown func bpf_probe_write_user" for a while).

SzymonSt commented 1 year ago

Hello @pl4nty, I've also run into the bpf_probe_write_user issue you mentioned alongside gitlab-runner:ubuntu-v16.2.0. It seems to be caused by a Linux kernel commit, in place since 5.14-rc6, that puts the bpf_probe_write_user helper under kernel lockdown for the sake of security and in favor of better solutions. As the commit description puts it: "These days we have better mechanisms in BPF for achieving the same (e.g. for load-balancers), but without having to write to userspace memory."

I think this should be a separate issue, as the root cause seems to be different. This particular BPF helper should be retired, because the issue forces users to disable the lockdown and integrity modes in the lsm kernel parameter, which is both hard to do on cloud providers' VMs and, as far as I know, insecure. I will create another issue and link yours.

MrAlias commented 3 months ago

We added graceful degradation to the HTTP instrumentation: it logs an error if the kernel is locked down. We can do the same for gRPC as well.
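For readers who want to check for a locked-down kernel before loading probes, here is a minimal sketch in Go. It is not the project's actual implementation. It assumes the kernel exposes its lockdown state in /sys/kernel/security/lockdown, with the active mode in square brackets (for example "none [integrity] confidentiality"), and it assumes that any mode other than "none" blocks bpf_probe_write_user. The function names are hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// lockdownPath is where the kernel lockdown LSM reports its state
// (assumes securityfs is mounted at /sys/kernel/security).
const lockdownPath = "/sys/kernel/security/lockdown"

// parseLockdownMode extracts the active mode from the file contents.
// The kernel marks the active mode with square brackets, e.g.
// "none [integrity] confidentiality" yields "integrity".
func parseLockdownMode(contents string) (string, error) {
	for _, field := range strings.Fields(contents) {
		if strings.HasPrefix(field, "[") && strings.HasSuffix(field, "]") {
			return strings.Trim(field, "[]"), nil
		}
	}
	return "", fmt.Errorf("no active lockdown mode found in %q", contents)
}

// readLockdownMode reads the current lockdown mode from the running kernel.
// A missing file is treated as "none" (lockdown LSM not enabled).
func readLockdownMode() (string, error) {
	data, err := os.ReadFile(lockdownPath)
	if os.IsNotExist(err) {
		return "none", nil
	}
	if err != nil {
		return "", err
	}
	return parseLockdownMode(string(data))
}

// blocksProbeWriteUser reports whether the given mode is expected to
// reject programs that call bpf_probe_write_user.
// Assumption: every mode other than "none" blocks the helper.
func blocksProbeWriteUser(mode string) bool {
	return mode != "none"
}
```

A caller could run readLockdownMode at startup and, when blocksProbeWriteUser returns true, log an error and skip the affected probes instead of exiting, which would avoid the crashloop described in this issue.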

There is not much more that we can do other than what is being looked at here to address this more comprehensively: https://github.com/open-telemetry/opentelemetry-go-instrumentation/issues/290