open-telemetry / opentelemetry-go-instrumentation

OpenTelemetry Auto Instrumentation using eBPF
https://opentelemetry.io
Apache License 2.0
560 stars 85 forks source link

Additional clean up consideration needed for SIGKILL/SIGTERM cases #408

Open thatsdone opened 1 year ago

thatsdone commented 1 year ago

Describe the bug

When opentelemetry-go-insrumentation was killed by SIGKILL or SIGTERM, there still remains stale process directories under /sys/fs/bpf.

If I killed go-instrumentation process via SIGINT, clean up routine are called and unnecessary directories disappear.

I know this fix, but it looks like there are some more conditions to be considered.

Environment

To Reproduce

Steps to reproduce the behavior:

  1. Run target go program to be auto-instrumented (and check the PID)
  2. Run go-instrumentation
  3. Do something using the target
  4. Check files under /sys/fs/bpf. There must be at least a directory with name of the PID above
  5. Kill the go instrumentation via SIGKILL or SIGTERM
  6. Check files under /sys/fs/bpf. You will still see the directory above.

Expected behavior

Clean up routine is called, and we don't see stale directories(and ebpf instrumentations).

Additional context

Below is the program I used when I noticed this issue. https://github.com/thatsdone/junkbox/blob/master/golang/goproxy.go

RonFed commented 1 year ago

From my tests it looks like there is a proper cleanup after SIGINT and SIGTERM (I used kill <pid of auto instrumentor>). Regarding SIGKILL, we can't really handle it using the current approach (https://pkg.go.dev/os/signal), so we might consider something else for it. @edeNFed @MrAlias

pellared commented 1 year ago

I do not think we should have any special handling for SIGKILL.

More: https://itecnote.com/tecnote/linux-sigkill-signal-handling/

I see this bug as invalid. @thatsdone am I missing anything or can we close the issue?

thatsdone commented 1 year ago

@RonFed I checked SIGTERM behavior again, and you are correct. Sorry.

@pellared

I do not think we should have any special handling for SIGKILL. https://itecnote.com/tecnote/linux-sigkill-signal-handling/

Granted that there is no way handling SIGKILL in user land, but in case of software with kernel land entities clean up processing can be done in close handler of file descriptors in general. I'm not sure if this is the case for opentelemetry-go-instrumentation, I agree to close this issue.

But anyway, I think it's better to add this topic to document something like:

"Do not stop opentelemetry-go-instrumentation by SIGKILL, use SIGINT or SIGTERM. Otherwise, stale directories remain under /sys/fs/bpf."