[h2olog] h2olog does not exit when the attaching h2o exits on Docker containers

gfx commented 3 years ago

h2olog should exit if the attaching h2o exists, as bpftrace does, by tracing a kernel tracepoint sched:sched_process_exit.

Unfortunately, the automatic exit does not work on Docker containers because BPF handlers cannot get container's pid (or the pid in the container namespace), but instead, they get the host's pid (i.e. bpf_get_current_task()->tgid in BPF is not the container's tgid but it seems the host's tgid). Thus the following condition if (!(h2o_pid == H2OLOG_H2O_PID && h2o_tid == H2OLOG_H2O_PID)) does not work on Docker containers (and probably on any processes in non-root namespaces).

https://github.com/h2o/h2o/blob/c3af80546954f89181b1aa50d1ae6ab247c28e83/src/h2olog/http_tracer.cc#L56-L62

So I think there are two approaches about it:

Translate a host pid into a container's pid, but IIUC it's difficult
- https://blogs.oracle.com/linux/translating-process-id-between-namespaces describes there's no simple, generic way to do so, so they are proposing a new syscall in July 2018.
  1. Watching the attaching h2o process in the same way as bpftrace does.
- https://github.com/iovisor/bpftrace/blob/66ebdcf3c0dba2623633b3fdac93233b683ea151/src/procmon.cpp

I'd like to take the (2), but let me know if have any other ideas.

kazuho commented 3 years ago

Do we have to fix the issue?

I might argue that the user of h2olog can kill h2olog when that person determines that h2o has been stopped.

gfx commented 3 years ago

We don't have to fix it at least right now. It doesn't affect our production use. We can always kill the h2olog process by hand, or with destructors in the E2E tests.

kazuho commented 3 years ago

Thank you for the clarifications. Let's leave it as is, then.

FWIW, I tend to think that this issue is about how to run CI on docker, rather than a bug that exists somewhere.

The purpose of containers is to provide isolation, including providing namespaces for PIDs. Docker, by default, provides that. But when we run CI, we intentionally create a loophole, by setting the --privileged option. That's why we have access to eBPF and the raw PID value. To paraphrase, this issue stems from the fact that we are using docker in an non-recommended, dangerous way. There's no need to clean up path for people taking the dangerous route.

gfx commented 3 years ago

Thank you for your opinion. It makes sense!

It would be nice if h2olog states warnings about it on its startup if it's running in a non-root namespace (I'm not sure it's possible, though), but I'll leave it as is until someone runs into this problem.

h2o / h2o

[h2olog] h2olog does not exit when the attaching h2o exits on Docker containers #2545