vmprof / vmprof-python

vmprof - a statistical program profiler
https://vmprof.readthedocs.io
Other
432 stars 55 forks source link

SIGALRM not always delivered to main thread in practice #232

Open oremanj opened 3 years ago

oremanj commented 3 years ago

I'm using vmprof in real-time mode to profile the main thread of a Linux application with a number of additional non-Python threads (created by various C libraries we're using). I've found that, contrary to the comments in sigprof_handler(), the SIGALRM is occasionally delivered to a thread other than the main thread. When this occurs, it results in the loss of a useful sample as well as spurious stderr about not being able to find a thread state.

Relevant man page quotes (emphasis added):

setitimer(2):

When a timer expires, a signal is generated for the calling process [...]

signal(7):

A signal may be process-directed or thread-directed. A process-directed signal is one that is targeted at (and thus pending for the process as a whole. A signal may be process-directed because it was generated by the kernel for reasons other than a hardware exception, or because it was sent using kill(2) or sigqueue(3). A thread-directed signal is one that is targeted at a specific thread. [...] A process-directed signal may be delivered to any one of the threads that does not currently have the signal blocked. If more than one of the threads has the signal unblocked, then the kernel chooses an arbitrary thread to which to deliver the signal.

I think a sufficient fix would replace

        if (is_main_thread() && broadcast_signal_for_threads()) {

with

        if (info->si_code != SI_TKILL && broadcast_signal_for_threads()) {

That is, if the signal wasn't generated by tgkill, then forward it to all other threads before potentially processing it ourselves. The forwarding will use tgkill so this doesn't produce an infinite loop. (Note that the code is not always SI_TIMER in the timer case; I've seen SI_KERNEL also.)

fijal commented 3 years ago

Hi

Thanks for the detailed description of the problem. I'm happy to accept a PR that fixes that