tmm1 / stackprof

a sampling call-stack profiler for ruby 2.2+
MIT License
2.09k stars 128 forks source link

Forward SIGALRM to original thread #192

Closed jhawthorn closed 1 year ago

jhawthorn commented 1 year ago

In "wall" mode, the SIGALRM signal will arrive at an arbitrary thread. In order to provide more useful results, especially under threaded web servers, we want to forward this signal to the original thread StackProf was started from.

According to POSIX.1-2008 TC1 pthread_kill and pthread_self should be async-signal-safe.

I tested locally and mode: :cpu doesn't seem to have this problem, signals arrive on threads proportional to their CPU usage.

cc @tenderlove

casperisfine commented 1 year ago

What will stackprof report if the target_thread is paused (not on IO but because another thread took its turn)?

ivoanjo commented 1 year ago

Note that this means that under wall-clock + when using stackprof_buffer_sample from the signal handler directly, only one thread will ever be sampled (the target_thread).

Also, I'm not entirely sure that it's safe to call stackprof_buffer_sample/rb_postponed_job_register_one from a thread not holding the GVL.

casperisfine commented 1 year ago

only one thread will ever be sampled (the target_thread).

Depending on what you're looking for, that might actually be desirable. E.g. I want to profile 1 request in a Puma server, I'd rather not see what other threads are doing.

However in such a "single-thread" focused profiling, you don't want to report time if the thread is waiting for the GVL.

jhawthorn commented 1 year ago

What will stackprof report if the target_thread is paused (not on IO but because another thread took its turn)?

It will report wherever it was, same as before.

Also, I'm not entirely sure that it's safe to call stackprof_buffer_sample/rb_postponed_job_register_one from a thread not holding the GVL.

I don't believe there's any reason to think this (especially for rb_postponed_job_register_one). Safe is relative here for rb_profile_frames, but it works (if anything it's probably safer since there's no chance we'll catch it in the middle of vm_push_frame).

In any case this change just makes behaviour consistent which was previously arbitrary. There's nothing about the previous behaviour that would make us receive signals in the GVL holding thread. If there is something unsafe about this (I don't believe there is) then it was always unsafe.

In my experience previously this always reported one thread, but it was an arbitrary thread. I think if you were happy with your previous stackprof results this likely didn't change them.

casperisfine commented 1 year ago

I think if you were happy with your previous stackprof results this likely didn't change them.

Yes, I wasn't suggesting this isn't an improvement, it just made me better realize some potential bias when using stackprof in a multi-theaded environment.

Thanks for the change, it's definitely positive.