tikv / pprof-rs

A Rust CPU profiler implemented with the help of backtrace-rs
Apache License 2.0

Signal handler is unsafe #36

Open umanwizard opened 4 years ago

umanwizard commented 4 years ago

In perf_signal_handler, backtrace::trace_unsynchronized is called. This will not produce any bugs if the user is only using pprof-rs, since a lock is taken, so the main body of perf_signal_handler cannot be executed more than once at a time.

However, if the user is calling backtrace::trace from any other part of the code at the same time, this will result in UB.
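To illustrate the kind of guard being discussed, here is a minimal sketch of a process-wide "unwinding in progress" flag built on an atomic compare-and-swap, which a signal handler can check without blocking. The names `UNWINDING`, `UnwindGuard`, and `sample_once` are hypothetical and are not part of pprof-rs or backtrace-rs. It only helps if every caller that unwinds goes through the same flag, which code that unwinds on its own (for example jemalloc's profiler) would not do.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

// Hypothetical process-wide flag: true while some caller is unwinding.
static UNWINDING: AtomicBool = AtomicBool::new(false);

// RAII guard; the flag is cleared when the guard is dropped.
struct UnwindGuard;

impl UnwindGuard {
    // Non-blocking: returns None instead of waiting, so it never
    // deadlocks when called from a signal handler.
    fn try_acquire() -> Option<UnwindGuard> {
        UNWINDING
            .compare_exchange(false, true, Ordering::Acquire, Ordering::Relaxed)
            .ok()
            .map(|_| UnwindGuard)
    }
}

impl Drop for UnwindGuard {
    fn drop(&mut self) {
        UNWINDING.store(false, Ordering::Release);
    }
}

// Runs `unwind` only if nobody else is unwinding; returns false when
// the sample was skipped. `unwind` stands in for the real stack walk.
fn sample_once(unwind: impl FnOnce()) -> bool {
    match UnwindGuard::try_acquire() {
        Some(_guard) => {
            unwind();
            true
        }
        None => false,
    }
}
```

With this shape, a sample that arrives while another unwind is in progress is dropped rather than run concurrently, at the cost of losing that sample.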

I suspect (but I'm not sure) that this is why we are seeing deadlocks in https://github.com/MaterializeInc/materialize when using both jemalloc heap profiling and pprof-rs profiling at the same time.

YangKeao commented 4 years ago

Yes. I mentioned this in the README (oops, it seems it was not clear enough).

Unfortunately, there is no 100% robust stack tracing method. gperftools has done some related research. pprof-rs uses backtrace-rs, which ultimately uses the libunwind provided by libgcc.

WARN: as described in earlier gperftools documentation, the libunwind provided by libgcc is not signal safe.

libgcc's unwind method is not safe to use from signal handlers. One particular cause of deadlock is when profiling tick happens when the program is propagating thrown exception.

If the signal arrives while the program is already getting a backtrace through libgcc (for sampling, profiling, error handling...), the result is hard to predict (sometimes it crashes outright). A possible solution (in my imagination :smile_cat:) is to scan for the address range of libgcc. In the signal handler, we can check whether the interrupted context (register rip) falls inside libgcc. If it does, pprof-rs can skip this sample. But as I am busy with other projects, I have no time to try this method these days :disappointed:.

But it's also not 100% perfect: libgcc's unwinder can call into other libraries, so it's hard to tell whether the current context is in the middle of an unwind without getting a backtrace.
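The address-range check described above could be sketched roughly as follows, assuming Linux and the `/proc/<pid>/maps` text format. The names `AddrRange`, `executable_ranges`, and `should_skip_sample` are hypothetical and are not part of pprof-rs. Parsing allocates, so it would have to run before the handler is installed; only the range scan is meant to run inside the handler. Reading rip out of the signal context is not shown.

```rust
// Half-open address range [start, end) of one executable mapping.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct AddrRange {
    start: usize,
    end: usize,
}

// Collects executable mappings whose path contains `needle` from text
// in /proc/<pid>/maps format. Call this outside the signal handler.
fn executable_ranges(maps: &str, needle: &str) -> Vec<AddrRange> {
    let mut ranges = Vec::new();
    for line in maps.lines() {
        // Fields: address perms offset dev inode pathname
        let mut fields = line.split_whitespace();
        let addrs = match fields.next() {
            Some(a) => a,
            None => continue,
        };
        let perms = match fields.next() {
            Some(p) => p,
            None => continue,
        };
        // Skip offset, dev, inode; anonymous mappings have no path.
        let path = match fields.nth(3) {
            Some(p) => p,
            None => continue,
        };
        if !perms.contains('x') || !path.contains(needle) {
            continue;
        }
        if let Some((lo, hi)) = addrs.split_once('-') {
            let start = usize::from_str_radix(lo, 16);
            let end = usize::from_str_radix(hi, 16);
            if let (Ok(start), Ok(end)) = (start, end) {
                ranges.push(AddrRange { start, end });
            }
        }
    }
    ranges
}

// Allocation-free check intended for the handler: true when the
// interrupted program counter lies inside one of the ranges.
fn should_skip_sample(pc: usize, ranges: &[AddrRange]) -> bool {
    ranges.iter().any(|r| r.start <= pc && pc < r.end)
}
```

In use, something like `executable_ranges(&std::fs::read_to_string("/proc/self/maps")?, "libgcc_s")` would be computed once at startup. As noted above, this only catches the case where the interrupted instruction is itself inside libgcc, not code that libgcc's unwinder has called into.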

umanwizard commented 4 years ago

Thank you for the detailed response. I think the best solution is just to turn off anything else that might be getting a backtrace (e.g. jemalloc heap profiling) while using pprof-rs.