grafana / pyroscope-rs

Pyroscope Profiler for Rust. Profile your Rust applications.
Apache License 2.0
132 stars 22 forks source link

Expose running py-spy sampling in blocking mode #94

Open AlexBoyd opened 1 year ago

AlexBoyd commented 1 year ago

Currently when configuring the py-spy backend under the hood, we always initialize py-spy with locking mode non-blocking. https://github.com/pyroscope-io/pyroscope-rs/blob/main/pyroscope_ffi/python/lib/src/lib.rs#L77

Running spy-spy in non blocking mode has less overhead, but will frequently produce broken partial stack traces when taking a sample on larger applications with large stacks. Py-spy itself now runs with this by default to avoid the issue causes by incorrectly captured stack traces possible in non-blocking mode.

This option was previously toggle-able when using the pyroscope-cli connect options, but has never been exposed when using the native python support.

Suggestion would be to expose a new config option in configure https://github.com/pyroscope-io/pyroscope-rs/blob/main/pyroscope_ffi/python/pyroscope/__init__.py#L12, pass that down to initialize agent, https://github.com/pyroscope-io/pyroscope-rs/blob/main/pyroscope_ffi/python/pyroscope/__init__.py#L36 and then pass that into the construction of the py-spy backend config.

benfred commented 1 year ago

I wrote a post talking about why py-spy has blocking as the default here https://www.benfrederickson.com/why-python-needs-paused-during-profiling/ - nonblocking is great for reducing perf impact, but can have very misleading results

AlexBoyd commented 1 year ago

I've attempted to test this change locally, but I do seem to be hitting an error from the underlying py-spy sampling failing to suspend the process without any other error logs.

If I run with logging enabled I see this soon after start up: WARN py_spy::sampler > Failed to profile python from process 18: Failed to suspend process

Oddly running py-spy directly both in blocking or no blocking mode via the py-spy cli tooling works just fine. @benfred Do you know of any way I get can anything more out of the py-spy error messages, to get the reason was on the failed process.lock call when run this pyroscope support? I wonder if there's some issue where it's impossible for code running inside of the python process to suspend itself (even via the redirection out to rust).

omarabid commented 1 year ago

@AlexBoyd I don't think locking is possible under FFI. You are basically asking a child process to lock a parent process. Not sure how the unlocking will happen later and this is not permitted by Linux anyway.

AlexBoyd commented 1 year ago

Hmm any suggestions for the best way to work around this limitation in the case. For every meaningfully large python program I've tried to profile with pyroscope so far, about 20-30% of the total time ends up being attributed in these partial / broken stack traces.

It looks like the pyroscope standalone agent, also dropped support for py-spy over here: https://github.com/grafana/pyroscope/pull/1608 Running that with connect would previously have been a workaround to get blocking samples working. Is there any remaining way to run a pyroscope agent using the py-spy backend?

omarabid commented 1 year ago

@AlexBoyd You can try the CLI of pyroscope-rs, it's the equivalent agent in Rust (the other, I think, is a mix of Go/C). This will make tagging impossible, however.