plasma-umass / scalene

Scalene: a high-performance, high-precision CPU, GPU, and memory profiler for Python with AI-powered optimization proposals

add discussion of other profilers mentioned on Hacker News to README.md #1

Closed emeryberger closed 4 years ago

emeryberger commented 4 years ago

https://github.com/benfred/py-spy

https://github.com/vpelletier/pprofile

https://pyflame.readthedocs.io/en/latest/installation.html

gpshead commented 4 years ago

I was surprised to not see https://github.com/vmprof/vmprof-python listed.

halfhorst commented 4 years ago

pyflame is also deprecated and archived, which is an important consideration

emeryberger commented 4 years ago

Thanks @gpshead for the pointer!

https://github.com/vmprof/vmprof-python

chiragjn commented 4 years ago

A comparison with https://github.com/joerick/pyinstrument (CPU only) would also be great. Scalene could probably also add a context manager to wrap some Python code and run it inline in a large codebase or in Jupyter notebooks (see the sketch below).
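For illustration, such a context manager might look roughly like the following; the `start_profiling` / `stop_profiling` hooks are hypothetical placeholders for whatever Scalene would expose, not an existing Scalene API.

```python
from contextlib import contextmanager

def start_profiling():
    # Hypothetical placeholder for a "begin collecting samples" hook.
    print("profiling: on")

def stop_profiling():
    # Hypothetical placeholder for a "stop sampling and report" hook.
    print("profiling: off")

@contextmanager
def profiled():
    start_profiling()
    try:
        yield
    finally:
        stop_profiling()

# Usage inside a larger codebase or a notebook cell:
with profiled():
    sum(i * i for i in range(1_000_000))  # the code being measured
```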

emeryberger commented 4 years ago

https://github.com/joerick/pyinstrument:

Also, none of the above distinguish between time spent in Python and time spent in C (which Scalene now does).

pfreixes commented 4 years ago

Hi, maybe this is not the place to have this discussion, but I have an outstanding question about how Scalene does its profiling and how this technique compares to other profilers.

Reading the code, it seems that Scalene attributes a sample [1] to the specific line of code that was interrupted, where an interval can implicitly be seen as a slice of 100ms of CPU. How fair is this assumption, considering that other lines of code could have been executed within that slice?

Most likely I'm missing something, but it looks like Scalene's profiling is based on statistics rather than on instrumenting the code. So, instead of instrumenting everything and collecting the elapsed times of all the functions, Scalene extrapolates CPU usage from the number of times each line of code has been interrupted. Am I wrong?

If this is true, I'm wondering how accurate this profiling is compared to traditional tools like profile [2]. What would be, in your opinion, the main differences?

On the other hand, the code claims [3] that, due to the internals of CPython, a signal cannot be delivered until the code path reaches the bytecode interpreter again, and that from this delay the time the sample spent in a C extension can be inferred. How does this work when signals are triggered during bytecode execution that has interleaved calls to C functions?

[1] https://github.com/emeryberger/scalene/blob/master/scalene/scalene.py#L134
[2] https://docs.python.org/3/library/profile.html#module-profile
[3] https://github.com/emeryberger/scalene/blob/master/scalene/scalene.py#L133

emeryberger commented 4 years ago

@pfreixes: Scalene is indeed a statistical profiler (https://en.wikipedia.org/wiki/Profiling_(computer_programming)#Statistical_profilers) and does not instrument code. This is mostly an advantage. Sampling can be both more accurate and faster than instrumentation.

Statistical profilers can be almost arbitrarily accurate, given enough samples (appropriately distributed and at a high enough frequency - for some mathematical background, see https://en.wikipedia.org/wiki/Nyquist%E2%80%93Shannon_sampling_theorem and https://en.wikipedia.org/wiki/Margin_of_error#Calculations_assuming_random_sampling).

To track CPU usage, Scalene uses random sampling at a rate (currently) of one sample every hundredth of a second (100Hz); its accuracy (like all sampling) increases with the square root of the number of samples taken. The longer your program runs, the more accurate Scalene gets.
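As a back-of-the-envelope illustration of that square-root scaling (the numbers below are illustrative, not taken from Scalene's documentation): treat each sample as a Bernoulli draw asking "did this sample land on a given line?"

```python
# Rough accuracy estimate for a one-minute profiling run at 100Hz.
from math import sqrt

rate = 100          # samples per second (100Hz, as above)
seconds = 60        # a one-minute run
n = rate * seconds  # ~6000 samples

p = 0.25            # a line estimated to take 25% of the time
stderr = sqrt(p * (1 - p) / n)
print(f"standard error ≈ {stderr:.3%}")  # ≈ 0.559%, i.e. roughly 25% ± 1.1% at 95% confidence
```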

Sampling is not only faster than instrumentation (as done by traditional profilers), which can slow down code considerably; it also avoids the "probe effect", where the instrumentation introduces a form of bias that skews the results (so that the profiling results may not actually hold for the original program). By contrast, sampling always measures the original program.

To answer your second question, the code now contains a detailed explanation of how Scalene attributes time to code (briefly, delays in the delivery of signals can only arise due to execution of C code outside the interpreter). See https://github.com/emeryberger/scalene/blob/master/scalene/scalene.py#L138.
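To make the mechanism concrete, here is a heavily simplified, standalone sketch of that idea (not Scalene's actual code): a periodic timer signal is attributed to the line that was interrupted, and any delivery delay beyond the expected interval is charged to C code, since CPython only delivers signals between bytecode instructions.

```python
# Simplified sketch (Unix-only): sample via a periodic signal and split each
# elapsed interval into Python (bytecode) time vs. native (C) time based on
# how late the signal was delivered.
import signal
import time
from collections import defaultdict

INTERVAL = 0.01                      # target sampling interval: 10ms (100Hz)
last_tick = time.perf_counter()
python_time = defaultdict(float)     # (file, line) -> seconds charged to Python
c_time = defaultdict(float)          # (file, line) -> seconds charged to C

def on_sample(signum, frame):
    """Charge the interrupted line, splitting the elapsed time."""
    global last_tick
    now = time.perf_counter()
    elapsed, last_tick = now - last_tick, now
    if frame is None:                # no Python frame to attribute to
        return
    line = (frame.f_code.co_filename, frame.f_lineno)
    # Time up to the expected interval is charged as Python time; any excess
    # delay must have been spent outside the interpreter, so charge it to C.
    python_time[line] += min(elapsed, INTERVAL)
    c_time[line] += max(elapsed - INTERVAL, 0.0)

signal.signal(signal.SIGALRM, on_sample)                  # install the handler
signal.setitimer(signal.ITIMER_REAL, INTERVAL, INTERVAL)  # fire every 10ms

# ... run the code being profiled here; afterwards python_time / c_time hold
# the sampled per-line split between interpreter time and C time.
```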

emeryberger commented 4 years ago

@chiragjn: https://github.com/emeryberger/scalene/commit/a7afa197e5e3183e2ff9ba1e8ed36cd68bb8dce1 adds pyinstrument and two variants of yappi.

emeryberger commented 4 years ago

https://github.com/emeryberger/scalene/commit/565256e60ab2e1118ed42b2bdb58065bd9c2cd1e adds pprofile (https://github.com/vpelletier/pprofile).

emeryberger commented 4 years ago

Added py-spy (https://github.com/benfred/py-spy). Leaving off pyflame since it's deprecated and unsupported. Leaving off vm-prof since I can't get it to run on OS X.