plasma-umass / scalene

Scalene: a high-performance, high-precision CPU, GPU, and memory profiler for Python with AI-powered optimization proposals
Apache License 2.0
Questions about the profiler comparison chart #423

Open pablogsal opened 2 years ago

pablogsal commented 2 years ago

Hi,

Disclaimer: I am one of the maintainers of the memray profiler

I am having a lot of trouble reproducing the profiler comparison chart shown in the README:

profiler-comparison

https://raw.githubusercontent.com/plasma-umass/scalene/master/docs/images/profiler-comparison.png

The README mentions that the chart was generated by:

Slowdown: the slowdown when running a benchmark from the Pyperformance suite. Green means less than 2x overhead. Scalene's overhead is just a 35% slowdown.

I have several comments/questions:

* Scalene: `python -m scalene --cli pystone.py >/dev/null`
  Time (mean ± σ): 2.774 s ± 0.043 s [User: 2.537 s, System: 0.211 s]
  Range (min … max): 2.703 s … 2.836 s, 1000 runs

* memray without Python allocators: `memray run -fo /dev/null pystone.py >/dev/null`
  Time (mean ± σ): 3.080 s ± 0.041 s [User: 3.044 s, System: 0.062 s]
  Range (min … max): 3.007 s … 3.124 s, 1000 runs

* memray with Python allocators: `memray run --trace-python-allocators -fo /dev/null pystone.py >/dev/null`
  Time (mean ± σ): 3.505 s ± 0.056 s [User: 3.339 s, System: 0.055 s]
  Range (min … max): 3.410 s … 3.602 s, 1000 runs

* Fil: `fil-profile --no-browser run pystone.py >/dev/null`
  Time (mean ± σ): 4.247 s ± 0.068 s [User: 4.570 s, System: 2.144 s]
  Range (min … max): 4.136 s … 4.340 s, 1000 runs
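For anyone who wants to collect comparable numbers, here is a minimal Python sketch of a repeat-and-summarize timing loop. It is not the tool used for the measurements above; `time_command` is a hypothetical helper, and it records wall-clock time only (no user/system split).

```python
import statistics
import subprocess
import sys
import time


def time_command(cmd, runs=10):
    """Run `cmd` `runs` times and summarize wall-clock time in seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        # Discard the program's stdout, mirroring the `>/dev/null` redirects.
        subprocess.run(cmd, stdout=subprocess.DEVNULL, check=True)
        samples.append(time.perf_counter() - start)
    return {
        "mean": statistics.mean(samples),
        # The sample standard deviation needs at least two measurements.
        "stdev": statistics.stdev(samples) if len(samples) > 1 else 0.0,
        "min": min(samples),
        "max": max(samples),
        "runs": len(samples),
    }


if __name__ == "__main__":
    # Trivial stand-in workload. To time a profiler, pass its command instead,
    # e.g. [sys.executable, "-m", "scalene", "--cli", "pystone.py"].
    stats = time_command([sys.executable, "-c", "pass"], runs=5)
    print(
        f"{stats['mean']:.3f} s ± {stats['stdev']:.3f} s "
        f"(min {stats['min']:.3f} s, max {stats['max']:.3f} s, "
        f"{stats['runs']} runs)"
    )
```

Running the same loop on the unprofiled script gives the baseline that the slowdown factors are measured against.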


This makes `scalene` 1.56 times slower, `memray` 1.98 times slower, and `fil` 2.39 times slower. If you don't trace the Python allocators in `memray` (the default), the difference is even smaller: 1.7 times slower. Of course, this is still not a fair comparison, because `scalene` is a sampling profiler while `fil` and `memray` are tracing profilers. On top of that, the three profilers record different information.
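The arithmetic behind these ratios can be sketched as follows. The unprofiled baseline time is not listed in this comment, so the snippet backs out an implied baseline from the reported 1.56x figure for Scalene; that value is an estimate and subject to rounding, not a measurement. The `slowdown` helper and the dictionary keys are illustrative names only.

```python
# Mean wall-clock times in seconds, copied from the measurements above.
mean_s = {
    "scalene": 2.774,
    "memray (default)": 3.080,
    "memray (python allocators)": 3.505,
    "fil": 4.247,
}


def slowdown(profiled_s, baseline_s):
    """Ratio of profiled run time to baseline run time."""
    return profiled_s / baseline_s


# Assumption: the baseline is inferred from the reported 1.56x Scalene ratio,
# because the unprofiled time itself does not appear in the comment.
implied_baseline_s = mean_s["scalene"] / 1.56

for name, seconds in mean_s.items():
    vs_baseline = slowdown(seconds, implied_baseline_s)
    vs_scalene = slowdown(seconds, mean_s["scalene"])
    print(f"{name}: {vs_baseline:.2f}x vs implied baseline, "
          f"{vs_scalene:.2f}x vs scalene")
```

The profiler-to-profiler ratios do not depend on the baseline at all, which makes them a useful cross-check when the baseline is unknown.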

I couldn't force `scalene` into the realm of tracing profilers because `scalene` crashes:

    $ python -m scalene --allocation-sampling-window=1024 --cli pystone.py > /dev/null
      File "/home/pablogsal/.pyenv/versions/3.10.1/lib/python3.10/site-packages/scalene/scalene_profiler.py", line 460, in malloc_signal_handler
        Scalene.enter_function_meta(this_frame, Scalene.__stats)
      File "/home/pablogsal/.pyenv/versions/3.10.1/lib/python3.10/site-packages/scalene/scalene_profiler.py", line 1061, in enter_function_meta
        @staticmethod
      File "/home/pablogsal/.pyenv/versions/3.10.1/lib/python3.10/site-packages/scalene/scalene_profiler.py", line 460, in malloc_signal_handler
        Scalene.enter_function_meta(this_frame, Scalene.__stats)
      File "/home/pablogsal/.pyenv/versions/3.10.1/lib/python3.10/site-packages/scalene/scalene_profiler.py", line 1061, in enter_function_meta
        @staticmethod
    RecursionError: maximum recursion depth exceeded
    Scalene error: received signal SIGSEGV



* There is no mention of which version of each profiler was used for the experiments. Many of these profilers have become faster over time.
* There are also some incorrect checkmarks: `memray` runs over unmodified code, can follow multiple processes, it can detect leaks and also can report RSS.

Thanks in advance for considering these points and thank you for the great work you do with `scalene` and other fantastic `plasma-umass` packages. ❤️

godlygeek commented 2 years ago

Disclaimer: I'm the other Memray maintainer 😄

> * There are also some incorrect checkmarks: memray runs over unmodified code, can follow multiple processes, it can detect leaks and also can report RSS.

Does the "lines or functions" column indicate whether the profiler captures the file and line number, as opposed to the name of the function being executed? If so, that ought to say "both" for Memray: you can mouse over frames in our flamegraph to see the function name, for example.

Additionally, I'd suggest that the "Python vs. C Time", "System Time", and "GPU" columns all really ought to say "n/a" instead of "-" for the 3 profilers listed in the "memory-only profilers" section, since all 3 of those columns are about time-based profiling, and don't make any sense for a profiler that doesn't profile time at all. Likewise for the "Python vs C memory", "memory trends", "copy volume", and "detects leaks" columns for the CPU-only profilers.

jpmckinney commented 3 months ago

Another useful column would be whether the profiler can attach to a running process, as Austin, py-spy, psrecord, and memray can.

To my knowledge, scalene doesn’t “attach” to processes, and --pid is only used to pause/resume profiling of a scalene process.