pyutils / line_profiler

Line-by-line profiling for Python

Line profiling in Cython seems to be totally broken #200

Open battaglia01 opened 1 year ago

battaglia01 commented 1 year ago

I've tried to get line profiling working on Cython 0.29.33 on several different computers to no avail, at least with line_profiler 4.0.2.

One of them is a new M1 Mac running macOS Monterey, and the other is an older Intel Mac running High Sierra. Both have the same problem.

The problem is summarized in this StackOverflow post: https://stackoverflow.com/questions/75420574/as-of-2023-is-there-any-way-to-line-profile-cython-at-all

And in fact here is a Jupyter notebook showing the problem: https://nbviewer.org/gist/battaglia01/f138f6b85235a530f7f62f5af5a002f0?flush_cache=true

This notebook apparently used to work, as seen in this post from 8 years ago: https://stackoverflow.com/questions/28301931/how-to-profile-cython-functions-line-by-line?noredirect=1&lq=1

I've tried just about everything there is to try: variations of setting profile=True, binding=True, linetrace=True, CYTHON_TRACE=1, CYTHON_TRACE_NOGIL=1, etc., and none of them help. line_profiler seems to just skip Cython functions entirely.
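For readers unfamiliar with those settings, here is a minimal sketch of where each one usually goes in a Cython build. The module name `mymodule` and file `mymodule.pyx` are hypothetical placeholders, and this is not necessarily the exact configuration used in the notebook above. It only shows the settings named in this comment; it is not a fix for the problem reported here.

```python
# Sketch of a setup.py-style configuration for Cython line tracing.
# "mymodule" and "mymodule.pyx" are hypothetical placeholder names.

# Cython compiler directives. These can be passed to
# cythonize(..., compiler_directives=...) or written as a
# "# cython: linetrace=True, binding=True" comment at the top of the .pyx file.
TRACE_DIRECTIVES = {
    "linetrace": True,  # emit line-tracing hooks in the generated C code
    "binding": True,    # make compiled functions look more like Python functions
    "profile": True,    # emit function-level profiling hooks
}

# C preprocessor macros. The hooks enabled by "linetrace" are only compiled
# in when CYTHON_TRACE is defined for the C compiler.
TRACE_MACROS = [
    ("CYTHON_TRACE", "1"),
    ("CYTHON_TRACE_NOGIL", "1"),  # also trace code that runs without the GIL
]

# Keyword arguments for setuptools.Extension, kept as a plain dict so the
# sketch runs without setuptools or Cython installed.
extension_kwargs = {
    "name": "mymodule",
    "sources": ["mymodule.pyx"],
    "define_macros": TRACE_MACROS,
}

# With setuptools and Cython installed, the build step would look like:
#   from setuptools import Extension, setup
#   from Cython.Build import cythonize
#   ext = Extension(**extension_kwargs)
#   setup(ext_modules=cythonize([ext], compiler_directives=TRACE_DIRECTIVES))
```

Note that the directives and the macros are two separate switches: the directives control what Cython generates, and the macros control whether the C compiler keeps it.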

Where does one start?

Theelx commented 1 year ago

Having tried to profile Cython code many times, on both the line_profiler 3.x and 4.x series, I have never gotten it to work. I'm at a loss as to why, and whatever the cause is, it seems to be a fundamental design issue rather than a quick few-line fix.

Erotemic commented 1 year ago

Does scalene work?

Theelx commented 1 year ago

Scalene does indeed work, but I've found its usefulness limited because it reports per-line timing as a percentage. Py-spy's native mode is much better imo, but that's implemented in a very hacky way. I might be able to borrow some tricks from scalene to get native code working though.

Theelx commented 1 year ago

https://github.com/plasma-umass/scalene/blob/5290b622a35718c418586c6e2ce3245fed802459/scalene/scalene_profiler.py#L1053 This seems to detail how scalene "times" native code. I'll see if it can be modified to support line-by-line granularity, but that might come at the cost of significantly increased overhead, which would necessitate making a separate timing code path for instances where the user wants to profile native functions.

Edit: Py-spy's approach may be more useful: https://www.benfrederickson.com/profiling-native-python-extensions-with-py-spy/

Erotemic commented 1 year ago

These are good resources. At the very least I should spend some time on the README to modernize the usage instructions, point to these alternatives, and lay out what the strengths and weaknesses of this library are.

It would be great if we can work towards profiling Cython code, but I would also be happy with simply stating that we don't support that use case.

battaglia01 commented 1 year ago

FYI, older versions of line_profiler do seem to work in Cython (3.3.1 in particular), but the overhead was so extreme that it was basically useless.

emlys commented 2 months ago

Would appreciate some official documentation on whether line_profiler should be used with cython at all. There are a couple of older stackoverflow threads that would lead folks to think it's possible (https://stackoverflow.com/questions/24144931/python-line-profiler-and-cython-function, https://stackoverflow.com/questions/28301931/how-to-profile-cython-functions-line-by-line/), and at one point I got it to work, though I can't reproduce it now. But some comments here (https://github.com/pyutils/line_profiler/issues/200#issuecomment-1429142808, https://github.com/pyutils/line_profiler/issues/200#issuecomment-1426805703) make me think I shouldn't trust any results I do get.

emlys commented 2 months ago

> FYI, older versions of line_profiler do seem to work in Cython (3.3.1 in particular), but the overhead was so extreme that it was basically useless.

@battaglia01 do you happen to know if the overhead is proportional to the actual runtime? I.e. are the results still usable to infer the relative amount of time taken by each line, even if a lot of it is overhead?
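To make the question concrete, here is a small sketch of one possible overhead model: a fixed tracing cost added every time a line executes. All numbers are hypothetical, and whether line_profiler 3.3.1 actually behaves this way is an assumption, not something measured here. Under this model the overhead scales with hit count rather than with runtime, so the relative shares can be badly distorted.

```python
def measured_times(true_time, hits, overhead_per_hit):
    """Measured time per line: true time plus a fixed cost per traced hit."""
    return {
        line: true_time[line] + hits[line] * overhead_per_hit
        for line in true_time
    }


def shares(times):
    """Fraction of the total time attributed to each line."""
    total = sum(times.values())
    return {line: t / total for line, t in times.items()}


# Hypothetical workload: a cheap line executed a million times and an
# expensive line executed once.
true_time = {"loop_body": 1.0, "heavy_call": 9.0}  # seconds
hits = {"loop_body": 1_000_000, "heavy_call": 1}

# Assumed tracing cost of 20 microseconds per line hit (made-up value).
measured = measured_times(true_time, hits, overhead_per_hit=20e-6)

print("true shares:    ", shares(true_time))
print("measured shares:", shares(measured))
```

In this made-up case the true hot spot is `heavy_call`, but the measured profile blames `loop_body`, because nearly all of the overhead lands on the line with the most hits. If the real overhead were instead proportional to each line's runtime, the shares would be unchanged and the results would still be usable for relative comparisons.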