pyutils / line_profiler

Line-by-line profiling for Python
Other
2.56k stars 118 forks source link

Support 3.12 #246

Closed Erotemic closed 8 months ago

Erotemic commented 9 months ago

Updating CI to test Python 3.12

Erotemic commented 8 months ago

@Theelx Having a bit of trouble with this if you have some time to take a look. In 3.12 I'm getting an error when compiling:

      Error compiling Cython file:
      ------------------------------------------------------------
      ...

          def __exit__(self, exc_type, exc_val, exc_tb):
              self.disable_by_count()

          def enable(self):
              PyEval_SetTrace(python_trace_callback, self)
                              ^
      ------------------------------------------------------------

      line_profiler/_line_profiler.pyx:311:24: Cannot assign type 'int (object, PyFrameObject *, int, PyObject *) except? -1' to 'Py_tracefunc'. Exception values are incompatible. Suggest adding 'noexcept' to type 'int (object, PyFrameObject *, int, PyObject *) except? -1'.

Which looks like it is due to the types in the function:

@cython.boundscheck(False)
@cython.wraparound(False)
cdef int python_trace_callback(object self_, PyFrameObject *py_frame, int what,
                               PyObject *arg):

If I add noexcept at the end of the declaration it works on 3.12, but fails on 3.11 (also some of the tests fail), i.e.

@cython.boundscheck(False)
@cython.wraparound(False)
cdef int python_trace_callback(object self_, PyFrameObject *py_frame, int what,
                               PyObject *arg) noexcept:

On 3.11 if I use @cython.exceptval(check=False) (which IIUC) should be equivalent according to https://cython.readthedocs.io/en/latest/src/userguide/language_basics.html#error-return-values

@cython.boundscheck(False)
@cython.wraparound(False)
@cython.exceptval(check=False)
cdef int python_trace_callback(object self_, PyFrameObject *py_frame, int what,
                               PyObject *arg):

But this seems to fail again on 3.12.

From what I understand, we need to define a function with the same signature as Py_tracefunc. I'm not fluent in C symantics, so I don't know if a naked function like that is implicity "no except" or "possible except", but I assume it's a noexcept because it's integer return code is indicating the error state. I'm not sure why this worked in previous versions, perhaps cython got more restrictive about function signatures matching exception types? I'm also wondering if there is a condition that could cause an exception in that function and then require us to return -1 (on a quick glance it looks like there are key errors that could happen).

Theelx commented 8 months ago

on my phone atm but for the sake of a quick response, this is the same issue we got when going from cython 3.0a12 to b1

Theelx commented 8 months ago

on my phone atm but for the sake of a quick response, this is the same issue we got when going from cython 3.0a12 to b1

Theelx commented 8 months ago

So I'm looking into why at the moment, but #203 doesn't have these issues with 3.12. Hopefully I can isolate what's causing the errors with 3.12 from that fork.

Theelx commented 8 months ago

@Erotemic I believe I just found the issue. Add "legacy_implicit_noexcept": True, to the compiler_directives in setup.py, line 231. After this, you can remove the exceptval=False from the callback in the .pyx and you can also remove the other directives that were added at the top of the .pyx file.

Erotemic commented 8 months ago

Keeping the directives at the top of the pyx file lets you use the standalone cythonize tool instead of needing to invoke the entire setup.py, which I think is nice for developing.

It probably also makes sense to indent to remove the legacy behavior and explicitly mark noexcept functions as such, so I' like to keep the directive as well unless it causes issues.

The fix does allow me to compile on 3.12, but I'm still getting test failures I'll need to look into. For reference the failing tests are:

FAILED tests/test_explicit_profile.py::test_explicit_profile_with_in_code_enable - AssertionError: assert 'Function: func2' in 'Timer unit: 1e-09 s\n\n  0.00 seconds - /tmp/tmpk326uqo9/script.py:9 - func2\n  0.00 seconds - /tmp/tmpk326uqo9/script.py:21 - fun...
FAILED tests/test_explicit_profile.py::test_explicit_profile_with_duplicate_functions - AssertionError: assert 'Function: func1' in 'Timer unit: 1e-09 s\n\n  0.00 seconds - /tmp/tmpij5_1ajc/script.py:3 - func1\n  0.00 seconds - /tmp/tmpij5_1ajc/scrip...unc2\n  0....
FAILED tests/test_ipython.py::TestIPython::test_init - AssertionError: 0 != 1
Theelx commented 8 months ago

@Erotemic For getting rid of the Cython deprecation warnings which happen with the legacy implicit noexcept option, you can try putting extern after cdef in the function definition, and taking out the decorator:

@cython.boundscheck(False)
@cython.wraparound(False)
cdef extern int python_trace_callback(object self_, PyFrameObject *py_frame, int what,
                               PyObject *arg):

It works for me on 3.12, and my guess is it'll work on 3.11 though I haven't tried it yet.

EDIT: This does not seem to fix the py.test tests, but I'm unsure why. It works fine for some personal scripts that I commonly use it to test for potential perf improvements.

Erotemic commented 8 months ago

Using extern and no-decorator seem to compile just as well.

I think the test failures must have to do with something in 3.12 itself, and not any Cython changes. Most other tests run fine. It's two tests for the new explicit profiler, and 1 test for the IPython magic that are the problem.

Running pytest -k with_in_code_enable -s on 3.11 gives:

tests/test_explicit_profile.py Timer unit: 1e-09 s

  0.00 seconds - /tmp/tmp9sqjysht/script.py:21 - func4
  0.00 seconds - /tmp/tmp9sqjysht/script.py:9 - func2
Wrote profile results to custom_output.txt
Wrote profile results to custom_output_2023-10-16T110436.txt
Wrote profile results to custom_output.lprof
To view details run:
python -m line_profiler -rtmz custom_output.lprof

Finished running script
Contents of /tmp/tmpitjiehix/custom_output.txt
Timer unit: 1e-09 s

Total time: 1.47e-07 s
File: /tmp/tmp9sqjysht/script.py
Function: func4 at line 21

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    21                                           @profile
    22                                           def func4(a):
    23         1        147.0    147.0    100.0      return a + 1

And on 3.12 it fails:

tests/test_explicit_profile.py Timer unit: 1e-09 s

  0.00 seconds - /tmp/tmpjo8nqdo6/script.py:9 - func2
  0.00 seconds - /tmp/tmpjo8nqdo6/script.py:21 - func4
Wrote profile results to custom_output.txt
Wrote profile results to custom_output_2023-10-16T110444.txt
Wrote profile results to custom_output.lprof
To view details run:
python -m line_profiler -rtmz custom_output.lprof

Finished running script
Contents of /tmp/tmpxxdv1a41/custom_output.txt
Timer unit: 1e-09 s

  0.00 seconds - /tmp/tmpjo8nqdo6/script.py:9 - func2
  0.00 seconds - /tmp/tmpjo8nqdo6/script.py:21 - func4

F

For the IPython test running pytest -k test_init -s -v the difference is in 3.11:

tests/test_ipython.py::TestIPython::test_init Timer unit: 1e-09 s

Total time: 3.51e-07 s
File: <ipython-input-1-ae24564ca1b8>
Function: func at line 1

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     1                                           def func():
     2         1        351.0    351.0    100.0      return 2**20
func_data=('<ipython-input-1-ae24564ca1b8>', 1, 'func')
lines_data=[(2, 1, 351)]

In 3.12:

tests/test_ipython.py::TestIPython::test_init Timer unit: 1e-09 s

Total time: 0 s
File: <ipython-input-1-ae24564ca1b8>
Function: func at line 1

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     1                                           def func():
     2                                               return 2**20
func_data=('<ipython-input-1-ae24564ca1b8>', 1, 'func')
lines_data=[]
Theelx commented 8 months ago

Using extern and no-decorator seem to compile just as well.

That's true, but the extern method is not deprecated, whereas the legacy compiler directive is, and iirc the decorator fails in pre-3.0 versions of Cython. Regardless, I agree that the test failures have something to do with 3.12,so I'll look into that.

Theelx commented 8 months ago

@Erotemic I wonder if maybe the testing harness is the issue? I just ran with_code_enable in its own file and it works fine. Screenshot from 2023-10-17 13-21-11

Erotemic commented 8 months ago

While pytest has caused issues for me in the past, I don't think it is the culprit here:

Running outside of pytest via:

python -m unittest tests/test_ipython.py
xdoctest tests/test_explicit_profile.py test_explicit_profile_with_in_code_enable
xdoctest tests/test_explicit_profile.py test_explicit_profile_with_duplicate_functions 

still results in the same errors.

I also started up an IPython shell and ran the tests directly, and they also failed.

Erotemic commented 8 months ago

@Theelx is there anything in this minimal working example that looks odd to you:

def test_simple_explicit_nonglobal_usage():
    from line_profiler import LineProfiler
    import ubelt as ub
    profiler = LineProfiler()

    def func(a):
        return a + 1

    profiled_func = profiler(func)

    # Run Zero Times
    lstats = profiler.get_stats()
    print('lstats.timings = {}'.format(ub.urepr(lstats.timings, nl=1)))
    print('lstats.unit = {}'.format(ub.urepr(lstats.unit, nl=1)))
    print(f'profiler.code_hash_map={profiler.code_hash_map}')
    profiler.print_stats()

    # Run Once
    profiled_func(1)

    lstats = profiler.get_stats()
    print('lstats.timings = {}'.format(ub.urepr(lstats.timings, nl=1)))
    print('lstats.unit = {}'.format(ub.urepr(lstats.unit, nl=1)))
    print(f'profiler.code_hash_map={profiler.code_hash_map}')
    profiler.print_stats()

    # Run Multiple Times
    for _ in range(10):
        profiled_func(1)

    lstats = profiler.get_stats()
    print('lstats.timings = {}'.format(ub.urepr(lstats.timings, nl=1)))
    print('lstats.unit = {}'.format(ub.urepr(lstats.unit, nl=1)))
    print(f'profiler.code_hash_map={profiler.code_hash_map}')
    profiler.print_stats()

test_simple_explicit_nonglobal_usage()

In 3.11 it reports time correctly, but in 3.12 it is like the profiled function was never called. I'm not sure how anything can work if this simple case is failing.

On 3.12:

lstats.timings = {
    ('/home/joncrall/code/line_profiler/tests/test_explicit_profile.py', 13, 'func'): [],
}
lstats.unit = 1e-09
profiler.code_hash_map={<code object func at 0x7f002870e590, file "/home/joncrall/code/line_profiler/tests/test_explicit_profile.py", line 13>: [1917141407524353302, 1917141407524353301]}
Timer unit: 1e-09 s

Total time: 0 s
File: /home/joncrall/code/line_profiler/tests/test_explicit_profile.py
Function: func at line 13

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    13                                               def func(a):
    14                                                   return a + 1

On 3.11:

lstats.timings = {
    ('/home/joncrall/code/line_profiler/tests/test_explicit_profile.py', 13, 'func'): [(14, 11, 2856)],
}
lstats.unit = 1e-09
profiler.code_hash_map={<code object func at 0x7f896e352a70, file "/home/joncrall/code/line_profiler/tests/test_explicit_profile.py", line 13>: [-4000377494778487241, -4000377494778487244]}
Timer unit: 1e-09 s

Total time: 2.856e-06 s
File: /home/joncrall/code/line_profiler/tests/test_explicit_profile.py
Function: func at line 13

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    13                                               def func(a):
    14        11       2856.0    259.6    100.0          return a + 1
Theelx commented 8 months ago

One thing that looks odd is that 3.11 shows the function as taking ~260us per hit, where it should take ~1us. I assume this is ubelt overhead, so my question is can this be replicated without using ubelt?

Erotemic commented 8 months ago

Yes, ubelt is not necessary. It's also not the test harness. Here is a shell script that reproduces the example explicitly:

echo "
def test_simple_explicit_nonglobal_usage():
    from line_profiler import LineProfiler
    profiler = LineProfiler()

    def func(a):
        return a + 1

    profiled_func = profiler(func)

    # Run Zero Times
    lstats = profiler.get_stats()
    print(f'lstats.timings={lstats.timings}')
    print(f'lstats.unit={lstats.unit}')
    print(f'profiler.code_hash_map={profiler.code_hash_map}')
    profiler.print_stats()

    # Run Once
    profiled_func(1)

    lstats = profiler.get_stats()
    print(f'lstats.timings={lstats.timings}')
    print(f'lstats.unit={lstats.unit}')
    print(f'profiler.code_hash_map={profiler.code_hash_map}')
    profiler.print_stats()

    # Run Multiple Times
    for _ in range(10):
        profiled_func(1)

    lstats = profiler.get_stats()
    print(f'lstats.timings={lstats.timings}')
    print(f'lstats.unit={lstats.unit}')
    print(f'profiler.code_hash_map={profiler.code_hash_map}')
    profiler.print_stats()

test_simple_explicit_nonglobal_usage()
" > mwe.py

python mwe.py

In 3.12 the last print block is:

lstats.timings={('/home/joncrall/tmp/mwe.py', 6, 'func'): []}
lstats.unit=1e-09
profiler.code_hash_map={<code object func at 0x7f76c730e320, file "/home/joncrall/tmp/mwe.py", line 6>: [4641089316629750693, 4641089316629750692]}
Timer unit: 1e-09 s

Total time: 0 s
File: /home/joncrall/tmp/mwe.py
Function: func at line 6

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     6                                               def func(a):
     7                                                   return a + 1

In 3.11 the last print block is:

lstats.timings={('/home/joncrall/tmp/mwe.py', 6, 'func'): [(7, 11, 2328)]}
lstats.unit=1e-09
profiler.code_hash_map={<code object func at 0x7fb0961f56f0, file "/home/joncrall/tmp/mwe.py", line 6>: [6762933698279728574, 6762933698279728575]}
Timer unit: 1e-09 s

Total time: 2.328e-06 s
File: /home/joncrall/tmp/mwe.py
Function: func at line 6

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     6                                               def func(a):
     7        11       2328.0    211.6    100.0          return a + 1

The time seems to be about the same (although if I run for many more iterations, it goes to about 105), but isn't the unit in nanoseconds not microseconds?. I find myself easilly confused when dealing with units, so I typically like to write it out explicitly in pint, and then check that all conversions make sense.

import pint
import ubelt as ub
ureg = pint.UnitRegistry()

s = ureg.seconds
ns = ureg.parse_expression('1 nanosecond')
μs = ureg.parse_expression('1 microsecond')

quantities = {}
quantities['prog_total'] = 2.3e-6 * ureg.seconds
quantities['line_total'] = 2328 * ns
quantities['perhit'] = 211 * ns

nhits = 11
quantities['line_total_derived'] = quantities['perhit'] * nhits

quantities = {k: v.to(μs) for k, v in quantities.items()}

print('quantities = {}'.format(ub.urepr(quantities, nl=1, align=':')))

This prints

quantities = {
    'prog_total'        : <Quantity(2.3, 'microsecond')>,
    'line_total'        : <Quantity(2.328, 'microsecond')>,
    'perhit'            : <Quantity(0.211, 'microsecond')>,
    'line_total_derived': <Quantity(2.321, 'microsecond')>,
}

which seems consistent to me when the numbers are interpreted as nanoseconds (which is 1e-9 as the timer unit reports).

notEvil commented 8 months ago

Maybe related: on 3.12.0 I see PyCode_Addr2Line and PyFrame_GetLineNumber returning -1 for some offsets/frames (lasti). Couldn't find any official information if this is intended.

edit: only in async functions, so doesn't explain the mwe which I'm unable to reproduce. The example creating -1:

```python import asyncio @profile async def function_1(depth): if (0 < depth): ([_ async for _ in function_2(depth - 1)]) async def function_2(depth): if (0 < depth): yield () if (False): yield async def async_main(): async with asyncio.TaskGroup() as task_group: task_group.create_task(function_1(2)) asyncio.run(async_main()) ```
Theelx commented 8 months ago

which seems consistent to me when the numbers are interpreted as nanoseconds (which is 1e-9 as the timer unit reports).

Ah yeah that's true. I forgot that Linux/BSD platforms use units of 1e-8 seconds instead of 1e-6 which used to be the default for all platforms.

I am genuinely confused as to how this is happening, as it works fine for me when I install it and run it locally in the terminal on Ubuntu 22.04, with Python 3.12 behind pyenv.

Erotemic commented 8 months ago

@notEvil You are unable to reproduce the MWE in this post? So when you run that in 3.12 you get times? Can you paste your output? What environment are you running in?

In my execution it's a fresh pyenv 3.12.0 build on Ubuntu 22.04.

notEvil commented 8 months ago

Just tried again and I'm able to reproduce now. Replacing .f_lineno with PyFrame_GetLineNumber seems to fix it. I probably tested with those changes and paid no attention.

```diff diff --git a/line_profiler/_line_profiler.pyx b/line_profiler/_line_profiler.pyx index f11b6b5..bfd5fbb 100644 --- a/line_profiler/_line_profiler.pyx +++ b/line_profiler/_line_profiler.pyx @@ -89,6 +89,8 @@ cdef extern from "Python.h": cdef int PyTrace_C_EXCEPTION cdef int PyTrace_C_RETURN + cdef int PyFrame_GetLineNumber(PyFrameObject *frame) + cdef extern from "timers.c": PY_LONG_LONG hpTimer() double hpTimerUnit() @@ -425,7 +427,7 @@ cdef extern int python_trace_callback(object self_, PyFrameObject *py_frame, if what == PyTrace_LINE or what == PyTrace_RETURN: # Normally we'd need to DECREF the return from get_frame_code, but Cython does that for us block_hash = hash(get_frame_code(py_frame)) - code_hash = compute_line_hash(block_hash, py_frame.f_lineno) + code_hash = compute_line_hash(block_hash, PyFrame_GetLineNumber(py_frame)) if self._c_code_map.count(code_hash): time = hpTimer() ident = threading.get_ident() @@ -440,7 +442,7 @@ cdef extern int python_trace_callback(object self_, PyFrameObject *py_frame, if what == PyTrace_LINE: # Get the time again. This way, we don't record much time wasted # in this function. - self._c_last_time[ident][block_hash] = LastTime(py_frame.f_lineno, hpTimer()) + self._c_last_time[ident][block_hash] = LastTime(PyFrame_GetLineNumber(py_frame), hpTimer()) elif self._c_last_time[ident].count(block_hash): # We are returning from a function, not executing a line. Delete # the last_time record. It may have already been deleted if we ```
codecov[bot] commented 8 months ago

Codecov Report

Merging #246 (dd29898) into main (075b5c4) will not change coverage. The diff coverage is 100.00%.

Additional details and impacted files [![Impacted file tree graph](https://app.codecov.io/gh/pyutils/line_profiler/pull/246/graphs/tree.svg?width=650&height=150&src=pr&token=xIK8nFU3K5&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=pyutils)](https://app.codecov.io/gh/pyutils/line_profiler/pull/246?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=pyutils) ```diff @@ Coverage Diff @@ ## main #246 +/- ## ======================================= Coverage 53.36% 53.36% ======================================= Files 11 11 Lines 832 832 Branches 168 168 ======================================= Hits 444 444 Misses 329 329 Partials 59 59 ``` | [Files](https://app.codecov.io/gh/pyutils/line_profiler/pull/246?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=pyutils) | Coverage Δ | | |---|---|---| | [line\_profiler/\_\_init\_\_.py](https://app.codecov.io/gh/pyutils/line_profiler/pull/246?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=pyutils#diff-bGluZV9wcm9maWxlci9fX2luaXRfXy5weQ==) | `100.00% <100.00%> (ø)` | | | [line\_profiler/line\_profiler.py](https://app.codecov.io/gh/pyutils/line_profiler/pull/246?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=pyutils#diff-bGluZV9wcm9maWxlci9saW5lX3Byb2ZpbGVyLnB5) | `50.00% <100.00%> (ø)` | | ... and [1 file with indirect coverage changes](https://app.codecov.io/gh/pyutils/line_profiler/pull/246/indirect-changes?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=pyutils) ------ [Continue to review full report in Codecov by Sentry](https://app.codecov.io/gh/pyutils/line_profiler/pull/246?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=pyutils). > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=pyutils) > `Δ = absolute (impact)`, `ø = not affected`, `? = missing data` > Powered by [Codecov](https://app.codecov.io/gh/pyutils/line_profiler/pull/246?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=pyutils). Last update [075b5c4...dd29898](https://app.codecov.io/gh/pyutils/line_profiler/pull/246?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=pyutils). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=pyutils).
Erotemic commented 8 months ago

@notEvil looks like that fixed it!

I was wondering why this is the case and I found this: https://bugs.python.org/issue42823#msg384502 where Mark Shannon states:

It looks like you are accessing the c field "f_lineno" directly. That was never guaranteed to work, it just happened to. You should use PyFrame_GetLineNumber()

So using the official C-API to get the line number is probably the correct thing to be doing.