Closed Erotemic closed 8 months ago
@Theelx Having a bit of trouble with this if you have some time to take a look. In 3.12 I'm getting an error when compiling:
Error compiling Cython file:
------------------------------------------------------------
...
def __exit__(self, exc_type, exc_val, exc_tb):
self.disable_by_count()
def enable(self):
PyEval_SetTrace(python_trace_callback, self)
^
------------------------------------------------------------
line_profiler/_line_profiler.pyx:311:24: Cannot assign type 'int (object, PyFrameObject *, int, PyObject *) except? -1' to 'Py_tracefunc'. Exception values are incompatible. Suggest adding 'noexcept' to type 'int (object, PyFrameObject *, int, PyObject *) except? -1'.
Which looks like it is due to the types in the function:
@cython.boundscheck(False)
@cython.wraparound(False)
cdef int python_trace_callback(object self_, PyFrameObject *py_frame, int what,
PyObject *arg):
If I add noexcept
at the end of the declaration it works on 3.12, but fails on 3.11 (also some of the tests fail), i.e.
@cython.boundscheck(False)
@cython.wraparound(False)
cdef int python_trace_callback(object self_, PyFrameObject *py_frame, int what,
PyObject *arg) noexcept:
On 3.11 if I use @cython.exceptval(check=False)
(which IIUC) should be equivalent according to https://cython.readthedocs.io/en/latest/src/userguide/language_basics.html#error-return-values
@cython.boundscheck(False)
@cython.wraparound(False)
@cython.exceptval(check=False)
cdef int python_trace_callback(object self_, PyFrameObject *py_frame, int what,
PyObject *arg):
But this seems to fail again on 3.12.
From what I understand, we need to define a function with the same signature as Py_tracefunc. I'm not fluent in C symantics, so I don't know if a naked function like that is implicity "no except" or "possible except", but I assume it's a noexcept because it's integer return code is indicating the error state. I'm not sure why this worked in previous versions, perhaps cython got more restrictive about function signatures matching exception types? I'm also wondering if there is a condition that could cause an exception in that function and then require us to return -1
(on a quick glance it looks like there are key errors that could happen).
on my phone atm but for the sake of a quick response, this is the same issue we got when going from cython 3.0a12 to b1
on my phone atm but for the sake of a quick response, this is the same issue we got when going from cython 3.0a12 to b1
So I'm looking into why at the moment, but #203 doesn't have these issues with 3.12. Hopefully I can isolate what's causing the errors with 3.12 from that fork.
@Erotemic I believe I just found the issue. Add "legacy_implicit_noexcept": True,
to the compiler_directives in setup.py, line 231. After this, you can remove the exceptval=False from the callback in the .pyx and you can also remove the other directives that were added at the top of the .pyx file.
Keeping the directives at the top of the pyx file lets you use the standalone cythonize tool instead of needing to invoke the entire setup.py, which I think is nice for developing.
It probably also makes sense to indent to remove the legacy behavior and explicitly mark noexcept functions as such, so I' like to keep the directive as well unless it causes issues.
The fix does allow me to compile on 3.12, but I'm still getting test failures I'll need to look into. For reference the failing tests are:
FAILED tests/test_explicit_profile.py::test_explicit_profile_with_in_code_enable - AssertionError: assert 'Function: func2' in 'Timer unit: 1e-09 s\n\n 0.00 seconds - /tmp/tmpk326uqo9/script.py:9 - func2\n 0.00 seconds - /tmp/tmpk326uqo9/script.py:21 - fun...
FAILED tests/test_explicit_profile.py::test_explicit_profile_with_duplicate_functions - AssertionError: assert 'Function: func1' in 'Timer unit: 1e-09 s\n\n 0.00 seconds - /tmp/tmpij5_1ajc/script.py:3 - func1\n 0.00 seconds - /tmp/tmpij5_1ajc/scrip...unc2\n 0....
FAILED tests/test_ipython.py::TestIPython::test_init - AssertionError: 0 != 1
@Erotemic For getting rid of the Cython deprecation warnings which happen with the legacy implicit noexcept option, you can try putting extern
after cdef in the function definition, and taking out the decorator:
@cython.boundscheck(False)
@cython.wraparound(False)
cdef extern int python_trace_callback(object self_, PyFrameObject *py_frame, int what,
PyObject *arg):
It works for me on 3.12, and my guess is it'll work on 3.11 though I haven't tried it yet.
EDIT: This does not seem to fix the py.test tests, but I'm unsure why. It works fine for some personal scripts that I commonly use it to test for potential perf improvements.
Using extern and no-decorator seem to compile just as well.
I think the test failures must have to do with something in 3.12 itself, and not any Cython changes. Most other tests run fine. It's two tests for the new explicit profiler, and 1 test for the IPython magic that are the problem.
Running pytest -k with_in_code_enable -s
on 3.11 gives:
tests/test_explicit_profile.py Timer unit: 1e-09 s
0.00 seconds - /tmp/tmp9sqjysht/script.py:21 - func4
0.00 seconds - /tmp/tmp9sqjysht/script.py:9 - func2
Wrote profile results to custom_output.txt
Wrote profile results to custom_output_2023-10-16T110436.txt
Wrote profile results to custom_output.lprof
To view details run:
python -m line_profiler -rtmz custom_output.lprof
Finished running script
Contents of /tmp/tmpitjiehix/custom_output.txt
Timer unit: 1e-09 s
Total time: 1.47e-07 s
File: /tmp/tmp9sqjysht/script.py
Function: func4 at line 21
Line # Hits Time Per Hit % Time Line Contents
==============================================================
21 @profile
22 def func4(a):
23 1 147.0 147.0 100.0 return a + 1
And on 3.12 it fails:
tests/test_explicit_profile.py Timer unit: 1e-09 s
0.00 seconds - /tmp/tmpjo8nqdo6/script.py:9 - func2
0.00 seconds - /tmp/tmpjo8nqdo6/script.py:21 - func4
Wrote profile results to custom_output.txt
Wrote profile results to custom_output_2023-10-16T110444.txt
Wrote profile results to custom_output.lprof
To view details run:
python -m line_profiler -rtmz custom_output.lprof
Finished running script
Contents of /tmp/tmpxxdv1a41/custom_output.txt
Timer unit: 1e-09 s
0.00 seconds - /tmp/tmpjo8nqdo6/script.py:9 - func2
0.00 seconds - /tmp/tmpjo8nqdo6/script.py:21 - func4
F
For the IPython test running pytest -k test_init -s -v
the difference is in 3.11:
tests/test_ipython.py::TestIPython::test_init Timer unit: 1e-09 s
Total time: 3.51e-07 s
File: <ipython-input-1-ae24564ca1b8>
Function: func at line 1
Line # Hits Time Per Hit % Time Line Contents
==============================================================
1 def func():
2 1 351.0 351.0 100.0 return 2**20
func_data=('<ipython-input-1-ae24564ca1b8>', 1, 'func')
lines_data=[(2, 1, 351)]
In 3.12:
tests/test_ipython.py::TestIPython::test_init Timer unit: 1e-09 s
Total time: 0 s
File: <ipython-input-1-ae24564ca1b8>
Function: func at line 1
Line # Hits Time Per Hit % Time Line Contents
==============================================================
1 def func():
2 return 2**20
func_data=('<ipython-input-1-ae24564ca1b8>', 1, 'func')
lines_data=[]
Using extern and no-decorator seem to compile just as well.
That's true, but the extern method is not deprecated, whereas the legacy compiler directive is, and iirc the decorator fails in pre-3.0 versions of Cython. Regardless, I agree that the test failures have something to do with 3.12,so I'll look into that.
@Erotemic I wonder if maybe the testing harness is the issue? I just ran with_code_enable in its own file and it works fine.
While pytest has caused issues for me in the past, I don't think it is the culprit here:
Running outside of pytest via:
python -m unittest tests/test_ipython.py
xdoctest tests/test_explicit_profile.py test_explicit_profile_with_in_code_enable
xdoctest tests/test_explicit_profile.py test_explicit_profile_with_duplicate_functions
still results in the same errors.
I also started up an IPython shell and ran the tests directly, and they also failed.
@Theelx is there anything in this minimal working example that looks odd to you:
def test_simple_explicit_nonglobal_usage():
from line_profiler import LineProfiler
import ubelt as ub
profiler = LineProfiler()
def func(a):
return a + 1
profiled_func = profiler(func)
# Run Zero Times
lstats = profiler.get_stats()
print('lstats.timings = {}'.format(ub.urepr(lstats.timings, nl=1)))
print('lstats.unit = {}'.format(ub.urepr(lstats.unit, nl=1)))
print(f'profiler.code_hash_map={profiler.code_hash_map}')
profiler.print_stats()
# Run Once
profiled_func(1)
lstats = profiler.get_stats()
print('lstats.timings = {}'.format(ub.urepr(lstats.timings, nl=1)))
print('lstats.unit = {}'.format(ub.urepr(lstats.unit, nl=1)))
print(f'profiler.code_hash_map={profiler.code_hash_map}')
profiler.print_stats()
# Run Multiple Times
for _ in range(10):
profiled_func(1)
lstats = profiler.get_stats()
print('lstats.timings = {}'.format(ub.urepr(lstats.timings, nl=1)))
print('lstats.unit = {}'.format(ub.urepr(lstats.unit, nl=1)))
print(f'profiler.code_hash_map={profiler.code_hash_map}')
profiler.print_stats()
test_simple_explicit_nonglobal_usage()
In 3.11 it reports time correctly, but in 3.12 it is like the profiled function was never called. I'm not sure how anything can work if this simple case is failing.
On 3.12:
lstats.timings = {
('/home/joncrall/code/line_profiler/tests/test_explicit_profile.py', 13, 'func'): [],
}
lstats.unit = 1e-09
profiler.code_hash_map={<code object func at 0x7f002870e590, file "/home/joncrall/code/line_profiler/tests/test_explicit_profile.py", line 13>: [1917141407524353302, 1917141407524353301]}
Timer unit: 1e-09 s
Total time: 0 s
File: /home/joncrall/code/line_profiler/tests/test_explicit_profile.py
Function: func at line 13
Line # Hits Time Per Hit % Time Line Contents
==============================================================
13 def func(a):
14 return a + 1
On 3.11:
lstats.timings = {
('/home/joncrall/code/line_profiler/tests/test_explicit_profile.py', 13, 'func'): [(14, 11, 2856)],
}
lstats.unit = 1e-09
profiler.code_hash_map={<code object func at 0x7f896e352a70, file "/home/joncrall/code/line_profiler/tests/test_explicit_profile.py", line 13>: [-4000377494778487241, -4000377494778487244]}
Timer unit: 1e-09 s
Total time: 2.856e-06 s
File: /home/joncrall/code/line_profiler/tests/test_explicit_profile.py
Function: func at line 13
Line # Hits Time Per Hit % Time Line Contents
==============================================================
13 def func(a):
14 11 2856.0 259.6 100.0 return a + 1
One thing that looks odd is that 3.11 shows the function as taking ~260us per hit, where it should take ~1us. I assume this is ubelt overhead, so my question is can this be replicated without using ubelt?
Yes, ubelt is not necessary. It's also not the test harness. Here is a shell script that reproduces the example explicitly:
echo "
def test_simple_explicit_nonglobal_usage():
from line_profiler import LineProfiler
profiler = LineProfiler()
def func(a):
return a + 1
profiled_func = profiler(func)
# Run Zero Times
lstats = profiler.get_stats()
print(f'lstats.timings={lstats.timings}')
print(f'lstats.unit={lstats.unit}')
print(f'profiler.code_hash_map={profiler.code_hash_map}')
profiler.print_stats()
# Run Once
profiled_func(1)
lstats = profiler.get_stats()
print(f'lstats.timings={lstats.timings}')
print(f'lstats.unit={lstats.unit}')
print(f'profiler.code_hash_map={profiler.code_hash_map}')
profiler.print_stats()
# Run Multiple Times
for _ in range(10):
profiled_func(1)
lstats = profiler.get_stats()
print(f'lstats.timings={lstats.timings}')
print(f'lstats.unit={lstats.unit}')
print(f'profiler.code_hash_map={profiler.code_hash_map}')
profiler.print_stats()
test_simple_explicit_nonglobal_usage()
" > mwe.py
python mwe.py
In 3.12 the last print block is:
lstats.timings={('/home/joncrall/tmp/mwe.py', 6, 'func'): []}
lstats.unit=1e-09
profiler.code_hash_map={<code object func at 0x7f76c730e320, file "/home/joncrall/tmp/mwe.py", line 6>: [4641089316629750693, 4641089316629750692]}
Timer unit: 1e-09 s
Total time: 0 s
File: /home/joncrall/tmp/mwe.py
Function: func at line 6
Line # Hits Time Per Hit % Time Line Contents
==============================================================
6 def func(a):
7 return a + 1
In 3.11 the last print block is:
lstats.timings={('/home/joncrall/tmp/mwe.py', 6, 'func'): [(7, 11, 2328)]}
lstats.unit=1e-09
profiler.code_hash_map={<code object func at 0x7fb0961f56f0, file "/home/joncrall/tmp/mwe.py", line 6>: [6762933698279728574, 6762933698279728575]}
Timer unit: 1e-09 s
Total time: 2.328e-06 s
File: /home/joncrall/tmp/mwe.py
Function: func at line 6
Line # Hits Time Per Hit % Time Line Contents
==============================================================
6 def func(a):
7 11 2328.0 211.6 100.0 return a + 1
The time seems to be about the same (although if I run for many more iterations, it goes to about 105), but isn't the unit in nanoseconds not microseconds?. I find myself easilly confused when dealing with units, so I typically like to write it out explicitly in pint, and then check that all conversions make sense.
import pint
import ubelt as ub
ureg = pint.UnitRegistry()
s = ureg.seconds
ns = ureg.parse_expression('1 nanosecond')
μs = ureg.parse_expression('1 microsecond')
quantities = {}
quantities['prog_total'] = 2.3e-6 * ureg.seconds
quantities['line_total'] = 2328 * ns
quantities['perhit'] = 211 * ns
nhits = 11
quantities['line_total_derived'] = quantities['perhit'] * nhits
quantities = {k: v.to(μs) for k, v in quantities.items()}
print('quantities = {}'.format(ub.urepr(quantities, nl=1, align=':')))
This prints
quantities = {
'prog_total' : <Quantity(2.3, 'microsecond')>,
'line_total' : <Quantity(2.328, 'microsecond')>,
'perhit' : <Quantity(0.211, 'microsecond')>,
'line_total_derived': <Quantity(2.321, 'microsecond')>,
}
which seems consistent to me when the numbers are interpreted as nanoseconds (which is 1e-9 as the timer unit reports).
Maybe related: on 3.12.0 I see PyCode_Addr2Line
and PyFrame_GetLineNumber
returning -1 for some offsets/frames (lasti). Couldn't find any official information if this is intended.
edit: only in async functions, so doesn't explain the mwe which I'm unable to reproduce. The example creating -1:
which seems consistent to me when the numbers are interpreted as nanoseconds (which is 1e-9 as the timer unit reports).
Ah yeah that's true. I forgot that Linux/BSD platforms use units of 1e-8 seconds instead of 1e-6 which used to be the default for all platforms.
I am genuinely confused as to how this is happening, as it works fine for me when I install it and run it locally in the terminal on Ubuntu 22.04, with Python 3.12 behind pyenv.
@notEvil You are unable to reproduce the MWE in this post? So when you run that in 3.12 you get times? Can you paste your output? What environment are you running in?
In my execution it's a fresh pyenv 3.12.0 build on Ubuntu 22.04.
Just tried again and I'm able to reproduce now. Replacing .f_lineno
with PyFrame_GetLineNumber
seems to fix it. I probably tested with those changes and paid no attention.
Merging #246 (dd29898) into main (075b5c4) will not change coverage. The diff coverage is
100.00%
.
@notEvil looks like that fixed it!
I was wondering why this is the case and I found this: https://bugs.python.org/issue42823#msg384502 where Mark Shannon states:
It looks like you are accessing the c field "f_lineno" directly. That was never guaranteed to work, it just happened to. You should use PyFrame_GetLineNumber()
So using the official C-API to get the line number is probably the correct thing to be doing.
Updating CI to test Python 3.12