nedbat opened 11 months ago
This issue can be seen starting from commit 411b1692811b2ecac59cb0df0f920861c7cf179a, with the usual numpy 1.25.2.
cc @markshannon
@markshannon Is there any news about this?
@iritkatriel perhaps you have some idea?
Hi @nedbat, I circled back to this issue. If you are still interested, could you try the patch in #114986 and see if the result improves?
@gaogaotiantian Thanks. I tried #114986, and it didn't run faster. I used the https://github.com/Quansight-Labs/ndindex repo test suite:
- 3.11: 160s
- 3.12: 255s
- 3.13 main: 248s
- pr114986: 245s
I wish I had better news.
Oh, did you test it with an empty trace function?
I will do more tests, but the ultimate goal is to perform better in real-world scenarios. Here is more data from a subset of the test suite (`-k test_chunking`), which looks much more promising. It's hard to get consistent numbers, but this looks good. I would merge this change.
| Python version | no tracing | Python trace | C trace |
|---|---|---|---|
| 3.11.8 | 1.19s | 7.39s | 2.46s |
| 3.12.2 | 1.12s | 11.57s | 3.71s |
| 3.13.0a3+ heads/main:a95b1a56bb | 1.14s | 10.37s | 3.89s |
| 3.13.0a3+ heads/pr/114986:cb09c55758 | 1.14s | 6.48s | 2.91s |
Thank you for the test. Yes, the PR only improves the performance of the actual trace function, which is normally the major cost for tracing tools. Good to know it helps with the more real-life benchmark. Unfortunately, due to the change in the tracing mechanism, the overhead of `sys.settrace` will always be larger than with the old way - I did not see a quick solution to that.
Now that the eval_breaker is part of the thread state, here's an alternative approach that might work. `tstate->tracing` only needs one bit, so use the low bit in the version part of the eval breaker.
For non-tracing functions we expect to always see the low bit set to 0.
For tracing functions we expect to always see the low bit set to 1.
If the code and global versions differ, do the following:
I think this is correct, but I've not drawn up a full state diagram.
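The low-bit packing described above can be sketched as follows. This is a toy Python illustration only: the names, and the convention of advancing versions by 2 to keep the low bit free, are assumptions for the sketch, not CPython's actual implementation.

```python
TRACING_BIT = 1

def make_version(counter, tracing):
    # Versions advance by 2, leaving the low bit free for the flag
    # (an assumed convention for this sketch).
    return (counter << 1) | (TRACING_BIT if tracing else 0)

code_version = make_version(7, False)    # non-tracing: low bit is 0
thread_version = make_version(7, True)   # tracing: low bit is 1

assert code_version & TRACING_BIT == 0
assert thread_version & TRACING_BIT == 1
# Same counter but a different tracing state means the two versions
# differ, so a single equality check routes execution to the slow path.
assert code_version != thread_version
```

The appeal of the idea is that one comparison detects both a stale version and a tracing-state mismatch, with no separate branch on the tracing flag.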
This caused the refleak tests (`./python -m test -R 3:3 test_frame`) to fail. If not fixed in 24 hours, I'll revert the commit to unblock buildbots. (I hope I'll have time to investigate & come up with a proper fix, but can't promise that.)
@encukou See response in #116098
Hi.
I wanted to post this in https://github.com/nedbat/coveragepy/issues/1665, but it seems the investigation moved here. I hope my code example will help with the investigation and not clutter the issue.
I also have been chasing the "tests run 2 times slower" dragon. I used coverage through pytest, and besides some test cases running slower in general, I've had a consistent +43 second "freeze" in the collection phase for a specific import. I will paste some log excerpts and then minimal code to reproduce some of them. Look at the log times in brackets to find the problematic steps.
pytest_debug.txt - this comes from the example code:
[00:00:00.265179] found cached rewritten pyc for /path/test_jieba_2/test/test_fut.py [assertion]
[00:00:00.265185] early skip of rewriting module: fut [assertion]
[00:00:00.265190] early skip of rewriting module: jieba [assertion]
[00:00:00.265195] early skip of rewriting module: jieba.finalseg [assertion]
[00:00:00.265200] early skip of rewriting module: jieba._compat [assertion]
[00:00:44.966856] ============================= test session starts ==============================
[00:00:44.966888] platform linux -- Python 3.12.2, pytest-8.1.1, pluggy-1.4.0 -- /venv/path/bin/python
[00:00:44.966893] using: pytest-8.1.1
[00:00:44.966896] setuptools registered plugins:
[00:00:44.966899] pytest-cov-4.1.0 at /venv/path/lib/python3.12/site-packages/pytest_cov/plugin.py
[00:00:44.966902] rootdir: /path/test_jieba_2
[00:00:44.966905] plugins: cov-4.1.0
[00:00:44.966908] collected 1 item
[00:00:44.966910]
[00:00:44.978316] test/test_fut.py . [100%] early skip of rewriting module: pkg_resources [assertion]
[00:00:44.978354] early skip of rewriting module: jieba.finalseg.prob_start [assertion]
[00:00:44.978360] early skip of rewriting module: jieba.finalseg.prob_trans [assertion]
[00:00:44.978363] early skip of rewriting module: jieba.finalseg.prob_emit [assertion]
[00:00:44.978366] pytest_pycollect_makeitem [hook]
[00:00:44.978369] collector: <Module test_fut.py>
[00:00:44.978372] name: @py_builtins
[00:00:44.978375] obj: <module 'builtins' (built-in)>
[00:00:44.978378] finish pytest_pycollect_makeitem --> None [hook]
The early skip of rewriting module: jieba... lines overlap with test session starts in a weird way in this minimal example, but in the full project I'm working on it froze consistently at early skip of rewriting module: jieba.finalseg.prob_emit [assertion].
This one is from the project. Let's call it python_debug.txt (the command to produce it is at the end). I'm pasting it because in the minimal example it freezes in a different place, but the jieba import is actually the culprit, specifically the jieba_pyfast.finalseg.prob_emit module.
[00:00:09.073173] # possible namespace for /venv/path/lib/python3.12/site-packages/google
[00:00:09.073279] # possible namespace for /venv/path/lib/python3.12/site-packages/google
[00:00:09.099473] import 'pkg_resources' # <_frozen_importlib_external.SourceFileLoader object at 0x7fa3cf936d20>
[00:00:09.099509] import 'jieba_pyfast._compat' # <_frozen_importlib_external.SourceFileLoader object at 0x7fa3cf936960>
[00:00:09.100432] # /venv/path/lib/python3.12/site-packages/jieba_pyfast/finalseg/__pycache__/prob_emit.cpython-312.pyc matches /venv/path/lib/python3.12/site-packages/jieba_pyfast/finalseg/prob_emit.py
[00:00:09.103991] # code object from '/venv/path/lib/python3.12/site-packages/jieba_pyfast/finalseg/__pycache__/prob_emit.cpython-312.pyc'
[00:00:58.043337] import 'jieba_pyfast.finalseg.prob_emit' # <_frozen_importlib_external.SourceFileLoader object at 0x7fa3cf883170>
[00:00:58.043791] # /venv/path/lib/python3.12/site-packages/jieba_pyfast/finalseg/__pycache__/prob_start.cpython-312.pyc matches /venv/path/lib/python3.12/site-packages/jieba_pyfast/finalseg/prob_start.py
tree .
.
├── fut.py
├── requirements.txt
└── test
├── __init__.py
└── test_fut.py
# fut.py
import jieba
def a():
pass
#
# This file is autogenerated by pip-compile with Python 3.12
# by the following command:
#
# pip-compile --allow-unsafe --output-file=requirements.txt requirements.in
#
coverage[toml]==7.4.1
# via pytest-cov
iniconfig==2.0.0
# via pytest
jieba==0.42.1
# via -r requirements.in
packaging==23.2
# via pytest
pluggy==1.4.0
# via pytest
pytest==8.1.1
# via
# -r requirements.in
# pytest-cov
pytest-cov==4.1.0
# via -r requirements.in
# test/test_fut.py
import fut
def test_stuff():
assert True
System info:
When run through pytest, the tests take less than a second. With pytest --cov=. they take ~44 seconds. Removing import jieba makes the tests run fast again.
To get the output lines above I ran (ts is from the moreutils Ubuntu package):
pytest --cov=. --debug=/dev/stdout | ts -s '[%H:%M:%.S]' | tee pytest_debug.txt
python -v -m pytest --cov=. 2>&1 | ts -s '[%H:%M:%.S]' | tee python_debug.txt
Ah, this is interesting. The problem is that the file has a long const dict that expands into 30k+ lines. Each line executes to produce part of the dict and triggers a callback into the trace function, which slows down the program significantly.
This is not a bug, and is almost expected - it's just the unfortunate code that makes it weird.
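The per-line callback cost is easy to observe with a plain `sys.settrace` counter run over a generated module whose only content is a large dict literal. This is a minimal sketch; the 1000-entry `TABLE` module is invented here to mimic files like `prob_emit.py`.

```python
import sys

# Hypothetical miniature of a file like prob_emit.py: a module whose
# only content is a huge dict literal spread across many lines.
lines = ["TABLE = {"] + [f"    {i}: {i * 2}," for i in range(1000)] + ["}"]
code = compile("\n".join(lines), "<big_dict>", "exec")

events = {"line": 0}

def tracer(frame, event, arg):
    # Count every 'line' callback the interpreter makes while the
    # dict literal is being built.
    if event == "line":
        events["line"] += 1
    return tracer

sys.settrace(tracer)
exec(code, {})
sys.settrace(None)

# Each source line of the literal compiles to bytecode with its own
# line number, so the callback count scales with the number of entries.
print(events["line"])
```

With a real trace function doing work on each callback (as coverage tools do), those thousands of extra calls add up to the freezes seen above.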
I believe this performance regression is making debugging practically unusable in 3.12+ with certain dependencies. This repro takes two and a half minutes to run on a MacBook Pro, because the geocoder module imports many large dictionaries.
# Run with python -m pdb repro.py
# Type n to execute the import statement with an active breakpoint
from phonenumbers import geocoder, parse
print(geocoder.description_for_number(parse('+13215555555'), 'en'))
Perhaps the library can be updated to use the fast C-based JSON scanner for performance, but that's a significant change to make to work around a language performance regression.
Just to confirm, what command did you use after python -m pdb repro.py? pdb should be brought up before importing, and if you use c it should not add any overhead.
Good point. I typed n to step one line, which sets a breakpoint.
Yes, n will trigger this. That's a known issue in pdb (stepping and breakpoints are costly), and some changes in 3.12 made it worse, especially for a Python file with a huge chunk of code in a single namespace (it's fine if it is spread across multiple functions). I am investigating solutions that could make this better, but I don't think we can backport any optimizations to 3.12.
Performance improvements in any future version would be great. That said, it is conceivable to me that generating very large Python files will become a library anti-pattern due to this.
I came up with a patch to fix the long dict issue. I'm not sure if you are interested in trying the patch in #118127. If you are, let me know the result.
Hi @nedbat, did you have a chance to test the performance of coverage on 3.13b against 3.12? I made a few optimizations, and I hope the performance issue is relieved a bit.
I'm having a hard time re-creating the scenarios I used before :(
That's okay. Does coveragepy have nightly validations for 3.13b? Is there a basic timing benchmark that we can refer to? Just a ballpark would be helpful.
You can try running the ndindex tests, which were the original thing I noticed this issue with in https://github.com/nedbat/coveragepy/issues/1665. Just clone https://github.com/Quansight-Labs/ndindex, then install the packages in requirements-dev.txt and run time pytest (coverage is enabled automatically). On my computer with Python 3.11.9 the tests take 2:48, and with Python 3.12.0 they take 4:21 (both with coverage 7.5.2). If you want to run the tests with coverage disabled, remove the coverage options from pytest.ini. Note that there is a degree of variation in the test times due to the use of hypothesis; you might want to set --hypothesis-seed to avoid this.
Coverage.py has an overnight job that runs its test suite on nightly builds of Python, including 3.13 and 3.14 now: https://github.com/nedbat/coveragepy/actions/runs/9265205433 There isn't a timing test in there though.
Okay, thanks @nedbat and @asmeurer. I was hoping for a quick, existing benchmark, but it's okay that we don't have one for now. Just wondering if we should close this issue now that we have implemented some optimizations.
I have a benchmarking framework, but it's fragile. I will try to patch it up later this week.
Bug report
A bug report in coverage (https://github.com/nedbat/coveragepy/issues/1665) is reduced here to a pure do-nothing trace function for sys.settrace.
Reproduction:
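A minimal sketch of that kind of reproduction (the workload function and iteration count are invented for illustration): install a do-nothing trace function with sys.settrace and time the same workload with and without it.

```python
import sys
import time

def work():
    # An arbitrary pure-Python workload; the size is chosen only so
    # the timings are measurable.
    total = 0
    for i in range(200_000):
        total += i % 7
    return total

def null_trace(frame, event, arg):
    # Does nothing, but the interpreter still pays for every callback.
    return null_trace

start = time.perf_counter()
work()
plain = time.perf_counter() - start

sys.settrace(null_trace)
start = time.perf_counter()
work()
traced = time.perf_counter() - start
sys.settrace(None)

print(f"no tracing: {plain:.4f}s  null trace: {traced:.4f}s")
```

Comparing the ratio of the two timings across 3.11, 3.12, and 3.13 is what the benchmark numbers earlier in this thread measure.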
I don't know if the unusual numpy build has something to do with this.
Linked PRs