@hroncok Is there perhaps a way to find out which packages in Fedora took significantly longer to run tests with pytest 7 compared to 6?
There is. I can rebuild all the succeeded packages with pytest 6 and then we can parse the time information for pytest from the logs. We also have the overall build time, but that is unfortunately influenced by unstable external factors.
If you could do that, that'd be amazing! Would be great to see how big the impact is, and also perhaps find a project to reproduce the issue with. Let me know if there is something I can help with.
For pytest 7:
while read line; do
    pkg=$(echo "$line" | cut -f1)
    chroot=$(echo "$line" | cut -f2)  # this will always be the same, but meh
    buildid=$(echo "$line" | cut -f3)
    url="https://download.copr.fedorainfracloud.org/results/churchyard/pytest-7/${chroot}/$(printf "%08d" $buildid)-${pkg}/build.log.gz"
    echo $pkg
    curl -s -L "$url" | gzip -d - | grep -E '^=+ [[:digit:]]+ passed.* =+$'
    echo
    echo
done < <(copr monitor churchyard/pytest-7 --output-format text-row | grep 'succeeded$')
...running...
In the meantime, the pytest 6 builds run in https://copr.fedorainfracloud.org/coprs/churchyard/pytest-6/builds/
If the times will indeed be off, I can resubmit the pytest-7 builds once more to eliminate differences from e.g. upgraded hardware or whatever, since the pytest-7 builds are quite old now.
2ping
================== 121 passed, 4 skipped in 63.86s (0:01:03) ===================
abrt
============================== 45 passed in 2.25s ==============================
andriller
======================== 88 passed, 2 warnings in 3.96s ========================
androguard
==== 102 passed, 2 skipped, 30 deselected, 1 xfailed, 27 warnings in 45.16s ====
APLpy
argparse-manpage
============================== 11 passed in 4.05s ==============================
artifacts
======================== 28 passed, 2 warnings in 2.11s ========================
auditwheel
======================== 73 passed, 1 warning in 0.76s =========================
...
The entire log: pytest-7.txt
In case you want to parse that, note that some packages have no info (pytest was not actually invoked during the build or the output is unusual) and some have multiple infos (pytest was invoked multiple times during one build).
noticed almost 2x slowdown compared to 6.2.5 on py3.8
The Fedora tests are with Python 3.10.1.
Great, thank you! Parsed with:
import pathlib
import re

from rich import print  # optional

txt = pathlib.Path("pytest-7.txt").read_text()
data = [e.strip().splitlines() for e in txt.strip().split("\n\n")]

total_times = {}
for name, *rest in data:
    if not rest:
        continue
    # print(name)
    times = []
    for line in rest:
        assert line.startswith("=") and line.endswith("=")
        m = re.match(r".* in ([\d.]*)s ", line)
        times.append(float(m.group(1)))
    total_times[name] = sum(times)

print(total_times)
so we should be able to compare things once you have the same logs for pytest 6 ready. Thanks again! FWIW I'd hope that the Python version doesn't make a difference, given that the reporter had the slowdown with two different pytest versions but the same Python version (I assume).
I've compared the two profile files - the information in them seems to be a little tricky, but here is what I learned:
- pytest 6: console_main 1.7s, pytest_collection 0.8s, pytest_runtest_protocol 0.54s
- pytest 7: console_main 3.9s, pytest_collection 1.5s, pytest_runtest_protocol 1.54s
- posix.stat - on pytest 6, it was called 3878 times for a total of 0.25s, on pytest 7 it's called 8408 times for a total of 2.28s (i.e. called more, but also much slower?). Unfortunately I can't seem to get the callees of that from the profile... but one offender I found by clicking around seems to be pathlib.is_file in _getconftestmodules:
  - pytest 6: [screenshot]
  - pytest 7: [screenshot]
- Similarly, during collection, with pytest 6: [screenshot], pytest 7: [screenshot]
- A different look with another tool (pyprof2calltree and KCacheGrind instead of SnakeViz), pytest 6: [screenshot], pytest 7: [screenshot]
Take it with a grain of salt, as interpreting Python profiles seems to differ a bit depending on the tools, but that gives us a rough idea where to look I guess. @bluetech do you have any idea how this could happen? I'm also wondering if there is simply something different about their environment, or whether they perhaps have a virtualenv or somesuch which isn't covered by norecursedirs in one setup but is in the others...
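(As an aside: one way to dig into who calls posix.stat from such a .prof dump is the standard-library pstats module - a minimal sketch, assuming the dump is named pytest7.prof; this is not what was used above:)

import pstats

stats = pstats.Stats("pytest7.prof")
stats.sort_stats("cumulative")
# print the callers of every function whose name matches "posix.stat",
# which usually narrows down where the extra stat() calls come from
stats.print_callers("posix.stat")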
After some more digging around, it looks like most of that code is older - I wonder if this is caused by 614f5394b50ac88b7aa4e97b43acb08aca383d95 (and 16e5fbe3714b0e1b0ab0d27bf3938bf2218c90cd missing some case) perhaps?
the way the code repeatedly checks for dirs/files and subdirs seems confusingly expensive, it may be sensible to put the conftests/anchors into the collection tree (and link the conftests to packages/directories)
Note that the pytest 6 file may not contain the exact same set of packages, so don't compute a naïve total sum: pytest-6.txt
Some more detailed analysis based on those two logs:
results in:
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┓
┃ Name ┃ pytest 6 ┃ pytest 7 ┃ Δabs ┃ Δrel ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━┩
[...]
│ python-validators │ 0.47 │ 1.38 │ 0.91 │ 193.62% │
│ python-tinycss2 │ 1.23 │ 3.62 │ 2.39 │ 194.31% │
│ python-managesieve │ 0.18 │ 0.53 │ 0.35 │ 194.44% │
│ python-uamqp │ 0.96 │ 2.83 │ 1.87 │ 194.79% │
│ python-rangeparser │ 0.01 │ 0.03 │ 0.02 │ 200.00% │
│ python-whichcraft │ 0.01 │ 0.03 │ 0.02 │ 200.00% │
│ python-pathvalidate │ 3.46 │ 10.64 │ 7.18 │ 207.51% │
│ python-biscuits │ 0.07 │ 0.22 │ 0.15 │ 214.29% │
│ python-colorzero │ 3.77 │ 11.87 │ 8.10 │ 214.85% │
│ python-tomli-w │ 0.49 │ 1.57 │ 1.08 │ 220.41% │
│ python-tomli │ 0.82 │ 2.92 │ 2.10 │ 256.10% │
│ python-node-semver │ 1.09 │ 3.92 │ 2.83 │ 259.63% │
│ sum │ 31191.28 │ 41189.35 │ 9998.07 │ 32.05% │
│ avg │ 32.73 │ 43.22 │ 10.49 │ 24.50% │
└───────────────────────────────────────────┴──────────┴──────────┴─────────┴─────────┘
So it does look like the build in total (only considering packages in both sets) took some 2h 45min longer, or around 30%...
Of course there are a lot of factors involved, but it does seem like things got slower overall, perhaps enough that we should take a closer look?
Full output:
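(The comparison script itself isn't included above; purely as an illustration, per-package deltas of this kind could be computed roughly like this - parse_times is a hypothetical wrapper around the earlier parsing snippet, not the actual script used:)

import pathlib
import re


def parse_times(filename):
    # hypothetical helper wrapping the parsing snippet from earlier in the thread
    txt = pathlib.Path(filename).read_text()
    data = [e.strip().splitlines() for e in txt.strip().split("\n\n")]
    totals = {}
    for name, *rest in data:
        if rest:
            totals[name] = sum(
                float(re.match(r".* in ([\d.]*)s ", line).group(1)) for line in rest
            )
    return totals


times6 = parse_times("pytest-6.txt")
times7 = parse_times("pytest-7.txt")

# only compare packages that appear in both logs, sorted by relative slowdown
common = sorted(set(times6) & set(times7), key=lambda n: times7[n] / times6[n] if times6[n] else 0)
for name in common:
    t6, t7 = times6[name], times7[name]
    rel = (t7 - t6) / t6 * 100 if t6 else float("nan")
    print(f"{name:40} {t6:9.2f} {t7:9.2f} {t7 - t6:8.2f} {rel:8.2f}%")
print("sum:", sum(times6[n] for n in common), "vs", sum(times7[n] for n in common))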
Should I restart the pytest 7 builds to make them more up to date?
If you don't mind the additional load this puts on Fedora infra, that would be great - I guess it would certainly make the values more accurate.
Updated pytest 7 times: pytest-7.txt
Well, this is looking very different indeed:
│ python-bleach │ 0.73 │ 1.47 │ 0.74 │ 101.37% │
│ python-tinyrpc │ 1.11 │ 2.25 │ 1.14 │ 102.70% │
│ python-managesieve │ 0.18 │ 0.37 │ 0.19 │ 105.56% │
│ python-validators │ 0.47 │ 0.98 │ 0.51 │ 108.51% │
│ python-flake8-import-order │ 0.51 │ 1.07 │ 0.56 │ 109.80% │
│ python-yarl │ 2.66 │ 5.86 │ 3.20 │ 120.30% │
│ python-dbfread │ 0.07 │ 0.16 │ 0.09 │ 128.57% │
│ python-tinycss2 │ 1.23 │ 2.88 │ 1.65 │ 134.15% │
│ python-tomli-w │ 0.49 │ 1.21 │ 0.72 │ 146.94% │
│ python-systemd │ 0.14 │ 0.35 │ 0.21 │ 150.00% │
│ python-tomli │ 0.82 │ 2.42 │ 1.60 │ 195.12% │
│ python-rangeparser │ 0.01 │ 0.03 │ 0.02 │ 200.00% │
│ python-pathvalidate │ 3.46 │ 10.39 │ 6.93 │ 200.29% │
│ python-uamqp │ 0.96 │ 2.90 │ 1.94 │ 202.08% │
│ python-colorzero │ 3.77 │ 11.69 │ 7.92 │ 210.08% │
│ python-node-semver │ 1.09 │ 3.78 │ 2.69 │ 246.79% │
│ sum │ 31253.31 │ 31544.78 │ 291.47 │ 0.93% │
│ avg │ 33.50 │ 33.81 │ 0.31 │ 18.31% │
└───────────────────────────────────────────┴──────────┴──────────┴─────────┴─────────┘
(note that "100%" still means "double the original time" though...)
Anyone volunteering to look into some of those projects in more detail to figure out if the same tests did run, and if so, if there really is some kind of slowdown in pytest with them?
Full output:
I'll definitely look at some of them.
I'm going to bet that the slowness comes from the extra stats due to the pathlib migration, possibly with the addition of a slow filesystem setup on the Fedora/user's system (maybe running NFS or such?). But I'm not making assumptions :)
No NFS, the filesystem is local. (I'll double check.)
Great, thanks @bluetech! I already asked the reporter and they said:
Latest MacOS, plus same result within python3.8 Docker image. Packages installed using pip install --user
So nothing too special, at least from what it sounds like there...
I can reproduce the slowdown pretty easily using python-node-semver from above -- appears to mostly be spending time in pathlib, though it's due to a hot call here: https://github.com/pytest-dev/pytest/blob/71baf24b6d41da6704433ca9909b5e6d954564b9/src/_pytest/main.py#L539 (which gets called 10s of thousands of times -- approximately 20x per test)
I've attached a profile generated using these tools
this is using ~stock python 3.8.10 on ubuntu 20.04
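(For anyone wanting to reproduce this locally, a minimal sketch of producing such a profile - the exact commands used above aren't shown, so this is an assumption:)

# run the python-node-semver test suite under cProfile and dump a .prof file
import cProfile

import pytest

cProfile.run('pytest.main(["nodesemver"])', "pytest7.prof")
# pytest7.prof can then be opened with snakeviz, or converted with
# pyprof2calltree for inspection in KCacheGrind, as mentioned above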
this patch "fixes" the performance issue, though I haven't studied the code closely enough to see whether the optimization is correct:
$ diff -u old.py ../venv-pytest7/lib/python3.8/site-packages/_pytest/config/__init__.py
--- old.py 2022-01-06 19:47:05.974081512 -0500
+++ ../venv-pytest7/lib/python3.8/site-packages/_pytest/config/__init__.py 2022-01-06 19:45:09.238082625 -0500
@@ -524,6 +525,7 @@
                 if x.is_dir():
                     self._getconftestmodules(x, importmode, rootpath)
 
+    @lru_cache(maxsize=1)
     def _getconftestmodules(
         self, path: Path, importmode: Union[str, ImportMode], rootpath: Path
     ) -> List[types.ModuleType]:
heh, bisecting gives a commit that definitely makes sense 614f5394b50ac88b7aa4e97b43acb08aca383d95:
614f5394b50ac88b7aa4e97b43acb08aca383d95 is the first bad commit
commit 614f5394b50ac88b7aa4e97b43acb08aca383d95
Author: Ran Benita <ran@unusedvar.com>
Date: Fri Oct 1 14:28:16 2021 +0300
Avoid `@lru_cache` on methods
The problem with `@lru_cache` on methods is that it also captures `self`
and leaks it until the entry is evicted (if ever).
src/_pytest/assertion/rewrite.py | 7 ++-----
src/_pytest/config/__init__.py | 1 -
2 files changed, 2 insertions(+), 6 deletions(-)
bisect run success
I bisected using this script:
#!/usr/bin/env bash
set -euxo pipefail
pip uninstall -yqq pytest >& /dev/null
pip install -qq . >& /dev/null
cd ../python-node-semver
git clean -fxfdq
timeout 2.5 pytest nodesemver || exit 1
and this technique
git bisect start
git bisect bad HEAD
git bisect good 6.2.5
git bisect run ../t.sh
the story doesn't quite end there though -- the second commit in that patch is supposed to patch up that performance hit but makes a tiny edge-case mistake in the caching code -- it ignores the case when the cached value is falsey. Fixing that gets us most of the way back to the pytest 6.x performance, but not quite, because the beginning of _getconftestmodules still invokes a lot of slow pathlib is_file and parent calls (~15% slower than 6.2.5, which is way better than 230% slower!):
diff --git a/src/_pytest/config/__init__.py b/src/_pytest/config/__init__.py
index dee3d4974..174d80a01 100644
--- a/src/_pytest/config/__init__.py
+++ b/src/_pytest/config/__init__.py
@@ -538,7 +538,7 @@ class PytestPluginManager(PluginManager):
         # Optimization: avoid repeated searches in the same directory.
         # Assumes always called with same importmode and rootpath.
         existing_clist = self._dirpath2confmods.get(directory)
-        if existing_clist:
+        if existing_clist is not None:
             return existing_clist
 
         # XXX these days we may rather want to use config.rootpath
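(To illustrate why that one check matters: an empty list is falsey, so a cached "no conftests in this directory" result never short-circuits and the expensive lookup reruns on every call. A standalone sketch with hypothetical names, not pytest's actual code:)

from pathlib import Path
from typing import Dict, List

_cache: Dict[Path, List[Path]] = {}


def conftests_for(directory: Path) -> List[Path]:
    cached = _cache.get(directory)
    if cached:  # bug: [] is falsey, so a cached empty result is treated as "not cached"
        return cached
    # stands in for the expensive filesystem walk / conftest import
    result = [p for p in directory.iterdir() if p.name == "conftest.py"]
    _cache[directory] = result
    return result

With "if cached is not None:" instead, directories without any conftest are cached too, which is exactly the one-line fix in the diff above.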
an idea to get back to the original caching but without a memory leak would be to assign a per-instance lru_cache:
diff --git a/src/_pytest/config/__init__.py b/src/_pytest/config/__init__.py
index dee3d4974..15a9c0ef1 100644
--- a/src/_pytest/config/__init__.py
+++ b/src/_pytest/config/__init__.py
@@ -354,6 +354,7 @@ class PytestPluginManager(PluginManager):
         self._confcutdir: Optional[Path] = None
         self._noconftest = False
         self._duplicatepaths: Set[Path] = set()
+        self._getconftestmodules = lru_cache(maxsize=128)(self._getconftestmodules_uncached)
 
         # plugins that were explicitly skipped with pytest.skip
         # list of (module name, skip reason)
@@ -524,7 +525,7 @@ class PytestPluginManager(PluginManager):
             if x.is_dir():
                 self._getconftestmodules(x, importmode, rootpath)
 
-    def _getconftestmodules(
+    def _getconftestmodules_uncached(
         self, path: Path, importmode: Union[str, ImportMode], rootpath: Path
     ) -> List[types.ModuleType]:
         if self._noconftest:
I used the same technique in my text editor though admittedly for a different reason
an idea to get back to the original caching but without a memory leak would be to assign a per-instance lru_cache
This is interesting, but doesn't that still create a circular reference, because lru_cache(...)(self._getconftestmodules_uncached) references a bound method (which references self), which is then stored in a variable in self?
Perhaps it is simple enough to implement an infinite cache using a dict? Caching all calls seems feasible in this case.
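(A minimal sketch of that dict-based idea - hypothetical class and method names, not an actual pytest patch:)

import types
from pathlib import Path
from typing import Dict, List, Tuple


class ConftestCacheSketch:
    """Hypothetical stand-in for PytestPluginManager, illustrating a plain dict cache."""

    def __init__(self) -> None:
        # per-instance dict: unbounded, but collected together with the instance,
        # and no class-level lru_cache keeping `self` (or cached results) alive
        self._conftest_cache: Dict[Tuple[Path, str, Path], List[types.ModuleType]] = {}

    def getconftestmodules(self, path: Path, importmode: str, rootpath: Path) -> List[types.ModuleType]:
        key = (path, importmode, rootpath)
        try:
            return self._conftest_cache[key]
        except KeyError:
            result = self._getconftestmodules_uncached(path, importmode, rootpath)
            self._conftest_cache[key] = result
            return result

    def _getconftestmodules_uncached(self, path: Path, importmode: str, rootpath: Path) -> List[types.ModuleType]:
        return []  # placeholder for the real conftest collection work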
Great stuff @asottile, especially the diagram. I am able to reproduce your findings.
First, I am going to submit your is None patch and backport it. I think that we should do it anyway, to have a proper baseline for any other improvements in the git history. For me the timings are:
- pytest 6: 1.59s
- pytest 7: 5.37s (3.4x slower)
- pytest 7 with the is None fix: 1.96s (1.23x slower)

Regarding adding back the lru_cache to _getconftestmodules, I would like to avoid it; I always had a bad feeling about it. Aside from the double caching and missing the path->directory normalization, I just feel that caching side-effecting functions with lru_cache is not a good idea and needs more care. So I'd like to find a way to recover the lost performance some other way.

In the meantime, IMO the is None fix will be sufficient to unblock the 7.0 release, but that's up to @The-Compiler to decide (I'm fine with either way, including bringing back the lru_cache for now).
I agree that should be sufficient. I feel like that gets things to the same ballpark again, so that we can have further performance fixes in 7.0.1 or 7.1.0.
Thanks to @bluetech @asottile @hroncok for an incredibly quick turnaround on what seemed like a tricky issue without much detail! :heart:
Oh, also, how do folks feel about a quick 7.0.0rc2 now? We still need to take care of #9002 and #9471 (plus potentially "unknown unknowns" from #9415?), but that would mean it's easy for people (and probably Fedora?) to test the fixes which we have so far (and it makes it easier to ask the original reporter to test this fix too).
I might be able to test that in Fedora with the git HEAD as well, even without an rc.
an idea to get back to the original caching but without a memory leak would be to assign a per-instance lru_cache
This is interesting, but doesn't that still create a circular reference, because lru_cache(...)(self._getconftestmodules_uncached) references a bound method (which references self), which is then stored in a variable in self? Perhaps it is simple enough to implement an infinite cache using a dict? Caching all calls seems feasible in this case.
The memory issue isn't due to a cycle but due to lru_cache being attached to the class (~essentially a global variable with process lifetime). When the cache is assigned to the instance, it is collected with the instance itself.
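(A small illustrative sketch of that difference - not pytest code:)

import gc
import weakref
from functools import lru_cache


class ClassLevelCache:
    @lru_cache(maxsize=None)  # the cache lives on the class, so entries keep `self` alive for the process lifetime
    def compute(self, x: int) -> int:
        return x * 2


class InstanceLevelCache:
    def __init__(self) -> None:
        # the cache lives on the instance and is collected together with it
        self.compute = lru_cache(maxsize=None)(self._compute)

    def _compute(self, x: int) -> int:
        return x * 2


for cls in (ClassLevelCache, InstanceLevelCache):
    obj = cls()
    obj.compute(1)
    ref = weakref.ref(obj)
    del obj
    gc.collect()
    print(cls.__name__, "instance collected:", ref() is None)
# expected: ClassLevelCache False (the instance is held by the class-level cache),
# InstanceLevelCache True (the per-instance cache cycle is garbage-collected)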
FWIW the reporter said:
Great news! I confirm no more delays detected on my projects. Well done. Tnx
Tests collection time went from 32s (6.2.5) to 2.3s (7.0.0) on a large test suite of 4000 tests :tada:
(edited to add the correction from the follow-up tweet)
so looks like all in all, 7.0.0 was a great performance improvement - thanks to everyone involved!
According to nikolaizujev on Twitter:
Unfortunately no reproducer, but I got .prof dumps from them (Dropbox link). Opening this to remind myself to have a look at those before the final release - but ideally we'd have a proper reproducer so we can e.g. bisect this. Does anyone have any guess what could be the culprit? I only see a very small slowdown for my own project.
@hroncok Is there perhaps a way to find out which packages in Fedora took significantly longer to run tests with pytest 7 compared to 6? This would mainly be for runs which still pass, because if there are failures, all bets regarding the runtime are off anyways.