Closed huguesb closed 2 years ago
NB: the initial report included a mypyc build failure, which turned out to be caused by using Python 3.7.8 built via pyenv prior to the upgrade to Monterey. Using a freshly built 3.7.13 solved that. The flakiness persists, however...
For what it's worth, I am seeing the exact same thing on Python 3.10, also on macOS Monterey (12.3.1). Not sure if it's relevant, but I have the Apple M1 CPU, so I compile my Pythons for arm64.
I suspect that this is caused by some shared file system state, and only reproduces when running tests in parallel. All the tests passed when I ran them sequentially (on macOS). Also CI has been reliable, and it doesn't use much/any parallelism.
I saw this on Linux too, on Python 3.10.4. I can confirm that things work reliably and consistently with pytest -q -n0 -k PEP561Suite. However, as far as I can see from a casual glance, each case of the PEP 561 suite runs within its own venv (and associated unique tempdir), so I don't understand exactly how parallel mode gives rise to FS race conditions here.
pip would be my first guess, based on the stack trace I posted in the original bug report... It's notoriously prone to breakage if invoked concurrently, even on separate venvs, because it may still use shared global state, for instance when downloading packages or building wheels.
I got this suite to pass, reliably, in two ways, neither of them very pretty. One is to just pass -n0 to this particular suite and avoid race conditions by virtue of not having any concurrency. The other involves using a filesystem-based lock around the installation steps. This would rely on a new entry in test-requirements (filelock), but, for what it's worth, it's one that folks will have already if they use tox.
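To make the locking idea concrete, here is a minimal sketch of serialising the pip step across test workers. It is not the code from the linked commit: that one uses the filelock package, whereas this sketch swaps in the standard library's fcntl.flock (Unix-only) to stay dependency-free. The names install_lock, install_package and LOCK_PATH are hypothetical, made up for illustration.

```python
import contextlib
import fcntl
import os
import subprocess
import tempfile

# Hypothetical lock location, shared by all test workers on this machine.
LOCK_PATH = os.path.join(tempfile.gettempdir(), "pep561-pip-install.lock")


@contextlib.contextmanager
def install_lock(path=LOCK_PATH):
    """Hold an exclusive advisory lock on `path` for the duration of the block."""
    with open(path, "w") as lock_file:
        # Blocks until every other holder has released the lock.
        fcntl.flock(lock_file, fcntl.LOCK_EX)
        try:
            yield
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)


def install_package(python_executable, package_dir):
    """Run pip for one test's venv while holding the shared lock."""
    with install_lock():
        subprocess.run(
            [python_executable, "-m", "pip", "install", package_dir],
            check=True,
        )
```

Only the install step is serialised here; everything else in each test case can still run in parallel, which would be consistent with the locked runs landing between the sequential and the unlocked timings.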
Some completely informal timings:
pytest -q -n0 -k PEP561Suite took 79s
pytest -q -n4 -k PEP561Suite with filelock took 48s
pytest -q -n8 -k PEP561Suite with filelock took 47s
For reference, the current master branch runs them rather more quickly, but needs several attempts before all cases pass, particularly with more workers:
pytest -q -n4 -k PEP561Suite took 26s (passed the 3rd attempt)
pytest -q -n8 -k PEP561Suite took 19s (passed the 5th attempt)
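As a rough way to compare these numbers, the sketch below multiplies each reported time by the number of attempts needed. This rests on an assumption not stated in the thread: that the reported time is per attempt and that failed attempts take about as long as the passing one. The labels and the helper name total_seconds are made up for illustration.

```python
# Timings quoted in this comment: (seconds per run, attempts needed to pass).
runs = {
    "sequential -n0": (79, 1),
    "filelock -n4": (48, 1),
    "filelock -n8": (47, 1),
    "master -n4": (26, 3),
    "master -n8": (19, 5),
}


def total_seconds(seconds_per_run, attempts):
    # Assumption: every attempt costs about the same as the reported run.
    return seconds_per_run * attempts


totals = {name: total_seconds(*timing) for name, timing in runs.items()}
for name, total in totals.items():
    print(f"{name}: ~{total}s")
```

Under that assumption, the unlocked parallel runs come to roughly 78s and 95s once retries are counted, so the locked runs would still come out ahead overall.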
I'm not sure if this actually affects a lot of people; it might be just something particular to my environment (although I did see it on both Mac and Linux recently). But if anyone thinks that the added running time is worth it, here's what I've done:
sequential: https://github.com/erikkemperman/mypy/commit/b68aac509a2f0990ccfb03d170a091483e57b67f
filelock: https://github.com/erikkemperman/mypy/commit/3648f5cb6f865e2a8b49483bfef83301207589ed
One thing I was hoping would work, but seems not to, is passing --cache-dir. It seems pip does not work concurrently even with independent cache directories...
I think a lock on the tests is unfortunately the only way forward then.
By the way, the pip issue for this is https://github.com/pypa/pip/issues/2361
Yes, I tried setting --cache-dir as well, and --no-cache, but no luck.
@erikkemperman well, if you want to make a PR I'd be happy to review it. I think the tests running a bit slower rather than failing is an acceptable tradeoff.
@ethans Done: https://github.com/python/mypy/pull/12857
Excellent, @erikkemperman. @ethanhs - FYI
Ah my apologies for misspelling the name!
My pleasure to take part in great doing!
Bug Report
Tests are failing for the master branch on my macOS laptop.
A few tests in PEP561Suite are flaky for reasons I don't quite understand.
To Reproduce
Expected Behavior
Tests pass.
Actual Behavior
There's usually exactly one failure per run. On some rare occasions I've seen more than one failure, or no failures.
Here are some sample failures:
Your Environment
mypy commit: f501cf649d7976077a7196e3548d773d67340a8c
Python 3.7.13
macOS Monterey 12.3.1 x86_64