Closed AlexWaygood closed 1 year ago
We could try switching to ProcessPoolExecutor
and see if that helps.
We could try switching to
ProcessPoolExecutor
and see if that helps.
Implemented in #9556. I'll hold off work on trying to cache venvs between runs for a while, so we can test this change in isolation and see if the problem is actually fixed. Caching venvs between runs would be good to do anyway, though (and not just for this test -- stubtest_third_party
would benefit as well), so I'll work on that after we've waited a while.
@JelleZijlstra, could you possibly also see if you can still reproduce the race after #9556?
Running it again now.
I ran it for a few hours and saw no failures. I think the issue is fixed!
Maybe there really is a bug in the ThreadPoolExecutor.
The traceback indicates that two threads appear to be attempting to access the same Python executable at the same time.
More precisely, it indicates that we're trying to execute the Python executable while someone has the file open for writing. I looked at the code for venv
and it's not clear how that is possible.
Thanks v much @JelleZijlstra for helping debug this, and thanks @srittau for suggesting the fix!
I was pretty pleased with how easy it was to swap in ProcessPoolExecutor
here, I've got to say. I've had cause to use ThreadPoolExecutor
before, but never ProcessPoolExecutor
before now.
9408 (my PR) appears to have introduced a race condition in
mypy_test.py
.Evidence of the race
We've had several "flukey" errors in CI over the last few days, all with the same traceback:
The traceback is the same every time:
```pytb Testing third-party packages... Traceback (most recent call last): File "./tests/mypy_test.py", line 557, inCause of the race
The race happens in this part of the test, where multiple virtual environments are set up concurrently using a threadpool.
https://github.com/python/typeshed/blob/0a291da2f08d90604e7eba80abe1e3f64a8a10a4/tests/mypy_test.py#L438-L445
However, it's unclear exactly why the race occurs. The traceback indicates that two threads appear to be attempting to access the same Python executable at the same time. But it's hard to see why that would be the case.
Reproducing the race condition
@JelleZijlstra has managed to reproduce the race locally on a Linux machine using Python 3.7. This rules out any GitHub Actions-specific hypotheses for why the race might be happening.
I have not been able to reproduce the failure in my local environment (Windows, Python 3.10). (I've tried creating a venv for 145 stubs packages in a threadpool with 20 workers -- still no race for me.)
Note that
mypy_test.py
is only ever run on Ubuntu in CI.Fixing the issue
Idea (1): Just create the venvs one-at-a-time
You can't have a race condition if you don't try to do things concurrently. Locally, this makes the test much slower for me -- the time taken to create the venvs goes up from around 20 seconds to around 60 seconds. The specific part of creating a venv that is slow is the
venv.EnvBuilder._setup_pip
function in the stdlib -- the exact part that appears to be implicated in the race condition.However, if we did go this route, we could potentially mitigate the performance regression by caching venvs between runs of the test (both locally and in CI).
Idea (2): Do concurrency differently
Instead of using a threadpool, we could manually create our threads. Or we could move to an
asyncio
-based concurrency model.Either of these might fix the race condition, but it's hard to know if they would without knowing exactly what's causing the race condition.
Idea (3): Paper over the problem
This diff would probably "fix" the problem, but it wouldn't be a particularly principled fix: