pytest-dev / pytest

The pytest framework makes it easy to write small tests, yet scales to support complex functional testing
https://pytest.org
MIT License

Cache: Crash on Windows when using pytest-xdist #12671

Closed criemen closed 1 week ago

criemen commented 1 month ago

pytest: 8.3.2 OS: Windows Server 2022 (GH Actions)

We're seeing the following spurious failure on Windows:

pytest -n auto -vv --durations=15 --durations-min=5 --codeql=built --shard-id=$((${SHARD%/*} - 1)) --num-shards=${SHARD#*/} integration-tests/all-platforms/python
============================= test session starts =============================
platform win32 -- Python 3.12.4, pytest-8.3.2, pluggy-1.5.0 -- C:\Users\runneradmin\AppData\Local\pypoetry\Cache\virtualenvs\non-package-mode-kOw55a-5-py3.12\Scripts\python.exe
cachedir: .pytest_cache
rootdir: C:\a\semmle-code\semmle-code
configfile: pyproject.toml
plugins: shard-0.1.2, timeout-2.3.1, xdist-3.6.1
timeout: 1200.0s
timeout method: thread
timeout func_only: False
created: 8/8 workers
collected 1 item
Running 1 items in this shard: integration-tests/all-platforms/python/database-create/test.py::test
codeql: v2.18.2+202407301318 (eded3f82a27ca3c201d6512c071e1956af30cce2) at C:\a\semmle-code\semmle-code\target\intree\codeql  pytest  dist\codeql.EXE
8 workers [1 item]

scheduling tests via LoadScheduling

=================================== ERRORS ====================================
________________________ ERROR collecting test session ________________________
<frozen genericpath>:112: in samefile
    ???
E   FileNotFoundError: [WinError 2] The system cannot find the file specified: 'C:\\a\\semmle-code\\semmle-code\\pytest-cache-files-u0joql63'
____________________________ ERROR collecting gw3 _____________________________
Different tests were collected between gw0 and gw3. The difference is:
--- gw0

+++ gw3

@@ -1 +0,0 @@

-integration-tests/all-platforms/python/database-create/test.py::test
To see why this happens see Known limitations in documentation
=========================== short test summary info ===========================
ERROR  - FileNotFoundError: [WinError 2] The system cannot find the file specified: 'C:\\a\\semmle-code\\semmle-code\\pytest-cache-files-u0joql63'
ERROR gw3
============================== 2 errors in 2.63s ==============================
Error: Process completed with exit code 1.

Note that we're collecting one test (so test collection is fast), and we have code that is creating cache entries at the same time. Probably, this is in the same area as https://github.com/pytest-dev/pytest/issues/12580 (which I filed and fixed), but it's not entirely clear to me what's happening here.

I'm mainly looking for advice on how to get a proper stack trace here; then I'm happy to investigate further on my own.

nicoddemus commented 1 month ago

This error means that the two workers collected different sets of tests. This is usually caused by some error occurring during collection.

> creating cache entries at the same time

Can you elaborate on this? Is that related to that shard plugin which appears in the terminal?

If that plugin does something during collection/initialization, it probably needs to be adjusted in order to account for xdist collecting workers in parallel.

criemen commented 1 month ago

The problem I see is that one worker crashes with

E   FileNotFoundError: [WinError 2] The system cannot find the file specified: 'C:\\a\\semmle-code\\semmle-code\\pytest-cache-files-u0joql63'

which is (presumably) during a call to this function (the directory name matches what that code is doing). I don't see which function there would be throwing a FileNotFoundError though, hence my question of how to get a better backtrace.
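The traceback in the report names `samefile` in `genericpath`. `os.path.samefile` calls `os.stat` on both of its arguments, so it raises `FileNotFoundError` if either path has disappeared by the time it runs. The following is a minimal sketch that reproduces that failure mode outside pytest. The directory name is a hypothetical stand-in for the temporary cache directory, and the sketch does not show where pytest or a plugin makes the call:

```python
import os
import tempfile
import traceback


def samefile_after_removal() -> str:
    """Return the traceback text produced when samefile() sees a missing path."""
    with tempfile.TemporaryDirectory() as parent:
        # Hypothetical stand-in for a short-lived "pytest-cache-files-*" directory.
        vanished = os.path.join(parent, "pytest-cache-files-example")
        os.mkdir(vanished)
        os.rmdir(vanished)  # simulate another process removing or renaming it
        try:
            os.path.samefile(vanished, parent)
        except FileNotFoundError:
            # format_exc() returns the full traceback text, including the
            # frame inside samefile() itself.
            return traceback.format_exc()
    return ""


if __name__ == "__main__":
    print(samefile_after_removal())
```

Wrapping a suspect call this way and logging `traceback.format_exc()` is one way to recover the calling frames that the short summary omits.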

As one of the workers crashes with this error, it doesn't finish test collection, and therefore reports the second error, but the root cause is that cache initialization doesn't work for some reason. I fixed one problem related to that in https://github.com/pytest-dev/pytest/issues/12580 already.
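To make the suspected race concrete, here is a sketch of the general "populate a temporary sibling directory, then rename it into place" pattern that concurrent processes can collide on. It is an illustration under assumptions, not pytest's actual implementation: the function name `ensure_dir_atomically`, the `cache-files-` prefix, and the set of tolerated error codes are all hypothetical.

```python
import errno
import shutil
import tempfile
from pathlib import Path


def ensure_dir_atomically(target: Path) -> None:
    """Create `target` by filling a temporary sibling and renaming it into place."""
    if target.is_dir():
        return
    target.parent.mkdir(parents=True, exist_ok=True)
    # Hypothetical prefix; mkdtemp() leaves cleanup to the caller.
    staging = Path(tempfile.mkdtemp(prefix="cache-files-", dir=target.parent))
    try:
        (staging / "README.md").write_text("cache directory\n", encoding="utf-8")
        try:
            staging.rename(target)
        except OSError as exc:
            # Another process won the race and `target` already exists.
            # Any other error is unexpected and is re-raised.
            if exc.errno not in (errno.ENOTEMPTY, errno.EEXIST):
                raise
    finally:
        # After a successful rename the staging path is gone; otherwise
        # remove the leftover staging directory.
        shutil.rmtree(staging, ignore_errors=True)
```

In a pattern like this, any step that touches the staging path after another process has renamed or removed it can fail with a "file not found" error, which is one plausible shape for the crash reported above.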

We are indeed doing things in the worker init/during test collection using the pytest cache in our conftest.py, but as far as I'm aware, none of that is unsafe in conjunction with xdist.
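For readers unfamiliar with this kind of setup, a hypothetical sketch of cache use during initialization might look like the following. The key `example/startup-count` and the counting logic are invented for illustration and are not the code from the reporter's conftest.py; only `config.cache.get` and `config.cache.set` are pytest's documented cache API.

```python
# conftest.py (hypothetical sketch, not the reporter's code)


def pytest_configure(config):
    """Read and update a pytest cache entry while the process starts up."""
    cache = getattr(config, "cache", None)
    if cache is None:
        # The cache plugin may be disabled, e.g. with "-p no:cacheprovider".
        return
    # Hypothetical key; cache keys conventionally look like "plugin/name".
    runs = cache.get("example/startup-count", 0)
    cache.set("example/startup-count", runs + 1)
```

With pytest-xdist, each worker is its own pytest process and runs this hook, so several processes may read and write the cache directory at nearly the same moment.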

RonnyPfannschmidt commented 1 month ago

More details are needed to figure out whether we hit a platform-specific race condition.

github-actions[bot] commented 1 month ago

This issue is stale because it has the status: needs information label and requested follow-up information was not provided for 14 days.

criemen commented 1 month ago

The easy way to reproduce this, https://github.com/criemen/pytest-crash-win, doesn't work (i.e. it doesn't crash), so I'll need to put some more effort into distilling what we're doing internally down to an external reproducer.

github-actions[bot] commented 2 weeks ago

This issue is stale because it has the status: needs information label and requested follow-up information was not provided for 14 days.

github-actions[bot] commented 1 week ago

This issue was closed because it has the status: needs information label and follow-up information has not been provided for 7 days since being marked as stale.