pytest-dev / pytest-xdist

pytest plugin for distributed testing and loop-on-failures testing modes.
https://pytest-xdist.readthedocs.io
MIT License

Time Before Session Starts Increases Exponentially With Number Of Workers #850

Open micric opened 1 year ago

micric commented 1 year ago

Hello, I'm upgrading from Python 3.8 to Python 3.10. In my previous configuration, this was the pytest setup:

platform linux -- Python 3.8.6, pytest-3.10.1, py-1.9.0, pluggy-0.13.1
plugins: random-order-1.0.4, random-0.2, xdist-1.20.1, forked-1.3.0

It's a test suite with several thousand tests. Using -n auto, everything was working fine on 32 workers.

Because of the upgrade, I also had to upgrade pytest and pytest-xdist. I tested a few versions, but I get the same problem with all of them. The current setup:

platform linux -- Python 3.10.8, pytest-6.2.5, py-1.9.0, pluggy-0.13.1
plugins: xdist-2.0.0, forked-1.3.0, random-0.2, random-order-1.0.4

What happens is that there is a gap of several minutes between the pytest call and the "test session starts" message. Here is the gap (minutes:seconds) for each number of workers:

1 worker: 3:25
2 workers: 5:48
3 workers: 16:52
4 workers: 17:27
8 workers: 57:39

An example of the log:

build   28-Nov-2022 16:01:26    8
build   28-Nov-2022 16:59:05    ============================= test session starts ==============================
build   28-Nov-2022 16:59:05    platform linux -- Python 3.10.8, pytest-6.2.5, py-1.9.0, pluggy-0.13.1 -- /usr/local/bin/python

Here 8 is the number of workers. It is printed by the same script that runs pytest, immediately before the pytest call:

echo "8"
pytest -v testSuite -n 8

With the old version, there was absolutely no time gap, while now it takes almost an hour just to start.
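The 8-worker figure can be checked against the two log timestamps quoted above. This is only a small sketch: `startup_gap` is a hypothetical helper name, and the format string assumes an English locale for the month abbreviation.

```python
from datetime import datetime

# Timestamp layout used by the build log lines quoted above,
# e.g. "28-Nov-2022 16:01:26" (assumes English month abbreviations).
LOG_FORMAT = "%d-%b-%Y %H:%M:%S"


def startup_gap(before: str, after: str) -> str:
    """Return the delay between two log timestamps as minutes:seconds."""
    delta = datetime.strptime(after, LOG_FORMAT) - datetime.strptime(before, LOG_FORMAT)
    minutes, seconds = divmod(int(delta.total_seconds()), 60)
    return f"{minutes}:{seconds:02d}"


# Line written by `echo "8"` versus the "test session starts" line.
print(startup_gap("28-Nov-2022 16:01:26", "28-Nov-2022 16:59:05"))  # prints 57:39
```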

Any insight into what might be the problem?

RonnyPfannschmidt commented 1 year ago

It's unclear whether this is related to execnet or to xdist itself; a test with multiple Python, pytest, xdist, and execnet versions seems necessary.

nicoddemus commented 1 year ago

This is really hard to pin down, unfortunately... you were using pytest-3.10? That's pretty old.

When upgrading, did you upgrade other libraries that might be causing this?

Things off the top of my head that might be affecting collection: recursive symlinks (we had fixes in pytest for that, but a corner case might still be causing a problem), new versions of libraries doing work at import time...

We cannot really guess much, as the upgrade spans a huge gap in pytest versions. I suggest running some logging/profiling to see if something suspicious appears.

micric commented 1 year ago

Yes, the pytest version we were using was really old, unfortunately. Other packages were upgraded as well; I will check whether any of them in particular caused the slowdown. Thank you for the advice. Do you have any particular suggestion for how to collect logs during collection?

nicoddemus commented 1 year ago

Sure, good luck.

> Do you have any particular suggestion for how to collect logs during collection?

There are a bunch of Python profilers around that can be used, but I don't have one to recommend off the top of my head.
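One profiler that ships with Python is cProfile. The following is a minimal sketch, not a recommendation from the maintainers: `profile_call` and `profile_collection` are hypothetical helper names, `testSuite` is the path from the original report, and it only profiles collection in the current process, so time spent inside xdist workers would not show up.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=25):
    """Run func(*args) under cProfile and return (result, report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time so slow imports and collection hooks rise to the top.
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()


def profile_collection(test_path="testSuite"):
    """Profile a collection-only pytest run (no tests are executed)."""
    import pytest  # imported lazily so profile_call works without pytest installed

    return profile_call(pytest.main, ["--collect-only", "-q", test_path])
```

Calling `exit_code, report = profile_collection()` and printing `report` lists the functions with the highest cumulative time during collection.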

RonnyPfannschmidt commented 1 year ago

It's possible/conceivable that execnet worker startup got slower recently. Unfortunately, I'm unable to investigate within the next 6-12 months.

kapilt commented 1 year ago

It would be useful to benchmark test collection time alone (--co, without xdist or any other plugins) on the old environment and on the new one, just to sanity-check where the issue is. Independently, I've had some success moving data directories out of test collection, because pytest itself started looking at all files so that plugins can load tests from non-Python files.
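A rough way to run that benchmark in both environments is sketched below. `time_command` and `collection_command` are hypothetical helper names and `testSuite` is the path from the original report; `-p no:xdist` only disables xdist, so any other plugins would need their own `-p no:<name>` entries.

```python
import subprocess
import sys
import time


def time_command(cmd):
    """Run cmd, discarding its output, and return (elapsed seconds, exit code)."""
    start = time.perf_counter()
    completed = subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return time.perf_counter() - start, completed.returncode


def collection_command(test_path="testSuite"):
    """Build a collection-only pytest invocation with the xdist plugin disabled."""
    return [
        sys.executable, "-m", "pytest",
        "--collect-only", "-q",
        "-p", "no:xdist",
        test_path,
    ]
```

Running `time_command(collection_command())` under the old and the new interpreter gives two numbers to compare. For the second suggestion, pytest's `norecursedirs` ini option or a `collect_ignore` list in `conftest.py` are the usual ways to keep data directories out of collection.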

RonnyPfannschmidt commented 1 year ago

Execnet currently lacks APIs to do this nicely.