pytest-dev / pytest-xdist

pytest plugin for distributed testing and loop-on-failures testing modes.
https://pytest-xdist.readthedocs.io
MIT License

collection efficiency with many tests #353

Open simon-weber opened 6 years ago

simon-weber commented 6 years ago

We've been running the patch from https://github.com/pytest-dev/pytest-xdist/issues/299 successfully for a while. One thing I've noticed is that with large test suites, it can take a while before the first worker begins running tests.

From what I can tell, this is due to collection overhead: with execnet debugging on, that's all that's going on during that time. Preventing workers from sending pytest_collectreport keeps the master idle during that time but doesn't speed things up, so presumably the bottleneck is the workers themselves. The collection time also does not seem to be affected by the number of workers.

Interestingly, collection under xdist seems much slower than under plain pytest: in my example of ~16k tests, it's ~1 minute with plain pytest vs ~4 minutes with xdist. My guess is there's something like https://github.com/pytest-dev/pytest-xdist/issues/279 going on.

Next I'm hoping to try profiling the workers during collection, but since that looks like it'll be painful I figured I'd ask for advice first. Is there anything notable that xdist collection does differently from pytest?
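For anyone wanting a single-process baseline before profiling the workers, here is a minimal sketch using the standard-library cProfile. The helper names `profile_call` and `profile_collection` are made up for illustration. It only profiles a plain `pytest --collect-only` run in the current process; xdist workers are separate processes and would need their own instrumentation.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=15, **kwargs):
    """Run func under cProfile; return (result, report sorted by cumulative time)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    buf = io.StringIO()
    stats = pstats.Stats(profiler, stream=buf)
    stats.sort_stats("cumulative").print_stats(top)  # keep only the top entries
    return result, buf.getvalue()


def profile_collection(test_path="."):
    """Hypothetical helper: profile a collection-only run of plain pytest."""
    import pytest  # imported lazily so profile_call works without pytest installed

    return profile_call(pytest.main, ["--collect-only", "-q", test_path])
```

Calling `profile_collection("tests/")` and printing the second element of the returned tuple should show where plain collection spends its time, which gives something to compare worker profiles against.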

simon-weber commented 6 years ago

Most of this ends up being explained by pyc+pytest cache write times.
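To get a feel for how much time bytecode writes alone can take, here is a rough sketch that generates throwaway modules and times compiling them to .pyc files with the standard-library py_compile. This is an assumption-laden stand-in: it uses CPython's ordinary compile step, not pytest's assertion rewriting or its cache plugin, and the function name and module sizes are invented for illustration.

```python
import py_compile
import tempfile
import time
from pathlib import Path


def time_bytecode_write(n_modules=200, lines_per_module=200):
    """Write n_modules dummy sources, compile each to .pyc, return (count, seconds)."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        sources = []
        for i in range(n_modules):
            path = root / f"mod_{i}.py"
            body = "\n".join(f"x{j} = {j}" for j in range(lines_per_module))
            path.write_text(body + "\n")
            sources.append(path)

        # Time only the compile-and-write step, not source generation.
        start = time.perf_counter()
        for path in sources:
            py_compile.compile(
                str(path), cfile=str(path.with_suffix(".pyc")), doraise=True
            )
        elapsed = time.perf_counter() - start

        written = sum(1 for _ in root.glob("*.pyc"))
    return written, elapsed
```

Comparing a cold run against a run with warm .pyc files and a populated .pytest_cache is probably the more direct check on a real suite; this sketch only isolates the write cost.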

RonnyPfannschmidt commented 6 years ago

@simon-weber shouldn't we still take a look at resolving this? I believe there are ways to make this more efficient (I mean, why can't we do the assertion rewriting on the coordinating process, for example).

nicoddemus commented 6 years ago

> I mean, why can't we do the assertion rewriting on the coordinating process, for example

You mean on the master node? This implies collecting on the master first, I believe.

simon-weber commented 6 years ago

Ah, if you think there's potential for improvement I'm happy to help. I closed this since the more I looked into it the more I realized I just didn't understand what was going on (especially around assertion rewriting).

In case it's helpful, here are the biggest things I noticed from some profiling:

RonnyPfannschmidt commented 6 years ago

Reopened, since structural issues on our side have clearly been demonstrated.

simon-weber commented 6 years ago

This has an interesting interaction with coverage collection. Since the coverage settrace needs to happen before collection, the trace overhead applies there as well -- even if coverage isn't being calculated for collected modules. In my little benchmarks, using --cov for a dummy file with one line slows collection down by up to 50%.
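The trace overhead can be illustrated with a small sketch built on `sys.settrace`. Everything here (`busy_work`, `timed`, `counting_tracer_factory`) is hypothetical scaffolding, and the tracer is a pure-Python one, so it should overstate the cost compared with coverage's usual C tracer. The point is only that a trace function installed before the work starts is invoked for that work regardless of whether the results are used.

```python
import sys
import time


def busy_work(n=200_000):
    """Stand-in for import/collection work: a plain Python loop."""
    total = 0
    for i in range(n):
        total += i % 7
    return total


def counting_tracer_factory():
    """Return a trace function plus a dict counting how often it was called."""
    counts = {"events": 0}

    def tracer(frame, event, arg):
        counts["events"] += 1
        return tracer  # returning the tracer keeps line-level tracing active

    return tracer, counts


def timed(func, tracer=None):
    """Run func with the given trace function installed; return (result, seconds)."""
    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        start = time.perf_counter()
        result = func()
        elapsed = time.perf_counter() - start
    finally:
        sys.settrace(previous)  # always restore whatever was installed before
    return result, elapsed
```

Running `timed(busy_work)` and then `timed(busy_work, tracer)` with a tracer from `counting_tracer_factory()` gives the same result both times, while the event counter shows how many times the tracer fired for code whose trace data is never looked at.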