pytest-dev / pytest-xdist

pytest plugin for distributed testing and loop-on-failures testing modes.
https://pytest-xdist.readthedocs.io
MIT License

total worker initialization time scales linearly with the number of workers #346

Open dwiel opened 6 years ago

dwiel commented 6 years ago

It seems that worker initialization is currently blocking and done sequentially:

    def setup_nodes(self, putevent):
        self.config.hook.pytest_xdist_setupnodes(config=self.config, specs=self.specs)
        self.trace("setting up nodes")
        # attempted parallel setup via multiprocessing; this fails because
        # the execnet objects involved are not pickleable:
        # from multiprocessing import Pool
        # p = Pool(len(self.specs))
        # nodes = p.map(lambda spec: self.setup_node(spec, putevent), self.specs)
        nodes = []
        for spec in self.specs:
            # each node is set up sequentially, blocking until it is ready
            nodes.append(self.setup_node(spec, putevent))
        return nodes

This means that starting a large number of workers, even on a single machine with many CPU cores, takes a long time. For example, it takes about 45 seconds to start one worker per CPU core on my machine with 88 cores, i.e. roughly half a second of blocking setup per worker. The effect is even more extreme when workers are started on remote machines, where network latency is added on top.

With the advent of machines with larger core counts and more frequent access to large clusters, it would be nice to scale horizontally to a large number of workers quickly, even in cases where that is the difference between 5 minutes on 8 workers and 8 seconds on 300 workers. Obviously it isn't quite that simple, but there is clearly theoretical room for improvement.

As you can see from the code posted above, I've tried using multiprocessing to parallelize the setup of new nodes; however, the execnet objects used aren't pickleable, so this naïve solution does not work.
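For illustration, a minimal sketch of that pickling failure (assuming a local popen gateway; the exact exception raised depends on the execnet version):

    import pickle

    import execnet

    # a gateway wraps a live subprocess, threads, and an I/O channel,
    # none of which survive pickling, so multiprocessing cannot ship it
    # to a worker process
    gw = execnet.makegateway("popen")
    try:
        pickle.dumps(gw)
    except Exception as exc:
        print(f"cannot pickle gateway: {exc!r}")
    finally:
        gw.exit()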

I've also spent a little bit of time investigating the use of something like ray for rapid distribution across an admittedly homogeneous cluster, but I ran into trouble where Function objects were not pickleable even by dill or cloudpickle.

Has anyone looked into how else this problem could be solved? Perhaps my identification of the problem is incorrect. Are there other critical factors preventing the use of a large number of workers?

RonnyPfannschmidt commented 6 years ago

This needs either a fix in execnet, or work on supporting multiprocessing/mitogen as an alternative backend.

dwiel commented 6 years ago

What do you think the work involved in supporting alternative backends would be? Everything seems fairly tightly coupled to execnet right now, though perhaps not as tightly as I think.

RonnyPfannschmidt commented 6 years ago

I didn't even start on the initial analysis, but I did decide to stop working on execnet myself.

programmerjake commented 5 years ago

Any progress on this?

RonnyPfannschmidt commented 5 years ago

nope

kapilt commented 4 years ago

This seems like it would be amenable to a concurrent.futures thread pool for the setup, assuming the underlying setup_node is thread-safe?
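A minimal sketch of that idea, assuming setup_node is thread-safe (which the replies below dispute):

    from concurrent.futures import ThreadPoolExecutor

    def setup_nodes(self, putevent):
        self.config.hook.pytest_xdist_setupnodes(config=self.config, specs=self.specs)
        self.trace("setting up nodes")
        # hypothetical: start all node setups concurrently instead of one
        # by one; threads avoid the pickling problem multiprocessing hit
        with ThreadPoolExecutor(max_workers=len(self.specs)) as pool:
            nodes = list(pool.map(lambda spec: self.setup_node(spec, putevent), self.specs))
        return nodes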

RonnyPfannschmidt commented 4 years ago

There is currently no analysis on that.

RonnyPfannschmidt commented 4 years ago

Based on a brief look, however, I would guess that it is absolutely not thread-safe.

dwiel commented 4 years ago

Yeah, that's the problem with the current method. A more fundamental change would be required to make it thread-safe.

ssbarnea commented 3 years ago

That problem makes xdist quite inconvenient to use. I often end up with xdist being slower on a machine with many cores than running the tests serially without it. Any workarounds?

RonnyPfannschmidt commented 3 years ago

No, there's currently no way

WittierDinosaur commented 1 year ago

Hey gang - is there any planned work around this? It would be amazing if it could yield each worker as soon as it is set up.
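A minimal sketch of that yield-as-ready idea (hypothetical, and it assumes the same thread-safety questioned earlier in this thread):

    from concurrent.futures import ThreadPoolExecutor, as_completed

    def setup_nodes_incrementally(self, putevent):
        # hypothetical variant: yield each node as soon as its setup
        # finishes, so test scheduling could begin before every worker is up
        with ThreadPoolExecutor(max_workers=len(self.specs)) as pool:
            futures = [pool.submit(self.setup_node, spec, putevent) for spec in self.specs]
            for future in as_completed(futures):
                yield future.result()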

RonnyPfannschmidt commented 1 year ago

This needs some work in execnet, which is currently bus-factored on me, and I am on paternity leave.

WittierDinosaur commented 1 year ago

Ah, that's very fair. I'm assuming this isn't the kind of thing someone can pick up quickly?

RonnyPfannschmidt commented 1 year ago

It's probably possible, but it needs some gut digging; a hack to use a thread pool may be enough.