pytest-dev / pytest-xdist

pytest plugin for distributed testing and loop-on-failures testing modes.
https://pytest-xdist.readthedocs.io
MIT License

Support --boxed on Windows #270

Open AlexanderTyrrell opened 6 years ago

AlexanderTyrrell commented 6 years ago

I am developing a C/C++ module for Python (3.6 64-bit) under Windows, and would like to use the pytest + xdist framework to write tests for it. Each test should run in its own process, in case there is a crash in one of the tests. xdist supports this with the --boxed option, but unfortunately this does not work under Windows (as xdist uses os.fork provided by _process/forkedfunc.py to spawn subprocesses).
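To see which process-creation mechanisms a given platform offers, a minimal standard-library sketch could look like the following. The helper names `fork_available` and `usable_start_methods` are illustrative only and are not part of xdist or pytest.

```python
import multiprocessing
import os


def fork_available():
    """Return True when the platform exposes os.fork (it does not on Windows)."""
    return hasattr(os, "fork")


def usable_start_methods():
    """Return the multiprocessing start methods usable on this platform."""
    return multiprocessing.get_all_start_methods()


if __name__ == "__main__":
    print("os.fork available:", fork_available())
    print("multiprocessing start methods:", usable_start_methods())
```

On Windows only the `spawn` start method is listed, so anything handed to a child process has to survive pickling, which is where the difficulty described next comes from.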

I tried extending this using the multiprocessing package, but without success so far. It requires serializing the pytest.Item object (multiprocessing uses pickle to exchange functions and objects between processes), which seems far from simple given its complexity. My idea is to convert pytest.Item to a serializable object (e.g. SerializableItem), which can be used to generate a proper pytest.Item in the subprocess. I got as far as reaching the Pytest runner with the generated Item, but I am stuck with generating pytest.Item._request. Has anyone tried a similar approach? Maybe it is the wrong way to go given the complexity of the Item class.

I also tried to serialize pytest.Item using dill, which went much further than pickle, but the whole object could not be serialized (some part of the pytest.Config as far as I can tell).
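The serialization problem can be illustrated without pytest at all. In the sketch below, `FakeItem` is a hypothetical stand-in for an object that carries live runtime state (here a lock), and `to_serializable` is an assumed helper that keeps only the node id, in the spirit of the SerializableItem idea above. It is not how pytest or xdist represent items.

```python
import pickle
import threading


class FakeItem:
    """Hypothetical stand-in for a test item that holds live runtime state."""

    def __init__(self, nodeid):
        self.nodeid = nodeid
        self.lock = threading.Lock()  # locks cannot be pickled


def to_serializable(item):
    """Reduce the item to plain data that identifies the test."""
    return {"nodeid": item.nodeid}


def can_pickle(obj):
    """Return True if pickle.dumps succeeds on obj, False otherwise."""
    try:
        pickle.dumps(obj)
    except Exception:
        return False
    return True


item = FakeItem("tests/test_mod.py::test_crash")
stub = to_serializable(item)
restored = pickle.loads(pickle.dumps(stub))
```

Here `can_pickle(item)` is false while `can_pickle(stub)` is true. The plain-data stub crosses the process boundary easily, but the receiving side still has to rebuild a full item from it, which is the hard part.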

Any help with this would be very welcome :-)

nicoddemus commented 6 years ago

Hi @AlexanderTyrrell,

Yeah you hit the main reason why pytest performs the collection on each worker as opposed to performing the collection on the master node and then distributing the items (see the bottom of OVERVIEW for a more detailed explanation).

Possibly a solution would be to override pytest_runtest_protocol so that it spawns a new pytest process which executes just that test (similar to running pytest <nodeid>). There are quite a few details that need to be addressed with that, especially how to obtain the error report from the subprocess and convert it in a way that is usable by the worker, but it seems like it should be possible to cook up something.
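The subprocess half of that idea might be sketched as follows, using only the standard library. The names `pytest_command` and `run_isolated` are hypothetical. Wiring this into a pytest_runtest_protocol implementation and turning the captured output into a proper report are the open details mentioned above and are deliberately left out.

```python
import subprocess
import sys


def pytest_command(nodeid):
    """Build a command line that runs a single test in a fresh interpreter."""
    return [sys.executable, "-m", "pytest", "-q", nodeid]


def run_isolated(command, timeout=None):
    """Run command in a child process and classify its exit code.

    Assumed mapping: pytest exits with 0 when all tests pass and 1 when
    tests fail. Any other code is lumped together as "error", which covers
    pytest usage or internal errors as well as a crashed interpreter.
    """
    proc = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
        timeout=timeout,
    )
    if proc.returncode == 0:
        outcome = "passed"
    elif proc.returncode == 1:
        outcome = "failed"
    else:
        outcome = "error"
    return outcome, proc.returncode, proc.stdout
```

A hook implementation could call `run_isolated(pytest_command(item.nodeid))` for each item. Because the child is started from a command line rather than forked, this works the same way on Windows, at the cost of a full collection in every child process.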

AlexanderTyrrell commented 6 years ago

Hi @nicoddemus, thank you for the very quick reply! And thanks for pointing me to the FAQ, it's quite helpful. I was hoping the collection could be kept in the master, in order to stay with a similar approach to os.fork (multiprocessing basically uses os.fork on Unix). I will look at overriding pytest_runtest_protocol.

nicoddemus commented 6 years ago

Also, it is worth mentioning pytest-mp: although it is in its early stages, it might be closer to what you need.

RonnyPfannschmidt commented 6 years ago

Another note: --boxed is a legacy option. It was moved into https://pypi.python.org/pypi/pytest-forked, which is no longer maintained, and it will eventually be removed from xdist as an option in a major release.

RonnyPfannschmidt commented 6 years ago

Another general note: currently, pytest items are structured in a way that makes it impossible to correctly deserialize them with a simple serialization scheme. Major internal refactorings in pytest would be needed to even enable it.

Peque commented 6 years ago

@AlexanderTyrrell Did you try with cloudpickle too?

If you are looking into multiprocessing and complex objects serialization you might want to have a look at osBrain (disclaimer: I am the main author). It is like multiprocessing but has integrated and configurable dill/cloudpickle/pickle/json/raw serialization. It integrates Pyro4 as well, so for simple tasks it is very easy to configure and access the agents (processes) remotely to retrieve results from the workers. For more complex architectures it uses pyzmq for message passing between agents. Should work on Windows, although I would recommend the latest development version for that.

nicoddemus commented 6 years ago

@Peque thanks for sharing cloudpickle, I did not know that library until now. 👍

While it would allow pickling function objects, the main problem still remains: we need to send fixture and config objects between master and workers and keep them synchronized. In other words, changes to a fixture or config object made in master or a worker must be reflected back to the master and all other workers.

Peque commented 6 years ago

@nicoddemus In osBrain we also implemented what we call "channels" (advanced or more specific communication patterns implemented with basic ZMQ sockets), although they will not be documented until next release (probably during February). One of them is a synchronized PUB-SUB pattern between a master and many workers, in which the master shares an object/state with the workers sending updates on any change (and workers can notify the master on any change for it to publish the update back to all the other workers).

If you are thinking about using "shared memory", that can also be achieved easily, as agents are by default guaranteed to be accessed remotely by only one other agent at a time (that is easy when you use message passing everywhere). So you do not have to worry about locks, race conditions, etc.

But still, I do not know the details and probably do not understand the problem well; maybe it is more complex. If I find some spare time I might look at it, as it sounds like a problem that could be easier to solve with osBrain than with bare Python.

nicoddemus commented 6 years ago

@Peque thanks for the explanation, what you have in osBrain is certainly interesting!

However, we should probably prototype this somewhere to see if the synchronization between workers and master improves performance in this case; it would also require profound changes inside xdist, which must also be taken into consideration.

RonnyPfannschmidt commented 6 years ago

@Peque execnet is already handling the multi process part in a similar manner, explicitly using a limited serialization because pretending that complex remote objects can just work like normal objects is a leaky abstraction that easily topples over ^^ - in particular when latency or contention comes into play

Peque commented 6 years ago

@RonnyPfannschmidt Yeah, the remote-object-like-normal-objects part (the Pyro4 part of osBrain) is optional and just simplifies configuration and prototyping. For the most part, and for more complex architectures, it is all message passing with ZMQ under the hood, which still simplifies things as it is very flexible (i.e.: multiple communication patterns and you can very easily change the transport layer so remote and local processes can communicate in the same way).

Will have a look at execnet to better understand how it works, thanks! I may find some useful ideas there. 😊

Peque commented 6 years ago

For the record: osBrain 0.6.0 is out with some documentation on channels (and in particular, on synced pub-sub channels). Just in case someone wants to try that approach.