python / cpython

The Python programming language
https://www.python.org
Other
62.59k stars 30.03k forks source link

Avoid extraneous copy when `_recv`ing bytes through pipes in multiprocessing "connections" #96059

Open igozali opened 2 years ago

igozali commented 2 years ago

Feature or enhancement

Avoid extraneous copy when _recving bytes through pipes in multiprocessing "connections"

Pitch

In the default IPC implementation of the multiprocessing module using pipes, the Connection._recv method is defined as follows:

https://github.com/python/cpython/blob/586fc02be5b3e103bfddd49654034a898a8d6dfc/Lib/multiprocessing/connection.py#L378-L392

Seems like we can avoid a copy if we can read directly into a preallocated byte array? This would be beneficial if the bytes we're sending between processes are large. Another benefit here is that the buf.write() call above doesn't release the GIL, so in my workloads I've seen it causing stalls in another thread when trying to read from a multiprocessing.Queue instance.

Sample implementation that avoids the extraneous copy is shown below (bit untested). If it makes sense, I can submit a patch, but probably need some guidance to deal with the Windows case:

import io

def unix_readinto(fd, buf):
    # Thanks https://stackoverflow.com/q/13919006/1572989
    return io.FileIO(fd, closefd=False).readinto(buf)

class Connection(...):
    ...

    def _recv(self, size, readinto=unix_readinto):
        buf = io.BytesIO(initial_bytes=bytes(size))
        handle = self._handle

        mem = buf.getbuffer()
        total = 0
        while total != size:
            n = readinto(handle, mem[total:])

            if n == 0:
                if total == 0:
                    raise EOFError
                else:
                    raise OSError("got end of file during message")

            total += n

        return buf

Previous discussion

N/A

corona10 commented 2 years ago

Hmm according to my benchmark, it makes slower on macOS. https://github.com/python/pyperformance/pull/228

Would you like to suggest your benchmark to prove your optimization? 46.2 ms +- 1.1 ms -> 51.0 ms +- 1.1 ms (your suggestion)

cc @mdboom

igozali commented 2 years ago

Thanks for the feedback! That's interesting and somewhat counterintuitive? I haven't had the chance to test the above much, just that it seemed to make sense theoretically that removing an extra copy would be good. Will try some things out on my side.