sorcio commented 1 year ago

Bug report

Bug description:

Earlier discussion: https://github.com/python/cpython/pull/109710#discussion_r1334470510

This issue is specific to some systems with kqueue. I can reproduce on FreeBSD, and according to the internet it should be reproduced on older macOS versions (10.11) and on OpenBSD/NetBSD (with a different error code). A current version of macOS (13) is not affected.

Minimal snippet to explain the root issue:

import os, selectors
sel = selectors.DefaultSelector()
read_fd, write_fd = os.pipe()
# Close one end of the pipe
os.close(read_fd)
# Register the other end of the pipe
sel.register(write_fd, selectors.EVENT_WRITE)

Running this code on Linux or macOS will not raise any exception. The last call to sel.register() will return a list with an event, signaling that a write on the FD is possible. A subsequent write will, of course, raise a BrokenPipeError. On FreeBSD, sel.register() will raise directly.

asyncio does not account for this platform difference. This reproducer is a bit contrived because it needs to trigger a race condition[^1]:

[^1]: the other end of the pipe needs to be closed after _UnixWritePipeTransport.write() calls os.write() but before it tries to register a writer.

import asyncio
from itertools import count
import os
from threading import Thread

class MyPipeProtocol(asyncio.BaseProtocol):
    def __init__(self):
        self.is_connection_lost = False

    def connection_lost(self, exc):
        self.is_connection_lost = True

async def broken_pipe_repro():
    read_fd, write_fd = os.pipe()
    os.set_blocking(write_fd, False)
    write_file = open(write_fd, "wb")
    loop = asyncio.get_running_loop()
    transport, proto = await loop.connect_write_pipe(MyPipeProtocol, write_file)

    # Pass one end of the pipe to another thread that will eventually close it
    t = Thread(target=lambda: os.close(read_fd))
    t.start()

    try:
        # This line will never fail on Linux/macOS, but might on other BSDs:
        transport.write(b"ping" * 65536)
        await asyncio.sleep(0)
        assert proto.is_connection_lost
    finally:
        t.join()
        transport.close()

async def amain():
    for i in count():
        if i % 1000 == 0:
            print(i)
        await broken_pipe_repro()

if __name__ == "__main__":
    asyncio.run(amain())

On most platform this will run indefinitely. On FreeBSD, after a few run I get a traceback:

Traceback (most recent call last):
...
  File "asyncio-broken-pipe.py", line 28, in broken_pipe_repro
    transport.write(b"ping" * 65536)
  File "cpython/Lib/asyncio/unix_events.py", line 713, in write
    self._loop._add_writer(self._fileno, self._write_ready)
  File "cpython/Lib/asyncio/selector_events.py", line 317, in _add_writer
    self._selector.modify(fd, mask | selectors.EVENT_WRITE,
  File "cpython/Lib/selectors.py", line 265, in modify
    key = self.register(fileobj, events, data)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "cpython/Lib/selectors.py", line 508, in register
    self._selector.control([kev], 0, 0)
BrokenPipeError: [Errno 32] Broken pipe

The issue here is that transport.write() is not supposed to raise, as far as I understand.

The current implementation of _UnixWritePipeTransport.write() catches all exceptions on os.write() but not on self._loop._add_writer(). ~The same happens switching read and write ends.~ Probably the issue was not detected before because it's a rare condition and it doesn't happen on Linux. gh-109709 showed this occurring on a subprocess test case. The fix applied in that case works by wrapping the call to write() in an exception handler, but I think in general user code can't be expected to always catch that error.

I found prior discussion of the equivalent issue in Tokio. In their case, they decided to solve this at an abstraction level that is closer to KqueueSelector.register(), by ignoring the EPIPE and instead reporting the fd as ~readable~/writable (which is what users expect in other selectors, and what happens in modern macOS). libevent also does something similar. I wonder if this could be a valid solution for Python, because the actual work is done in selectmodule which is lower level. By contrast, to the best of my understanding, libuv does not have special handling for this.

It can also be caught in asyncio (at some layer, either in selector event loops, or in code that calls ~add_reader~/add_writer). Given that this is only seems to happen with pipes, it could make sense to handle this in pipe-specific code.

I can make a PR if this is accepted as a bug.

cc @vstinner

Edit note: my mistake, the error is not raised when registering the read end of a pipe, only the write end.

CPython versions tested on:

3.12, CPython main branch

Operating systems tested on:

Other

vstinner commented 1 year ago

I can make a PR if this is accepted as a bug.

I think that it makes sense to catch exceptions on _add_write(). I suppose that it would look like something like that:

            except Exception as exc:
                self._conn_lost += 1
                self._fatal_error(exc, 'Fatal write error on pipe transport')
                return

It looks like calling self._loop._remove_reader() in _close() is fine, since it should do nothing if _add_writer() failed.

sorcio commented 1 year ago

Related discussion from ten years ago in bpo-19293 (search for "EPIPE") and bpo-19294. The decision at the time was to use a socket pair instead of a pipe for stdin because some platforms (AIX and Darwin primarily) did not behave as expected, and it was deemed as a platform bug that Python could not solve. This was partially reverted recently (bpo-46364) by switching back to using pipes instead of socket pairs for non-AIX platforms.

Darwin changed behavior (since macOS 10.15) and now they say that registering a kqueue event on a half-closed pipe should not be an error.

As shown in the reproducer, the error does not only happen with a subprocess, because a pipe created elsewhere can be attached to asyncio. Going back to using a socketpair for FreeBSD instead of a pipe (like it used to be until before Python 3.11) could be an easy way out for subprocesses, but does not solve the problem for pipes.

Knowing that Apple fixed this, I'm more convinced that this shouldn't be handled in higher-level code. I will take a moment to consider the impact of changing this in KqueueSelector. Otherwise I will prepare a fix for asyncio.

sorcio commented 1 year ago

While looking into this I found out something that looks like a bug in KqueueSelector. When a fd is registered for both read and write, kqueue requires to register two filters; but KqueueSelector.select() will count the fd as only one (because it's one key) and call kqueue.control() with the smaller number. Therefore some events might not be returned, which could lead to worsened latency and possibly deadlock.

vstinner commented 1 year ago

While looking into this I found out something that looks like a bug in KqueueSelector. When a fd is registered for both read and write, kqueue requires to register two filters; but KqueueSelector.select() will count the fd as only one (because it's one key) and call kqueue.control() with the smaller number. Therefore some events might not be returned, which could lead to worsened latency and possibly deadlock.

Oh wow, that sounds bad!

sorcio commented 1 year ago

Open a separate issue and opened a PR for this https://github.com/python/cpython/issues/110038

python / cpython

[asyncio][FreeBSD] _UnixWritePipeTransport.write() may raise BrokenPipeError on FreeBSD #109757

Bug report

Bug description:

CPython versions tested on:

Operating systems tested on: