mitogen-hq / mitogen

Distributed self-replicating programs in Python
https://mitogen.networkgenomics.com/
BSD 3-Clause "New" or "Revised" License

Sporadic EAGAIN errors due to non-blocking stdout #712

Open apyrgio opened 4 years ago

apyrgio commented 4 years ago

When using Mitogen, I sometimes see EAGAIN errors because stdout is non-blocking. This breaks the Python interpreter, as well as any other spawned commands, since they don't expect non-blocking file descriptors. To trigger this, one can simply call sys.stdout.write("A" * 1000000) on the remote: the write fills the kernel buffer and then fails with EAGAIN instead of blocking.
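The failure mode described above can be reproduced without Mitogen at all. The following is a minimal sketch, assuming Python 3.5+ on a Unix-like system; the helper name fill_nonblocking_pipe is made up for illustration. It marks the write end of an ordinary pipe as non-blocking and writes until the kernel refuses more data:

```python
import errno
import os


def fill_nonblocking_pipe(chunk_size=65536):
    """Write to a non-blocking pipe until the kernel buffer is full.

    Returns (bytes_written, errno_of_the_failed_write).
    Hypothetical helper for illustration; not part of Mitogen.
    """
    read_fd, write_fd = os.pipe()
    try:
        # Nobody reads from read_fd, so the buffer can only fill up.
        os.set_blocking(write_fd, False)
        chunk = b"A" * chunk_size
        written = 0
        while True:
            try:
                written += os.write(write_fd, chunk)
            except BlockingIOError as exc:
                # A blocking descriptor would have stalled here instead.
                return written, exc.errno
    finally:
        os.close(read_fd)
        os.close(write_fd)


written, err = fill_nonblocking_pipe()
print("wrote %d bytes, then failed with errno %d (%s)"
      % (written, err, os.strerror(err)))
```

The number of bytes accepted before the failure depends on the platform's pipe buffer size, which would explain why only large payloads trigger the error.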

Reading the sections on "Standard IO Redirection" and "The IO Multiplexer", my understanding is that Mitogen replaces stdout with a Unix pipe. It gives the write end of the Unix pipe to the Python interpreter, and the read end to a poller (the I/O multiplexer). I understand that the read end typically needs to be non-blocking, else the I/O multiplexer is not fully async. However, I don't see why the write end (what the interpreter perceives as stdout) needs to be non-blocking as well, since it breaks the assumptions of many programs.
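To illustrate the point about the two ends, here is a small sketch (plain os.pipe() and fcntl, not Mitogen's actual descriptor setup, which may differ) showing that O_NONBLOCK set on the read end of a pipe does not affect the write end. The helper name is_nonblocking is an assumption for this example:

```python
import fcntl
import os


def is_nonblocking(fd):
    """Return True if O_NONBLOCK is set on the given descriptor."""
    return bool(fcntl.fcntl(fd, fcntl.F_GETFL) & os.O_NONBLOCK)


read_fd, write_fd = os.pipe()
try:
    # Make only the read end non-blocking, as a poller would want.
    flags = fcntl.fcntl(read_fd, fcntl.F_GETFL)
    fcntl.fcntl(read_fd, fcntl.F_SETFL, flags | os.O_NONBLOCK)

    read_end_nonblocking = is_nonblocking(read_fd)
    write_end_nonblocking = is_nonblocking(write_fd)
finally:
    os.close(read_fd)
    os.close(write_fd)

print("read end non-blocking: ", read_end_nonblocking)
print("write end non-blocking:", write_end_nonblocking)
```

The flag lives on the open file description, so the two pipe ends are independent. Descriptors created with dup() share one description, and a single socket used for both directions has only one, so in those cases the flag would be shared. Whether that is what happens here is an open question.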

So, my question is, is stdout non-blocking by design, or is it a bug? If it's the former, what's the rationale behind it?

dw commented 4 years ago

This is/was an old bug, at least in the Ansible extension. I definitely remember fixing it, and I even have an inkling of writing a test for it.

If you're using the core library, it's possible the problem was fixed in the wrong place, which is why you're still seeing it.

Can you provide the library version, the OS, and if possible a minimal repro? Another possibility is that it's only fixed for, e.g., Linux.

apyrgio commented 4 years ago

Hi @dw. Thanks for taking a look at this and sorry for the late reply.

Here are some more details:

OS: Debian Stretch
Mitogen version: 0.2.9
Python version: Reproduced with 3.5 and 2.7

Also, here's a small example that exhibits this error:

#!/usr/bin/env python3

import sys
import mitogen

def shout(num):
    sys.stdout.write("A" * num)

@mitogen.main()
def main(router):
    if len(sys.argv) != 3:
        sys.exit("Usage: ./test.py <hostname> <payload>")

    context = router.ssh(hostname=sys.argv[1])
    context.call(shout, int(sys.argv[2]))
    print("Passed. Try with a larger payload, e.g., 1000000")

If you run the above by connecting to a remote SSH machine and printing a large payload to stdout, you should see the following error:

$ python test.py my.ssh.host 1000000
Traceback (most recent call last):
  File "./test.py", line 9, in <module>
    @mitogen.main()
  File "/usr/lib/python3.5/dist-packages/mitogen/__init__.py", line 118, in wrapper
    func,
  File "/usr/lib/python3.5/dist-packages/mitogen/core.py", line 633, in _profile_hook
    return func(*args)
  File "/usr/lib/python3.5/dist-packages/mitogen/utils.py", line 158, in run_with_router
    return func(router, *args, **kwargs)
  File "./test.py", line 15, in main
    context.call(shout, int(sys.argv[2]))
  File "/usr/lib/python3.5/dist-packages/mitogen/parent.py", line 2017, in call
    return self.default_call_chain.call(fn, *args, **kwargs)
  File "/usr/lib/python3.5/dist-packages/mitogen/parent.py", line 1974, in call
    return receiver.get().unpickle(throw_dead=False)
  File "/usr/lib/python3.5/dist-packages/mitogen/core.py", line 963, in unpickle
    raise obj
mitogen.core.CallError: exceptions.IOError: [Errno 11] Resource temporarily unavailable
  File "<stdin>", line 3669, in _dispatch_one
  File "master:./test.py", line 7, in shout
    sys.stdout.write("A" * num)

where errno 11 stands for EAGAIN. For smaller payloads, the test passes as expected.
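For comparison, this is roughly what a program would have to do to tolerate a non-blocking stdout, which most programs (and sys.stdout.write itself) do not. It is only a sketch of a possible workaround, assuming Python 3 on a Unix-like system, and is not how Mitogen handles it; write_all is a hypothetical helper name:

```python
import errno
import os
import select


def write_all(fd, data):
    """Write all of data to fd, waiting whenever the write would block.

    Hypothetical helper for illustration; not part of Mitogen.
    """
    view = memoryview(data)
    total = 0
    while total < len(view):
        try:
            # os.write may accept only part of the remaining data.
            total += os.write(fd, view[total:])
        except OSError as exc:
            if exc.errno not in (errno.EAGAIN, errno.EWOULDBLOCK):
                raise
            # Sleep until the kernel reports the descriptor as writable.
            select.select([], [fd], [])
    return total
```

Calling write_all(sys.stdout.fileno(), b"A" * 1000000) in place of sys.stdout.write would be one way to check whether EAGAIN is the only thing standing in the way of the large payload.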

apyrgio commented 4 years ago

Hi @dw. Quick question, did you manage to reproduce it with the above script?