jgehrcke / gipc

gevent-cooperative child processes and inter-process communication
https://gehrcke.de/gipc
MIT License
83 stars 13 forks source link

Crash on SIGPIPE #12

Closed jgehrcke closed 7 years ago

jgehrcke commented 10 years ago

Originally reported by: Heungsub Lee (Bitbucket: sublee, GitHub: sublee)


Hi jgehrcke,

gipc-0.4.0 resets SIGPIPE handler to SIG_DFL so a subprocess crashes if it writes to a closed socket. Python ignores SIGPIPE by default. Is it expected behavior? There's no workaround to avoid subprocess crashing?

Regards, Heungsub


jgehrcke commented 10 years ago

Original comment by Jan-Philip Gehrcke (Bitbucket: jgehrcke, GitHub: jgehrcke):


I have addressed this issue in the current release 0.5.0. It incorporates corresponding changes to the library, to the tests, and to the documentation.

jgehrcke commented 10 years ago

Original comment by Jan-Philip Gehrcke (Bitbucket: jgehrcke, GitHub: jgehrcke):


I appreciate your input, Miguel and Alex. Sorry for taking so long to get back to you. I have reconsidered ignoring SIGPIPE in child processes and decided to do it. That is, the next release will contain this breaking change.

On the one hand, I still think that calling code should explicitly state how it desires to treat incoming signals. On the other hand, you are right that people in the (C)Python world tend to assume that SIGPIPE is ignored and that the underlying error is detected and reported by Python via exception (and traceback, if not handled).

While I initially found the first argument to have larger weight, I got convinced by your feedback as well as further resources such as http://stackoverflow.com/a/108192/145400 and http://www.microhowto.info/howto/ignore_sigpipe_without_affecting_other_threads_in_a_process.html. The latter summarized the issue quite nicely:

If you attempt to write to a broken pipe or a disconnected socket then the signal SIGPIPE will be raised, and its default behaviour is to terminate the process responsible. This behaviour is appropriate for programs that are intended for use in a pipeline, but in most other circumstances it is undesirable and should be prevented.

However, what I find most convincing in the context of gipc is that code should not behave differently depending on whether it is called from the initial process or from child processes. Currently, there is a deviation. This minimal example demonstrates the case where we write from the initial process to a broken pipe:

import gipc

def child(r):
    r.get()
    # Here the child process exits and the read pipe end
    # becomes closed implicitly.

with gipc.pipe() as (r, w):
    p = gipc.start_process(child, (r,))
    while True:
        w.put(0)

Its output:

$ python sigpipetest.py
Traceback (most recent call last):
  File "sigpipetest.py", line 11, in <module>
    w.put(0)
  File "/bla/gipc/gipc/gipc.py", line 758, in put
    self._write(struct.pack("!i", len(bindata)) + bindata)
  File "/home/jang/hobbyproggorn/gipc/gipc/gipc.py", line 726, in _write
    bytes_written = _write_nonblocking(self._fd, bindata)
  File "/bla/gipc/pyenv/local/lib/python2.7/site-packages/gevent/os.py", line 70, in nb_write
    return _write(fd, buf)
OSError: [Errno 32] Broken pipe

This code demonstrates the case where we write from a child process to a broken pipe:

import gipc

def child_read(r):
    r.get()
    # Here the child process exits and the read pipe end
    # becomes closed implicitly.

def child_write(w):
    while True:
        w.put(0)

with gipc.pipe() as (r, w):
    pr = gipc.start_process(child_read, (r,))
    pw = gipc.start_process(child_write, (w,))
    pr.join()
    pw.join()
    print "write process exit code: %s" % pw.exitcode

Its output:

$ python sigpipetestchild.py
write process exit code: -13

That is, no exception is raised and the writing child process exits silently. Its exit code indicates that it exited as of an unhandled SIGPIPE signal.

Now I am of the opinion that this difference in behavior is not tolerable and I will commit corresponding changes to gipc and add unit tests accordingly.

jgehrcke commented 10 years ago

Original comment by Alex Besogonov (Bitbucket: alex_besogonov, GitHub: Unknown):


Please, add SIGPIPE to the ignored list. The current behavior causes very hard-to-find bugs the first time one encounters the broken connections.

And since in most cases SIGPIPE is a rare event (requiring a client to break a connection), it's extremely hard to replicate it.

jgehrcke commented 10 years ago

Original comment by Miguel Turner (Bitbucket: dhagrow, GitHub: dhagrow):


For the record, I was having a problem with gipc child processes dying for no apparent reason, and came across this issue while trying to discover why. This issue turned out to be the cause of my problem, but it took a while to realize it because I couldn't see why my processes were dying (I wasn't checking the exit code).

While I agree that it's less magical to treat all signals the same, the effect of exiting the process on SIGPIPE will not be what most Python users expect. If it doesn't make sense to change behavior for this special case (do other Python implementations behave differently?), I would recommend adding documentation to clarify how gipc deviates from CPython behavior.

jgehrcke commented 10 years ago

Original comment by Heungsub Lee (Bitbucket: sublee, GitHub: sublee):


Yes, signal.signal(signal.SIGPIPE, signal.SIG_IGN) you suggested fixes the issue well. You are right, that code just has a cheap disadvantage. Now I agree to your decision.

Thanks for the explanation!

jgehrcke commented 10 years ago

Original comment by Jan-Philip Gehrcke (Bitbucket: jgehrcke, GitHub: jgehrcke):


Thanks for your input. I think the approach you suggested contains a bit too much magic, for my taste. But let's take this discussion step by step.

First of all, I need to know whether a call to signal.signal(signal.SIGPIPE, signal.SIG_IGN) in your child fixes the issue that you are observing.

Then, some more background: gipc should be as painless to use as possible for the casual user. The casual user should be able to use gevent.signal in the parent without worrying about a libev signal watcher to do unexpected things in the child. That is why we have to "normalize" signals in the child. This came up in issue 7 (#7/) and we decided -- as a general purpose solution -- to reset all signals.

Now, when we reset signals, we could implement special rules for special signals or treat all signals the same. The decision was to treat all signals the same, i.e. reset all of them, to SIG_DFL. The clear advantage is that this is well-defined behavior and a simple rule to keep in mind.

Now the important thing for me is to find out what disadvantages this solution brings along in your case. If the "disadvantage" just is that you need to put signal.signal(signal.SIGPIPE, signal.SIG_IGN) into your early child code, I would say that it is not really a disadvantage, but rather a helpful clarification of what your code really does. If there are other disadvantages, please tell me.

Looking forward to hearing your feedback!

Edit: Just to clarify, are you basically saying that people expect from Python to always ignore SIGPIPE and so we should just stick with that paradigm, yes? The corresponding documentation quote would be "SIGPIPE is ignored (so write errors on pipes and sockets can be reported as ordinary Python exceptions) and SIGINT is translated into a KeyboardInterrupt exception." -- what would you suggest to do with SIGINT in a gipc child (that's in fact a difficult question)?

jgehrcke commented 10 years ago

Original comment by Heungsub Lee (Bitbucket: sublee, GitHub: sublee):


Thanks to your answer and this awesome project.

By default, Python runtime sets SIG_IGN as the behavior of SIGPIPE and SIGXFSZ. See initsigs() at pythonrun.c. I think that it is better behavior instead of SIG_DFL because Python can handle these by IOError without interruption.

So I suggest that gipc caches the default signal handlers using signal.getsignal(...) at the load time and each sub process refers them when they reset the signal handlers.

For example,

#!python
_signals_to_reset = [getattr(signal, s) for s in
    set([s for s in dir(signal) if s.startswith("SIG")]) -
    set(['SIG_DFL', 'SIGSTOP', 'SIGKILL'])]
_signal_and_handlers_to_reset = {s: signal.getsignal(s) for s in _signals_to_reset}

def _reset_signal_handlers():
    for s, h in _signal_and_handlers_to_reset:
        signal.signal(s, h)

How you think?

jgehrcke commented 10 years ago

Original comment by Jan-Philip Gehrcke (Bitbucket: jgehrcke, GitHub: jgehrcke):


Please let me know if you think that gipc needs to change or if the proposed solution is convincing, thanks.

jgehrcke commented 10 years ago

Original comment by Jan-Philip Gehrcke (Bitbucket: jgehrcke, GitHub: jgehrcke):


It is 'expected' behavior, according to this warning in the docs:

Please note that in order to provide reliable signal handling in the
context of libev, the default disposition (action) is restored for all
signals in the child before executing the user-given target function.
You can (re)install any signal handler within target.

I believe it is good to have this simple rule applied.

There is a simple solution for you, although I really would not call this workaround. Just import signal and call

signal.signal(signal.SIGPIPE, signal.SIG_IGN)

in your child's target function. This approach has clear semantics. That code explicitly states how it behaves.