python / cpython

The Python programming language
https://www.python.org
Other
63.41k stars 30.37k forks source link

siginterrupt with flag=False is reset when signal received #52601

Closed 0224cb86-9f80-4e63-870f-c42996e7e6ba closed 14 years ago

0224cb86-9f80-4e63-870f-c42996e7e6ba commented 14 years ago
BPO 8354
Nosy @bitdancer, @bz2
Files
  • sig-test.py: Sample program to demonstrate loss of siginterrupt flag
  • test_signal_siginterrupt.diff: Patch to add check to test_signal
  • signal_noreinstall.diff: Patch to remove signal reinstall from signal_handler when sigaction is used
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields: ```python assignee = None closed_at = created_at = labels = ['type-bug', 'library'] title = 'siginterrupt with flag=False is reset when signal received' updated_at = user = 'https://bugs.python.org/spiv' ``` bugs.python.org fields: ```python activity = actor = 'vila' assignee = 'none' closed = True closed_date = closer = 'exarkun' components = ['Library (Lib)'] creation = creator = 'spiv' dependencies = [] files = ['16837', '16847', '16848'] hgrepos = [] issue_num = 8354 keywords = ['patch', 'needs review'] message_count = 16.0 messages = ['102688', '102725', '102742', '103204', '103205', '103207', '103208', '103209', '103212', '103213', '103214', '103243', '103276', '105136', '105185', '105369'] nosy_count = 6.0 nosy_names = ['spiv', 'exarkun', 'vila', 'r.david.murray', 'gz', 'neologix'] pr_nums = [] priority = 'normal' resolution = 'fixed' stage = 'patch review' status = 'closed' superseder = None type = 'behavior' url = 'https://bugs.python.org/issue8354' versions = ['Python 2.6', 'Python 3.1', 'Python 2.7'] ```

    0224cb86-9f80-4e63-870f-c42996e7e6ba commented 14 years ago

    The effect of signal.siginterrupt(somesig, False) is reset the first time a that signal is received. This is not the documented behaviour, and I do not think this is a desireable behaviour. It renders siginterrupt effectively useless at providing the robustness against EINTR it is intended to provide.

    Attached is a fairly simple program to show this using SIGWINCH: run it in a resizeable terminal, and resize it twice. Notice that on the second terminal resize (i.e. the second SIGWINCH signal) the program crashes with an EINTR from the os.read.

    A partial workaround for the problem is to call signal.siginterrupt(somesig, False) again inside your signal handler, but it's very fragile. It depends on Python getting a chance to run the Python function registered by the signal.signal call, but this is not guaranteed. If there's frequent IO, that workaround might suffice. For the sig-test.py example attached to this bug, it doesn't (try it).

    The cause seems to be that signal_handler in signalmodule.c unconditionally does PyOS_setsig(sig_num, signal_handler) [except for SIGCHLD], which unconditionally invokes siginterrupt(sig, 1). A possible fix would be to add a 'int siginterrupt_flag;' to the Handlers array, and arrange for that value to be passed instead of the hard-coded 1. Another might be to not call PyOS_setsig from signal_handler at all -- I'm not sure why it is trying to reinstall itself, but perhaps there's some issue there I'm not aware of.

    79528080-9d85-4d18-8a2a-8b1f07640dd7 commented 14 years ago

    The effect of signal.siginterrupt(somesig, False) is reset the first time a that signal is received. This is not the documented behaviour, and I do not think this is a desireable behaviour. It renders siginterrupt effectively useless at providing the robustness against EINTR it is intended to provide.

    Actually, siginterrupt shouldn't be used. The proper way is to use sigaction with SA_RESTART flag (and still, don't rely on SA_RESTART too much, certain syscalls are non restartable and this isn't realy portable).

    Another might be to not call PyOS_setsig from signal_handler at all -- I'm not sure why it is trying to reinstall itself, but perhaps there's some issue there I'm not aware of.

    Because signal.signal might be implemented with sigaction() or signal() and the latter resets the default signal handler when the handler is called. This means that if your system doesn't support sigaction and and you don't reinstall it, then the handler will only get called the first time. However, reinstalling the signal handler contains a race, because if a second signal comes before you reinstall it, it's handled by the default handler. That's why sigaction is much better (and calling PyOS_setsig unecessary when sigaction is available).

    The problem you describe can happen with both sigaction and signal :

    sigaction:

    signal:

    So the simple fix when sigaction is available is simply to not call PyOS_setsig() from signal_handler. When sigaction is not available, well, you have to recall that you want restartable syscalls, and call siginterrupt again with that value. But I think if the OS doesn't support sigaction, there's little chance it'll support siginterrupt. (1) I just found out that Windows doesn't have sigaction, but I don't know Window much, so if someone could confirm that it doesn't support siginterrupt, then the fix would simply be to not reinstall handler when sigaction is available.

    79528080-9d85-4d18-8a2a-8b1f07640dd7 commented 14 years ago

    Attached are two patches:

    after:
    $ ./python Lib/test/regrtest.py test_signal
    test_signal
    test test_signal failed -- Traceback (most recent call last):
      File "/home/cf/python/trunk/Lib/test/test_signal.py", line 299, in test_siginterrupt_off
        self.assertEquals(i, False)
    AssertionError: True != False

    1 test failed: test_signal

    with patch and updated test:
    $ ./python Lib/test/regrtest.py test_signal
    test_signal
    1 test OK.

    Of course, this also corrects the problem with sig-test.py, the terminal can be resized indefinitely. I also passed test_subprocess and test_socketserver just to be sure, but reviews are more than welcome.

    0224cb86-9f80-4e63-870f-c42996e7e6ba commented 14 years ago

    Your patches look good to me.

    (They don't fix platforms without sigaction, but as you say they probably don't have siginterrupt, and even if they do they will still have an unfixable race.)

    What's the next step? I can't see an button to add this issue to the "Show Needing Review" queue.

    bitdancer commented 14 years ago

    Will the modified test fail on platforms that don't define HAVE_SIGACTION?

    79528080-9d85-4d18-8a2a-8b1f07640dd7 commented 14 years ago

    Will the modified test fail on platforms that don't define HAVE_SIGACTION?

    Well, in theory, if the system has siginterrupt but not sigaction, it will fail. But as said, I don't think it's possible, see man siginterrupt: " This library routine uses an extension of the sigaction(2) system call that is not available in 4.2BSD, hence it should not be used if backward compatibility is needed."

    and the test right now has this at the beginning: if sys.platform[:3] in ('win', 'os2') or sys.platform == 'riscos': raise unittest.SkipTest("Can't test signal on %s" % \ sys.platform)

    So it pretty much assumes that any Unix system has siginterrupt, hence sigaction, so it should be safe. So I'd say in theory, it will, but I don't think that siginterrupt can be available without sigaction anyway. So feel free to modify the test, or not ;-)

    0224cb86-9f80-4e63-870f-c42996e7e6ba commented 14 years ago

    Are there any platforms that define HAVE_SIGINTERRUPT but that do not define HAVE_SIGACTION? If there are, then yes I expect they would fail that test.

    It would be a shame to delay this fix just because it doesn't fix all platforms... is it possible to mark that test as a known failure on a platform with siginterrupt but without sigaction?

    My initial comment outlined a change that would work for all platforms, but would be more complex. (I think the signal_noreinstall.diff change would be good to make even with the more complex approach, though.)

    0224cb86-9f80-4e63-870f-c42996e7e6ba commented 14 years ago

    FWIW, comparing the "change history" sections of \http://www.opengroup.org/onlinepubs/000095399/functions/siginterrupt.html\ and \http://www.opengroup.org/onlinepubs/000095399/functions/sigaction.html\ suggests that sigaction predates siginterrupt, but it's a wild and varied world out there.

    79528080-9d85-4d18-8a2a-8b1f07640dd7 commented 14 years ago

    Well, I just think that the probability of having siginterrupt without sigaction is far less than having a Unix system without siginterrupt (which the current test_signal assumes). Or just drop the patch for the test, it honestly doesn't bother me ;-)

    bitdancer commented 14 years ago

    Yes, my question was directed at finding out if there were any platforms on which we'd have to add an additional skip (which would mean refactoring that test into two tests). But if the buildbots are all happy after it is applied we can probably choose to not worry about it until/if we get a bug report.

    8726d1eb-a365-45b6-b81d-c75988975e5a commented 14 years ago

    Will the modified test fail on platforms that don't define HAVE_SIGACTION?

    Only if they also have siginterrupt, which seems unlikely (as neologix explained). The implemented behavior on such platforms is unmodified from current trunk, while the test requires new behavior.

    I think it's probably worth testing this new behavior separately from the test for the existing behavior anyway, for the sake of clarity. If it turns out there's a platform with siginterrupt but not sigaction, then that'll also let the test be skipped there, which is nice (though this case seems unlikely to come up).

    I'm not exactly sure how we will know if it is expected to fail, though. I don't think HAVE_SIGACTION is exposed nicely to Python right now. Perhaps the signal module should make this information available (it's somewhat important now, as it will mean the difference between a nicely working signal.siginterrupt and a not-so-nicely working one).

    This quirk probably also bears a mention in the siginterrupt documentation.

    A few other thoughts on the code nearby:

    Aside from those points, the fix seems correct and good.

    79528080-9d85-4d18-8a2a-8b1f07640dd7 commented 14 years ago
    • There seems to be no good reason to special case SIGCHLD in signal_handler. The comment about infinite recursion has no obvious interpretation to me. Fortunately, this is irrelevant on platforms with sigaction, because the handler remains installed anyway!

    I think that it's possible, on certain systems, that setting a signal handler for SIGCHLD while there is un unwaited child (zombie) process triggers a new sending of SIGCHLD. See http://standards.ieee.org/reading/ieee/interp/1003-1-90_int/pasc-1003.1-132.html: "The SIGCHLD or SIGCLD were used for implementing job control. Since I had implemented job control for both BSD and System V, I pointed out to the standards group that except for SIG_DFL, these signals had different semantics.

    If a signal handler was set for SIGCLD, then a signal would be generated if there were any unreaped child processes. When the signal handler was caught in System V, it was reset by default to SIG_DFL. However, if a process did not reap a child and instead reestablished the signal handler, it would go into an infinite loop since the signal would be generated again. The SIGCLD SIG_IGN behavior was that the system reaped the child when it completed so that the application didn't have to deal with it. However, I believe that a process blocked in wait() would be awakened, but I am not certain of this.

    The SIGCHLD signal on the other hand was generated when a child completed if a signal handler was set at that time. No signal would be generated if a signal handler was established while there was waiting children. The SIGCHLD signal was also generated when a child process stopped. I believe that BSD treated SIGCHLD SIG_IGN the same way that it treated SIGCHLD SIG_DFL."

    So, there might exist out there a couple systems that merrily mix SIGCLD and SIGCHLD, and re-raise a SIGCHLD when setting a new handler for it when unwaited children exist (which is the case in signal_handle, since child processes can't be waited for before the main thread gets to execute the handler it set up).

    0224cb86-9f80-4e63-870f-c42996e7e6ba commented 14 years ago

    I'm not exactly sure how we will know if it is expected to fail, though. I don't think HAVE_SIGACTION is exposed nicely to Python right now.

    It might be useful to have the contents of pyconfig.h exposed as a dict somehow. Maybe call it sys._pyconfig_h?

    A less ambitious change would be to expose just HAVE_SIGACTION as e.g. signal._have_sigaction.

    I agree with r.david.murray that we probably don't need to bother unless a buildbot or someone reports that this test fails.

    084a80ff-d352-46d1-9c66-c325a55e4c28 commented 14 years ago

    This patch has been reviewed by both Andrew and myself, it would be nice if someone made the time to land it. The test change is unlikely to break anything, and hey, that's what buildbots are for.

    8726d1eb-a365-45b6-b81d-c75988975e5a commented 14 years ago

    I agree that this should be landed (for 2.6 and 2.7). I think I can do it. I made some changes to the tests, though. It would be nice for someone to look those over and make sure the change still looks good.

    I checked everything in to the siginterrupt-reset-issue8354 branch. You can also find the diff at http://codereview.appspot.com/1134042/show

    8726d1eb-a365-45b6-b81d-c75988975e5a commented 14 years ago

    Should be resolved in, oh, let's see, r81007, r81011, r81016, and r81018. Thanks to everyone who helped out.