sagemath / cysignals

cysignals: interrupt and signal handling for Cython. Source repository for https://pypi.org/project/cysignals/
GNU Lesser General Public License v3.0
43 stars 23 forks source link

Better handling on Cygwin for exceptions that occur during exception handling on an alt stack #83

Closed embray closed 6 years ago

embray commented 6 years ago

This is a rough idea for how to fix #77. Actually it seems to work pretty reliably but I admit I haven't fully worked out the implications for continuing to do anything with Cygwin in this environment. It's probably pretty unsafe, but this is still better than the current situation (where the OS just quietly terminates the process, and never even sets the exit code in a way that Cygwin can keep track of).

jdemeyer commented 6 years ago

It would be interesting to know what happens when you don't call setup_alt_stack at all. For all we know, that might just work.

embray commented 6 years ago

For all we know, that might just work.

No, it won't. Cygwin's behavior here is in line with Linux--if a stack overflow occurs it actually disables any non-default SIGSEGV handler unless we have an alternative stack.

embray commented 6 years ago

Going to see if I can get this cleaned up. After re-reading this, it probably does make sense to still have a separate implementation_cygwin.{c,h} since there's enough code in it that it would be a lot of noise to include in the main implementation.c.

embray commented 6 years ago

Finally got around to updating this. I think it will be a bit better now.

We still need to add CI for Cygwin, and I can do that. For now you'll have to I guess take my word for it that this works.

embray commented 6 years ago

I moved the call to sig_reset_defaults inside sigdie_for_sig--I agree it doesn't make sense to not call them together. But they still sort of make sense to exist as separate functions.

jdemeyer commented 6 years ago

The __version__ stuff that you added is breaking on Python 2.7 according to Travis CI.

embray commented 6 years ago

Oops, I don't think I meant to add that stuff here at all. I think they were just uncommitted changes that I accidentally git ci -a'd

embray commented 6 years ago

Ugh. It might be unrelated to this, but now I'm getting a hang when I run make check that didn't happen before I rebased on current master.

embray commented 6 years ago

Nevermind, I think might be user error. However, now I'm getting a new failure:

File "src/cysignals/signals.pyx", line 181, in signals.pyx
Failed example:
    init_cysignals()
Expected:
    <cyfunction python_check_interrupt at ...>
Got:
    <built-in function python_check_interrupt>
embray commented 6 years ago

Still user/environment error. But now I'm back to the tests hanging in src/cysignals/tests.pyx. I'm going to try going back to the master branch instead of mine and see if it's still a problem.

embray commented 6 years ago

This is weird: make check currently runs make check-prefix and then make check-user. If I, starting from a clean repository, run both of those targets in order, manually, everything works. However, if I just run make check, again from a clean repo, it hangs in the check-user-doctest target on:

export PYTHONUSERBASE="`pwd`/tmp/user" && ulimit 2>/dev/null -s 1024; python -B rundoctests.py src/cysignals/*.pyx
Doctesting 5 files.
src/cysignals/alarm.pyx
src/cysignals/pselect.pyx
src/cysignals/pysignals.pyx
src/cysignals/signals.pyx
src/cysignals/tests.pyx

I think some of these makefile rules really need to be cleaned up in general, but it's disconcerting that something about how the makefile is written can actually cause the tests to hang. So I need to keep looking into it. This behavior seems to be deterministic though.

embray commented 6 years ago

Alright. After more extensive testing over lunch, it looks like this might not be entirely deterministic after all, and maybe it has nothing to do with the different make targets as my initial testing suggested. So now I just need to find out exactly what test it's hanging on...

embray commented 6 years ago

Okay. When it does hang it seems to be happening in test_interrupt_bomb. This is hard to reproduce, but I suspect that's what happening is that the "Some time later" is too short for Cygwin, considering the additional overhead of signal handling there, and the SIGABRT is getting lost in the shuffle. I don't know that that should happen, and it might indicate a bug in Cygwin. But I don't think there's anything to fix here other than to make the test a little more robust somehow (i.e. ensure it can't block forever).

embray commented 6 years ago

Anyways, I don't think the hang on this test is related to this issue otherwise.

jdemeyer commented 6 years ago

When it does hang it seems to be happening in test_interrupt_bomb.

Interesting that you get a hang, that indicated that the SIGABRT is completely ignored. That seems to me like the least likely outcome (I could understand a hard crash).

I would be interested to see the output of this test when cysignals was configured with ./configure --enable-debug.

Does it help if you increase the delay in

signal_after_delay(SIGABRT, base_delay + 10*n + 1000)
jdemeyer commented 6 years ago

Note that the test_interrupt_bomb is really about the interaction between the many SIGINTs. The SIGABRT is just meant to terminate the loop, it's not really considered part of the test. So there might be a genuine bug here that the SIGINT interacts with the SIGABRT.

jdemeyer commented 6 years ago

Reading the cysignals code, it seems that the case of two different signals arriving together isn't really supported well. So if you get a SIGABRT while handling a SIGINT, the SIGABRT might be swallowed.

This might be fixable, but it's not completely obvious because there are many race conditions to think about. Even then, the SIGABRT would occur outside of the sig_on and would just crash the test.

So I should probably change the SIGABRT mechanism in this test to some delay.

embray commented 6 years ago

Reading the cysignals code, it seems that the case of two different signals arriving together isn't really supported well. So if you get a SIGABRT while handling a SIGINT, the SIGABRT might be swallowed.

Yes, that's what I was implying in this comment. It's entirely possible for the SIGABRT to just never be seen by cysignals under such heavy bombardment.

But anyways this discussion might be better continued elsewhere since it's not actually pertinent to this ticket (it just came up here because I didn't start seeing this failure until I most recently tested this ticket).

embray commented 6 years ago

Thanks!