Test failures on Cygwin

embray commented 6 years ago

Running make test on Cygwin I get these failures:

**********************************************************************
File "src/cysignals/tests.pyx", line 533, in tests.pyx
Failed example:
    print(Popen([executable, '-c', cmd], stdout=PIPE, stderr=PIPE).communicate()[1].decode("utf-8"))
Expected:
    ------------------------------------------------------------------------
    ...
    ------------------------------------------------------------------------
    <BLANKLINE>
Got:
    ------------------------------------------------------------------------
    <BLANKLINE>
**********************************************************************
File "src/cysignals/tests.pyx", line 581, in tests.pyx
Failed example:
    print(msg.decode("utf-8"))
Expected:
    ------------------------------------------------------------------------
    ...
    ------------------------------------------------------------------------
    Unhandled SIG...
    This probably occurred because a *compiled* module has a bug
    in it and is not properly wrapped with sig_on(), sig_off().
    Python will now terminate.
    ------------------------------------------------------------------------
    <BLANKLINE>
Got:
    ------------------------------------------------------------------------
    Unhandled SIGSEGV: A segmentation fault occurred.
    This probably occurred because a *compiled* module has a bug
    in it and is not properly wrapped with sig_on(), sig_off().
    Python will now terminate.
    ------------------------------------------------------------------------
    <BLANKLINE>
**********************************************************************
File "src/cysignals/tests.pyx", line 632, in tests.pyx
Failed example:
    print(Popen([executable, '-c', cmd], stdout=PIPE, stderr=PIPE).communicate()[1].decode("utf-8"))
Expected:
    ------------------------------------------------------------------------
    ...
    ------------------------------------------------------------------------
    Unhandled SIGABRT: An abort() occurred.
    This probably occurred because a *compiled* module has a bug
    in it and is not properly wrapped with sig_on(), sig_off().
    Python will now terminate.
    ------------------------------------------------------------------------
    <BLANKLINE>
Got:
    ------------------------------------------------------------------------
    Unhandled SIGABRT: An abort() occurred.
    This probably occurred because a *compiled* module has a bug
    in it and is not properly wrapped with sig_on(), sig_off().
    Python will now terminate.
    ------------------------------------------------------------------------
    <BLANKLINE>
**********************************************************************
File "src/cysignals/tests.pyx", line 672, in tests.pyx
Failed example:
    print(Popen([executable, '-c', cmd], stdout=PIPE, stderr=PIPE).communicate()[1].decode("utf-8"))
Expected:
    ------------------------------------------------------------------------
    ...
    ------------------------------------------------------------------------
    Unhandled SIGSEGV: A segmentation fault occurred.
    This probably occurred because a *compiled* module has a bug
    in it and is not properly wrapped with sig_on(), sig_off().
    Python will now terminate.
    ------------------------------------------------------------------------
    <BLANKLINE>
Got:
    ------------------------------------------------------------------------
    Unhandled SIGSEGV: A segmentation fault occurred.
    This probably occurred because a *compiled* module has a bug
    in it and is not properly wrapped with sig_on(), sig_off().
    Python will now terminate.
    ------------------------------------------------------------------------
    <BLANKLINE>
**********************************************************************
File "src/cysignals/tests.pyx", line 697, in tests.pyx
Failed example:
    print(Popen([executable, '-c', cmd], stdout=PIPE, stderr=PIPE).communicate()[1].decode("utf-8"))
Expected:
    ------------------------------------------------------------------------
    ...
    ------------------------------------------------------------------------
    An error occurred during signal handling.
    This probably occurred because a *compiled* module has a bug
    in it and is not properly wrapped with sig_on(), sig_off().
    Python will now terminate.
    ------------------------------------------------------------------------
    <BLANKLINE>
Got:
    <BLANKLINE>
**********************************************************************
1 items had failures:
   5 of 120 in tests.pyx
***Test Failed*** 5 failures.

Seems pretty trivial though. The tests are expecting some extra ------- followed by some text that isn't being output. I'm not sure why.

embray commented 6 years ago

The last test is a bit worse since it expects some output like "An error occurred during signal handling..." but gets no output.

jdemeyer commented 6 years ago

> The tests are expecting some extra ------- followed by some text that isn't being output.

That's the C backtrace. It seems that Cygwin doesn't support that, which is not so bad.
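To illustrate why a missing backtrace makes these doctests fail, here is a minimal sketch using the output checker from Python's standard doctest module. The expected text has two dashed rules with `...` between them. When no backtrace is printed, only one rule appears, and the ELLIPSIS wildcard cannot stand in for the missing second rule. The strings below are illustrative stand-ins, not real cysignals output.

```python
import doctest

RULE = "-" * 72

# Expected text as the doctest writes it: rule, wildcard, rule.
WANT = RULE + "\n...\n" + RULE + "\n"


def matches(got):
    """Return True if `got` satisfies WANT under doctest's ELLIPSIS rules."""
    checker = doctest.OutputChecker()
    return checker.check_output(WANT, got, doctest.ELLIPSIS)


# With a backtrace (placeholder text), both rules are present.
with_backtrace = RULE + "\n(placeholder backtrace lines)\n" + RULE + "\n"
# Without backtrace support, only the first rule is printed.
without_backtrace = RULE + "\n"

print(matches(with_backtrace))     # True
print(matches(without_backtrace))  # False
```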

jdemeyer commented 6 years ago

> The last test is a bit worse since it expects some output like "An error occurred during signal handling..." but gets no output.

Honestly, if that is the only problem, this is working very well on Cygwin.

jdemeyer commented 6 years ago

Just in case it matters, could you test again with CFLAGS="-O0"?

jdemeyer commented 6 years ago

Could you run `from cysignals.tests import *; test_bad_str()` directly in a Python shell and tell me what happens? If Python crashes, I would also like to see the output of `echo $?`.

jdemeyer commented 6 years ago

The trivial test failures should be fixed in master.

embray commented 6 years ago

Yes, I already could have told you it works well on Cygwin :) If it didn't we'd have had a lot more problems by now.

Indeed, I don't think printing the backtrace is fully supported on Cygwin.

embray commented 6 years ago

> The trivial test failures should be fixed in master.

Confirmed.

embray commented 6 years ago

Hmm

$ python -c 'from cysignals.tests import *; test_bad_str()'
------------------------------------------------------------------------
Attaching gdb to process id 33864.
      0 [main] python 34460 C:\cygwin64\bin\gdb.exe: *** fatal error - error while loading shared libraries: /usr/lib/libpython2.7.dll.a: cannot open shared object file: Exec format error

Failed to run gdb.
Failed to run gdb.
Install gdb for enhanced tracebacks.
------------------------------------------------------------------------
An error occurred during signal handling.
This probably occurred because a *compiled* module has a bug
in it and is not properly wrapped with sig_on(), sig_off().
Python will now terminate.
------------------------------------------------------------------------
Segmentation fault (core dumped)

$ echo $?
139
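The status 139 is how a POSIX shell reports death by signal: 128 plus the signal number, and SIGSEGV is signal 11 on Linux and Cygwin. The following is a minimal sketch of the same decoding from Python, assuming a POSIX system. The helper names are made up for illustration and are not part of cysignals.

```python
import signal
import subprocess
import sys


def shell_status(returncode):
    """Map a subprocess returncode to the value a POSIX shell shows in $?."""
    if returncode < 0:
        # subprocess reports "killed by signal N" as -N.
        return 128 + (-returncode)
    return returncode


def run_crashing_child():
    """Start a child Python that kills itself with SIGSEGV; return its returncode."""
    code = "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)"
    return subprocess.run([sys.executable, "-c", code]).returncode


rc = run_crashing_child()
print(rc, shell_status(rc))
```

A process that exits normally with status 0, as reported later in this thread, would pass through `shell_status` unchanged.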

embray commented 6 years ago

FWIW I've never gotten Python integration in gdb to work on Cygwin. I don't think there's any deep reason for it not to work; I think it's just slightly broken and no one's bothered to fix it. I never looked at it too deeply (though it's annoying, since there are other areas where I'd have liked to have had it).

jdemeyer commented 6 years ago

So the message "An error occurred during signal handling." does appear. Then why does the test fail?

embray commented 6 years ago

I don't know. Maybe it's just a question of which I/O stream it's going to? I'll take a closer look when I can.
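The failing doctests read only `communicate()[1]`, which is the child's stderr. One way to check the stream hypothesis is to capture both streams separately and compare them. The sketch below uses a stand-in child command rather than the real test, and the helper name is an assumption.

```python
import subprocess
import sys


def capture_streams(code):
    """Run `code` in a child Python and return (stdout, stderr) as text."""
    proc = subprocess.run(
        [sys.executable, "-c", code],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    return proc.stdout.decode("utf-8"), proc.stderr.decode("utf-8")


# Stand-in child that writes one line to each stream.
child = 'import sys; print("to stdout"); sys.stderr.write("to stderr\\n")'
out, err = capture_streams(child)
print(repr(out))
print(repr(err))
```

Replacing the stand-in command with `from cysignals.tests import *; test_bad_str()` would show whether the message ends up on stdout, stderr, or neither.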

jdemeyer commented 6 years ago

> I'll take a closer look when I can.

Please do. It looks like the tests should pass...

embray commented 6 years ago

In any case, it's probably just an issue with the tests themselves. Should still be tracked down and fixed, but at least it doesn't seem to be pointing to a real problem.

embray commented 6 years ago

Well I feel silly. When I tested manually above I just ran python, so it was running the test in whatever version of cysignals I happened to have installed globally at the time (I was not using a virtualenv). When I run the test in the current version of cysignals it indeed produces no output.
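A quick way to avoid this kind of mix-up is to check which copy of a package the interpreter would import before running a manual test. The sketch below uses only the standard library (Python 3.8 or later). The helper name is made up, and it assumes the installed distribution has the same name as the module, which need not hold in general.

```python
import importlib.metadata
import importlib.util


def describe_module(name):
    """Return where `name` would be imported from and its installed version."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None  # not importable from this interpreter
    try:
        version = importlib.metadata.version(name)
    except importlib.metadata.PackageNotFoundError:
        version = None  # importable, but no distribution metadata found
    return {"origin": spec.origin, "version": version}


# "cysignals" is the package discussed here; the result is None if it is absent.
print(describe_module("cysignals"))
```

If the reported origin points into a global site-packages directory rather than the working tree, the manual run is exercising a different build than the test suite.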

What's worse, I unthinkingly deleted the old installed version, so I have no idea what version it was. I'll have to do a git bisect until I can figure out where it broke (because clearly it did work at one point...)

jdemeyer commented 6 years ago

> When I run the test in the current version of cysignals it indeed produces no output.

Literally nothing happens? Python does not crash?

embray commented 6 years ago

Yes, literally nothing. Python does not crash, and returns 0. It's bizarre.

I worked out that the problem started with commit 822be07132715ae62ec84b9ed97de05b97364153. Cygwin does support sigaltstack, but probably the problem has something to do with that...

embray commented 6 years ago

Even stranger--when I ran the test under strace I see what we should expect to see, at least initially: the forked process delivers a signal to the parent process (SIGILL), and then it sets up the signal handler and gets an access violation somewhere in there.
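The pattern seen in the strace output, a forked child sending a signal to its parent, can be sketched in plain Python on a POSIX system. This is only an illustration of the signal delivery step: it uses SIGUSR1 instead of SIGILL and a Python-level handler instead of the C handler in cysignals, and the function name is made up.

```python
import os
import signal
import time


def signal_from_child(signum=signal.SIGUSR1, delay=0.1, timeout=5.0):
    """Fork a child that signals the parent; return the signals the parent saw."""
    received = []
    previous = signal.signal(signum, lambda s, frame: received.append(s))
    try:
        pid = os.fork()
        if pid == 0:
            # Child: wait briefly, signal the parent, then exit immediately.
            try:
                time.sleep(delay)
                os.kill(os.getppid(), signum)
            finally:
                os._exit(0)
        # Parent: poll until the handler has run or the timeout expires.
        deadline = time.monotonic() + timeout
        while not received and time.monotonic() < deadline:
            time.sleep(0.01)
        os.waitpid(pid, 0)
        return received
    finally:
        # Restore whatever handler was installed before.
        signal.signal(signum, previous)


print(signal_from_child())
```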

jdemeyer commented 6 years ago

> Yes, literally nothing. Python does not crash, and returns 0.

You mean that the function test_bad_str() returns int(0)?

> Cygwin does support sigaltstack, but probably the problem has something to do with that...

My experience is that sigaltstack has different quirks on different OSes. It's not a widely used system call, so the chances are higher that it is buggy. Also, many details are not specified by POSIX so it behaves slightly differently on every OS. You can see from the git history here that it took many iterations to get something that works on Linux, OS X and Solaris. And even then, there is a bug(?) on OS X which prevents sigaltstack from working in a forked child process.

The function test_bad_str induces a signal during the execution of the signal handler. It could very well be that this is something that Cygwin never tested in combination with sigaltstack.

embray commented 6 years ago

I just meant the python process has an exit code of 0, not the test function itself.

I'll look through the commit history, but if you have a chance a brief summary of the kinds of issues you encountered with this might be helpful.

It's strange that it appears to work partially correctly running under strace, in that the signal handler itself causes an access violation, whereas when run by itself there is no apparent error, so maybe it's that Cygwin's own exception handling gets broken. Fortunately, that's easy enough to investigate in Cygwin.

embray commented 6 years ago

Running in gdb confirmed a couple things: 1) The initial SIGILL generated by the test is handled correctly and the handler is running on the alternative stack. 2) An access violation is generated upon trying to access the bogus char *. This latter access violation is not handled at all by Cygwin (Cygwin itself crashes).

embray commented 6 years ago

Similar deal after switching to the latest master branch. Confirmed that it is running on the alt stack (specifically trampoline_stack), and then it crashes. It appears Cygwin is not properly handling exceptions that occur while running on an alternate stack :/

This should be fixable, but perhaps in the meantime it would be best to disable this feature on Cygwin (and perhaps also on any platform that doesn't have sigaltstack() in the first place, though I don't have any such platforms on hand at the moment). sigaltstack() was only added to Cygwin sometime in early 2015, FWIW.

embray commented 6 years ago

> This should be fixable

Maybe not. After some more research it appears this might be a limitation of the NT kernel. It really doesn't like having %rsp messed with in a manner it wasn't expecting. It might be very difficult to get it to recover from exceptions that occur while in an alternate stack.

embray commented 6 years ago

An interesting post I found describing some details about the issue. Native Client has the same problem in that it's also running code on a stack that Windows thinks is bogus (and thus doesn't allow normal SEH handlers to run): http://lackingrhoticity.blogspot.fr/2012/09/native-clients-ntdll-patch-on-x86-64-windows.html

embray commented 6 years ago

Anyways, in practice, the safest and most practical thing to do from cysignals for now is to disable use of sigaltstack on Cygwin, I think.
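A minimal sketch of what such a switch could look like at build time, assuming a setuptools-style build where C preprocessor macros are passed as `define_macros`. The macro name is hypothetical and not taken from cysignals. Under Cygwin's Python, `sys.platform` is "cygwin".

```python
import sys

# Hypothetical macro name, used only for illustration.
DISABLE_MACRO = "EXAMPLE_DISABLE_SIGALTSTACK"


def altstack_macros(platform=None):
    """Return define_macros entries that turn off the alternate stack on Cygwin."""
    if platform is None:
        platform = sys.platform
    if platform.startswith("cygwin"):
        return [(DISABLE_MACRO, "1")]
    return []


print(altstack_macros("cygwin"))
print(altstack_macros("linux"))
```

The C source would then guard its sigaltstack setup with the corresponding preprocessor check.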

embray commented 6 years ago

One last thought for now--it looks like I might still be able to do something by invoking the Windows API directly to register a vectored continue handler. This is in fact how Cygwin itself handles exceptions that occur while on the alternate stack. Unfortunately Cygwin's handler only handles exceptions that occur from within Cygwin (which is the only context where it knows how to safely unwind the stack). Exceptions that come from outside Cygwin, as in this case, are just given up on and the process is allowed to run off into the weeds (actually Windows just kills it). But Cysignals provides an opportunity to still do something a little better here I think.

embray commented 6 years ago

Yep, based on a quick experiment it looks like that could work. It should suffice to print an error message (like cysignals_signal_handler currently does when an internal error occurs) and exit the process with the appropriate error code.

sagemath / cysignals

Test failures on Cygwin #77