Closed embray closed 6 years ago
The last test is a bit worse since it expects some output like "An error occurred during signal handling..." but gets no output.
The tests are expecting some extra ------- followed by some text that isn't being output.
That's the C backtrace. It seems that Cygwin doesn't support that, which is not so bad.
The last test is a bit worse since it expects some output like "An error occurred during signal handling..." but gets no output.
Honestly, if that is the only problem, this is working very well on Cygwin.
Just in case it matters, could you test again with CFLAGS="-O0"?
Could you run from cysignals.tests import *; test_bad_str() directly in a Python shell and tell me what happens? If Python crashes, I would also like to see echo $?.
The trivial test failures should be fixed in master.
Yes, I already could have told you it works well on Cygwin :) If it didn't we'd have had a lot more problems by now.
Indeed, I don't think printing the backtrace is fully supported on Cygwin.
The trivial test failures should be fixed in master.
Confirmed.
Hmm
$ python -c 'from cysignals.tests import *; test_bad_str()'
------------------------------------------------------------------------
Attaching gdb to process id 33864.
0 [main] python 34460 C:\cygwin64\bin\gdb.exe: *** fatal error - error while loading shared libraries: /usr/lib/libpython2.7.dll.a: cannot open shared object file: Exec format error
Failed to run gdb.
Failed to run gdb.
Install gdb for enhanced tracebacks.
------------------------------------------------------------------------
An error occurred during signal handling.
This probably occurred because a *compiled* module has a bug
in it and is not properly wrapped with sig_on(), sig_off().
Python will now terminate.
------------------------------------------------------------------------
Segmentation fault (core dumped)
$ echo $?
139
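For reference, a shell exit status above 128 conventionally means 128 plus the number of the signal that killed the process, so 139 corresponds to signal 11 (SIGSEGV), which matches the "Segmentation fault" message above. A minimal sketch of that decoding, assuming Python 3; the helper name describe_exit_status is made up for illustration and is not part of cysignals:

```python
import signal


def describe_exit_status(status):
    """Decode a shell exit status using the common 128 + signum convention.

    Hypothetical helper for illustration only; not part of cysignals.
    """
    if status > 128:
        signum = status - 128
        try:
            return "killed by " + signal.Signals(signum).name
        except ValueError:
            # Not a signal number known on this platform.
            return "killed by unknown signal %d" % signum
    return "exited with code %d" % status


print(describe_exit_status(139))
print(describe_exit_status(0))
```

An exit status of 0, by contrast, means the process ended normally without reporting any error.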
FWIW I've never gotten Python integration in gdb to work on Cygwin. I don't think there's any deep reason for it not to work; I think it's just slightly broken and no one's bothered to fix it. I never looked at it too deeply (though it's annoying, since there are other areas where I'd have liked to have had it).
So the message "An error occurred during signal handling." does appear. Then why does the test fail?
I don't know. Maybe it's just a question of which I/O stream it's going to? I'll take a closer look when I can.
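One way to check the stream question is to run the command in a child process and capture stdout and stderr separately. This is only a sketch, assuming Python 3: the function name run_and_split_streams and the stand-in child program are made up for illustration, and the real command from the thread would replace the demo string.

```python
import subprocess
import sys


def run_and_split_streams(code):
    """Run `code` in a fresh Python process, capturing stdout and stderr separately."""
    proc = subprocess.run(
        [sys.executable, "-c", code],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return proc.returncode, proc.stdout, proc.stderr


# Stand-in child program (an assumption for this sketch); in the thread the
# command would be: from cysignals.tests import *; test_bad_str()
demo = "import sys; print('to stdout'); sys.stderr.write('to stderr\\n')"
rc, out, err = run_and_split_streams(demo)
print("exit code:", rc)
print("stdout:", repr(out))
print("stderr:", repr(err))
```

On POSIX systems, subprocess reports a child killed by signal N with a return code of -N, so a crash would also be visible in the first value.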
I'll take a closer look when I can.
Please do. It looks like the tests should pass...
In any case, it's probably just an issue with the tests themselves. Should still be tracked down and fixed, but at least it doesn't seem to be pointing to a real problem.
Well I feel silly. When I tested manually above I just ran python, so it was running the test in whatever version of cysignals I happened to have installed globally at the time (I was not using a virtualenv). When I run the test in the current version of cysignals it indeed produces no output.
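A quick way to avoid this kind of mix-up is to ask the interpreter where it would import a module from before running the test. A minimal sketch, assuming Python 3; module_origin is a hypothetical helper name, not something cysignals provides:

```python
import importlib.util


def module_origin(name):
    """Return the file a top-level module would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


# "cysignals" is the package from the thread; it may not be installed here,
# in which case None is printed.
print(module_origin("cysignals"))
print(module_origin("json"))
```

If the printed path points into the global site-packages rather than the build tree being tested, the wrong copy is being picked up.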
What's worse, I unthinkingly deleted the old installed version, so I have no idea what version it was. I'll have to do a git bisect until I can figure out where it broke (because clearly it did work at one point...)
When I run the test in the current version of cysignals it indeed produces no output.
Literally nothing happens? Python does not crash?
Yes, literally nothing. Python does not crash, and returns 0. It's bizarre.
I worked out that the problem started with 822be07132715ae62ec84b9ed97de05b97364153
Cygwin does support sigaltstack, but probably the problem has something to do with that...
Even stranger--when I run the test under strace I see what we should expect to see, at least initially: the forked process delivers a signal to the parent process (SIGILL), and then it sets up the signal handler and gets an access violation somewhere in there.
Yes, literally nothing. Python does not crash, and returns 0.
You mean that the function test_bad_str() returns int(0)?
Cygwin does support sigaltstack, but probably the problem has something to do with that...
My experience is that sigaltstack has different quirks on different OSes. It's not a widely used system call, so the chances are higher that it is buggy. Also, many details are not specified by POSIX, so it behaves slightly differently on every OS. You can see from the git history here that it took many iterations to get something that works on Linux, OS X and Solaris. And even then, there is a bug(?) on OS X which prevents sigaltstack from working in a forked child process.
The function test_bad_str induces a signal during the execution of the signal handler. It could very well be that this is something that Cygwin never tested in combination with sigaltstack.
I just meant the python process has an exit code of 0, not the test function itself.
I'll look through the commit history, but if you have a chance, a brief summary of the kinds of issues you encountered with this might be helpful.
It's strange that it appears to work partially correctly running under strace, in that the signal handler itself causes an access violation, whereas when run by itself there is no apparent error, so maybe it's that Cygwin's own exception handling gets broken. Fortunately, that's easy enough to investigate in Cygwin.
Running in gdb confirmed a couple of things:
1) The initial SIGILL generated by the test is handled correctly and the handler is running on the alternative stack.
2) An access violation is generated upon trying to access the bogus char *. This latter access violation is not handled at all by Cygwin (Cygwin itself crashes).
Similar deal after switching to the latest master branch. Confirmed that it is running on the alt stack (specifically trampoline_stack), then crashes. It appears Cygwin is not properly handling exceptions that occur while running on an alternate stack :/
This should be fixable, but perhaps in the meantime it would be best to disable this feature on Cygwin (and perhaps also on any platform that doesn't have sigaltstack() in the first place, though I don't have any such platforms on hand at the moment). sigaltstack() was only added to Cygwin sometime in early 2015, FWIW.
This should be fixable
Maybe not. After some more research it appears this might be a limitation of the NT kernel. It really doesn't like having %rsp messed with in a manner it wasn't expecting. It might be very difficult to get it to recover from exceptions that occur while in an alternate stack.
An interesting post I found describes some details about the issue. Native Client has the same problem in that it's also running code on a stack that Windows thinks is bogus (and thus doesn't allow normal SEH handlers to run): http://lackingrhoticity.blogspot.fr/2012/09/native-clients-ntdll-patch-on-x86-64-windows.html
Anyways, in practice, the safest and most practical thing to do from cysignals for now is to disable use of sigaltstack on Cygwin, I think.
One last thought for now--it looks like I might still be able to do something by invoking the Windows API directly to register a vectored continue handler. This is in fact how Cygwin itself handles exceptions that occur while on the alternate stack. Unfortunately Cygwin's handler only handles exceptions that occur from within Cygwin (which is the only context where it knows how to safely unwind the stack). Exceptions that come from outside Cygwin, as in this case, are just given up on and the process is allowed to run off into the weeds (actually Windows just kills it). But cysignals provides an opportunity to still do something a little better here, I think.
Yep, based on a quick experiment it looks like this could work. It should suffice to print an error message (like cysignals_signal_handler currently does when an internal error occurs) and exit the process with the appropriate error code.
Running make test on Cygwin I get these failures:
Seems pretty trivial though. The tests are expecting some extra ------- followed by some text that isn't being output. I'm not sure why.