sagemath / cysignals

cysignals: interrupt and signal handling for Cython. Source repository for https://pypi.org/project/cysignals/
GNU Lesser General Public License v3.0
43 stars 24 forks source link

Segmentation fault #121

Open tobiasdiez opened 3 years ago

tobiasdiez commented 3 years ago

After installing cysignals 1.10.2 via pip (on Ubuntu 20.10), calling sig_on() results in Segmentation fault error without giving any further details.

Traceback (most recent call last):
      File "/__w/sage/sage/src/sage/doctest/forker.py", line 720, in _run
        self.compile_and_execute(example, compiler, test.globs)
      File "/__w/sage/sage/src/sage/doctest/forker.py", line 1145, in compile_and_execute
        exec(compiled, globs)
      File "<doctest sage.homology.simplicial_set_constructions.SubSimplicialSet.ambient_space[3]>", line 1, in <module>
        eight.fundamental_group()
      File "/__w/sage/sage/src/sage/categories/simplicial_sets.py", line 348, in fundamental_group
        FG = FreeGroup(len(gens), 'e')
      File "/__w/sage/sage/src/sage/groups/free_group.py", line 683, in FreeGroup
        return FreeGroup_class(names)
      File "sage/misc/classcall_metaclass.pyx", line 322, in sage.misc.classcall_metaclass.ClasscallMetaclass.__call__
        return cls.classcall(cls, *args, **kwds)
      File "sage/misc/cachefunc.pyx", line 1001, in sage.misc.cachefunc.CachedFunction.__call__
        w = self.f(*args, **kwds)
      File "/__w/sage/sage/src/sage/structure/unique_representation.py", line 1008, in __classcall__
        instance = typecall(cls, *args, **options)
      File "sage/misc/classcall_metaclass.pyx", line 486, in sage.misc.classcall_metaclass.typecall
        return (<PyTypeObject*>type).tp_call(cls, args, kwds)
      File "/__w/sage/sage/src/sage/groups/free_group.py", line 772, in __init__
        libgap_free_group = libgap.FreeGroup(generator_names)
      File "sage/libs/gap/libgap.pyx", line 700, in sage.libs.gap.libgap.Gap.__getattr__
        g = self.eval(name)
      File "sage/libs/gap/libgap.pyx", line 399, in sage.libs.gap.libgap.Gap.eval
        initialize()
      File "sage/libs/gap/util.pyx", line 289, in sage.libs.gap.util.initialize
        sig_on()
    cysignals.signals.SignalError: Segmentation fault

See https://github.com/tobiasdiez/sage/runs/1411148176?check_suite_focus=true. How to further debug/fix this issue?

tobiasdiez commented 3 years ago

After a bit of playing around with it, it turned out that the code wrapped in sig_on / off was throwing another (unrelated) exception. Once this exception was fixed, the Segmentation fault error disappeared.

So I would suggest at least a reformulation of the error message, pointing out that the reason could be an unhandled exception.

embray commented 3 years ago

@tobiasdiez I'm not sure what you mean. In this case cysignals is working as designed: some code in sage.libs.gap.util.initialize caused a segfault (SIGSEGV) which was handled by cysignals and raised as a Python exception.

tobiasdiez commented 3 years ago

I think, I also realized this after reporting. The thing is that the stacktrace as well as the exception message are not giving any hints that this exception is coming from some other code that is executed after sig_on. It really looks like sig_on is throwing a Segmentation fault error because of some internal issues.

embray commented 3 years ago

This is just how cysignals and cython work. When you use sig_on() in Cython code it generates C that looks something like:

__pyx_t_1 = sig_on(); if (unlikely(__pyx_t_1 == ((int)0))) __PYX_ERR(0, 99, __pyx_L4_error)

Here __PYX_ERR is a macro that does a goto to the exception raising / cleanup part of the function, and the 99 refers to the fact that this line corresponds to line 99 of the .pyx file.

Upon first call, sig_on() should always return 1. If a signal is raised in the code after sig_on(), cysignals' signal handler will set the appropriate Python exception, then do a longjmp back to the sig_on() (these details are all buried in the definition of the macro). The way longjmp works means (very roughly) that the program is restored to just before the point where sig_on() returned, but has it return 0 instead, thus triggering Cython's error handling.

So from Cython's perspective--and particularly its code for generating a stack frame for the trackback, it thinks line 99 is where the exception occurred.

I don't know there's much that can be done about this, and it's a good case for why there should be relatively few lines of C code between a sig_on() and sig_off() if possible.

It is possible to have cysignals generate C backtrace during signal handling, but cysignals has to be compiled with the ENABLE_DEBUG_CYSIGNALS macro set. Maybe an argument could be made for optionally enabling backtrace generation and attaching part of the C backtrace to the Python exception. That could be useful for pinpointing, from the Python level, where the signal occurred (at least this might be possible in some cases). Normally the debug code is omitted entirely for performance reasons (running significantly complex code such as this from within signal handlers is by default a bad idea).

But I wonder if it wouldn't be too harmful if the functionality could be available without ENABLE_DEBUG_CYSIGNALS, and just enabled optionally via a flag. Then the exception message could say something like "for more details set cysignals.enable_exception_backtrace = True and run the same code again".

More likely, it might be good to just clarify this somewhat in the documentation.

tobiasdiez commented 3 years ago

Thanks for the explanation. Maybe it's already enough to add something like "the following error occurred between sig_on / sig_off" in the error message?