oscar-system / GAP.jl

GAP packages for Julia integration
GNU Lesser General Public License v3.0
55 stars 19 forks source link

Error handling incomplete / clashes with GAP's Test/TestDirectory, leaves GAP in bad state, subsequent crash #1011

Open fingolfin opened 1 week ago

fingolfin commented 1 week ago

Tried to run the GAP test suite in Julia. This failed because it includes testing errors and we hijack them (so the code in GAP's Test which also "hijacks" error handling has no chance of running - here the very limited error handling model in GAP bites us, unlike try/catch in Julia there is no proper nesting mechanism... perhaps we could at least deal with this particular use case in Test, but that's a separate question for the future).

Anyway, after this I thought "maybe it works from a GAP prompt", so I launched one... and it crashes. Presumably because a bunch of state is not reset when we yank control away from Test / RunTests / READ_STREAM_LOOP:

julia> @time GAP.Globals.TestDirectory(GapObj("/Users/mhorn/Projekte/GAP/gap.spielwiese/tst/testinstall"))
Architecture: aarch64-apple-darwin20-julia1.10-64-kv9

testing: /Users/mhorn/Projekte/GAP/gap.spielwiese/tst/testinstall/ConjNatSym.tst
      31 ms (0 ms GC) and 24.1MB allocated for ConjNatSym.tst
testing: /Users/mhorn/Projekte/GAP/gap.spielwiese/tst/testinstall/DirectProductElement.tst
# line 105 of 131 (80%)ERROR: Error thrown by GAP: Error, <index> too large for <dpelm>, you may return another index

Stacktrace:
 [1] error(s::String)
   @ Base ./error.jl:35
 [2] ThrowObserver(depth::Int32)
   @ GAP ~/Projekte/OSCAR/GAP.jl/src/GAP.jl:89

julia> GAP.prompt()

[34215] signal (11.2): Segmentation fault: 11
in expression starting at REPL[3]:1
putToTheStream at /Users/mhorn/.julia/artifacts/2179238659fb71bb09503491ad5113d9c35daacc/lib/libgap.9.dylib (unknown line)
Pr at /Users/mhorn/.julia/artifacts/2179238659fb71bb09503491ad5113d9c35daacc/lib/libgap.9.dylib (unknown line)
GetLine at /Users/mhorn/.julia/artifacts/2179238659fb71bb09503491ad5113d9c35daacc/lib/libgap.9.dylib (unknown line)
PEEK_CURR_CHAR at /Users/mhorn/.julia/artifacts/2179238659fb71bb09503491ad5113d9c35daacc/lib/libgap.9.dylib (unknown line)
NextSymbol at /Users/mhorn/.julia/artifacts/2179238659fb71bb09503491ad5113d9c35daacc/lib/libgap.9.dylib (unknown line)
Match at /Users/mhorn/.julia/artifacts/2179238659fb71bb09503491ad5113d9c35daacc/lib/libgap.9.dylib (unknown line)
ReadEvalCommand at /Users/mhorn/.julia/artifacts/2179238659fb71bb09503491ad5113d9c35daacc/lib/libgap.9.dylib (unknown line)
FuncSHELL at /Users/mhorn/.julia/artifacts/2179238659fb71bb09503491ad5113d9c35daacc/lib/libgap.9.dylib (unknown line)
ExecProccall6args at /Users/mhorn/.julia/artifacts/2179238659fb71bb09503491ad5113d9c35daacc/lib/libgap.9.dylib (unknown line)
ExecSeqStat3 at /Users/mhorn/.julia/artifacts/2179238659fb71bb09503491ad5113d9c35daacc/lib/libgap.9.dylib (unknown line)
EXEC_CURR_FUNC at /Users/mhorn/.julia/artifacts/2179238659fb71bb09503491ad5113d9c35daacc/lib/libgap.9.dylib (unknown line)
DoExecFunc0args at /Users/mhorn/.julia/artifacts/2179238659fb71bb09503491ad5113d9c35daacc/lib/libgap.9.dylib (unknown line)
_call_gap_func at /Users/mhorn/Projekte/OSCAR/GAP.jl/src/ccalls.jl:323 [inlined]
GapObj at /Users/mhorn/Projekte/OSCAR/GAP.jl/src/ccalls.jl:306
unknown function (ip: 0x13b0542c3)
_jl_invoke at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-honeycrisp-XG3Q6T6R70.0/build/default-honeycrisp-XG3Q6T6R70-0/julialang/julia-release-1-dot-10/src/gf.c:0 [inlined]
ijl_apply_generic at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-honeycrisp-XG3Q6T6R70.0/build/default-honeycrisp-XG3Q6T6R70-0/julialang/julia-release-1-dot-10/src/gf.c:3077
prompt at /Users/mhorn/Projekte/OSCAR/GAP.jl/src/prompt.jl:31
unknown function (ip: 0x13b0441e3)

Look at READ_STREAM_LOOP it does this among other things:

    TypInputFile input;
    if (!OpenInputStream(&input, instream, FALSE)) {
        return False;
    }

    TypOutputFile output;
    if (!OpenOutputStream(&output, outstream)) {
        res = CloseInput(&input);
        GAP_ASSERT(res);
        return False;
    }

Note that pointers to the input and output stack variables are being recorded inside GAP's printing system, in two global variables.

When we then take over the control flow via a longjmp and return to Julia, those global variables are never reset.

We could certainly reset them "manually" in our Julia error handler. But I worry what else might be left in a bad state... this certainly is among the worst kind, though.

ThomasBreuer commented 1 week ago

I get the same error when calling GAP.Globals.TestDirectory, but I cannot reproduce the immediate crash when I call GAP.prompt() afterwards. However, trying to compute something at the GAP prompt then yields a segmentation fault. And also calling `GAP.Globals.TestDirectory twice at the Julia prompt yields a reproducible crash.

ThomasBreuer commented 1 week ago

This problem should also affect the idea to run the JuliaInterface tests from Julia (see #1005), since quite a few GAP errors are tested there.

fingolfin commented 1 week ago

Indeed. But it also suggests that long-term, once this issue here is fixed, we probably want to also run those tests "directly in the current session" to verify the crash is gone.