oscar-system / Oscar.jl

A comprehensive open source computer algebra system for computations in algebra, geometry, and number theory.
https://www.oscar-system.org

Rare macOS segfault in BasisLieHighestWeight tests #4284

Open benlorenz opened 2 weeks ago

benlorenz commented 2 weeks ago

The relevant error is a few lines above:

      From worker 4:    [58185] signal (11.2): Segmentation fault: 11
      From worker 4:    in expression starting at /Users/aaruni/Desktop/oscar-runners/runner-2/_work/Oscar.jl/Oscar.jl/experimental/BasisLieHighestWeight/test/MainAlgorithm-test.jl:34
      From worker 4:    _ZNK10__cxxabiv120__si_class_type_info12__do_dyncastElNS_17__class_type_info10__sub_kindEPKS1_PKvS4_S6_RNS1_16__dyncast_resultE at /Users/aaruni/Desktop/oscar-runners/runner-2/_work/_tool/julia/1.10.6/aarch64/lib/julia/libstdc++.6.dylib (unknown line)
      From worker 4:    _ZNK10__cxxabiv117__class_type_info9can_catchEPKNS_16__shim_type_infoERPv at /usr/lib/libc++abi.dylib (unknown line)

This looks like a C++ exception handling issue, possibly related to mixing different C++ runtime libraries (libstdc++ and libc++): in the trace, can_catch from the system libc++abi ends up in __do_dyncast from the libstdc++ bundled with Julia. I have seen this a few times, but only on the macOS RPTU runner. It usually disappears when the job is re-run.

Originally posted by @benlorenz in https://github.com/oscar-system/Oscar.jl/issues/4281#issuecomment-2461746980

Edit: The unmangled symbols are:

__cxxabiv1::__si_class_type_info::__do_dyncast(long, __cxxabiv1::__class_type_info::__sub_kind, __cxxabiv1::__class_type_info const*, void const*, __cxxabiv1::__class_type_info const*, void const*, __cxxabiv1::__class_type_info::__dyncast_result&) const
__cxxabiv1::__class_type_info::can_catch(__cxxabiv1::__shim_type_info const*, void*&) const
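
For anyone who wants to reproduce the demangling programmatically, here is a minimal sketch using abi::__cxa_demangle from <cxxabi.h>, which both GCC and Clang toolchains provide. The helper name demangle is hypothetical and chosen only for this illustration; it is not part of Oscar.jl or Julia. The names in the trace above already start with a single _Z, so they can be passed in as printed.

```cpp
#include <cxxabi.h>   // abi::__cxa_demangle (Itanium C++ ABI; GCC and Clang)
#include <cstdlib>    // std::free
#include <string>

// Hypothetical helper for illustration: returns the demangled form of
// `mangled`, or the input unchanged if it cannot be demangled.
std::string demangle(const std::string& mangled) {
    int status = 0;
    // With a null output buffer, __cxa_demangle allocates the result with
    // malloc, so the caller has to release it with free.
    char* raw = abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status);
    if (status != 0 || raw == nullptr) {
        return mangled;  // not a valid mangled name (or allocation failed)
    }
    std::string result(raw);
    std::free(raw);
    return result;
}
```

Calling demangle on the second symbol from the trace should give the can_catch signature listed above. The c++filt command-line tool does the same job without writing any code.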

lgoettgens commented 2 weeks ago

Do you have more of a backtrace, so we know where in the Julia code this error happens?

I skimmed through all the Julia code in BasisLieHighestWeight, and apart from a lot of GAP, it only seems to call into polymake (via CxxWrap) in https://github.com/oscar-system/Oscar.jl/blob/1217ab321f1f83588c0ea6c792ab163f6c801e38/experimental/BasisLieHighestWeight/src/WeylPolytope.jl#L98. Maybe this helps to track the issue down.

benlorenz commented 2 weeks ago

The backtrace is just these few lines, see here: https://github.com/oscar-system/Oscar.jl/actions/runs/11713073053/job/32625109869?pr=4281#step:10:1625 (It is not unusual for macOS backtraces to be broken...)