Open hahnjo opened 3 years ago
This reminds me of https://sft.its.cern.ch/jira/browse/ROOT-8544 and https://sft.its.cern.ch/jira/browse/ROOT-8523, which in the end were fixed by a7b0b3e647409c7510b38198b08ff94fd079f857.
Most likely the way the interpreter sets up the stack frames does not match the expectation of the exception handler (usually implemented in (g)libc).
It is very plausible that this is a problem similar to the one leading to the "can not reallocate code" errors. So I see two plausible paths forward: (a) fix the reallocate code error and hope that this also fixes this issue, or (b) install a debug version of (g)libc and trace/debug how the exception handler deals with this case ...
Ugh, this one might become tricky, even LLVM upstream is unable to handle exceptions during JIT: I tested the most basic
int main() {
  try {
    throw 1;
  } catch (...) { }
}
compiled with ./bin/clang++ -S -emit-llvm throw.cc and interpreted using ./bin/lli throw.ll, resulting in
libc++abi.dylib: terminating with uncaught exception of type int
PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace.
Stack dump:
0. Program arguments: ./bin/lli throw.ll
Stack dump without symbol names (ensure you have llvm-symbolizer in your PATH or set the environment var `LLVM_SYMBOLIZER_PATH` to point to it):
0 lli 0x000000010104824c llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) + 56
1 lli 0x00000001010471b4 llvm::sys::RunSignalHandlers() + 128
2 lli 0x00000001010488c4 SignalHandler(int) + 304
3 libsystem_platform.dylib 0x00000001826b1c44 _sigtramp + 56
4 libsystem_pthread.dylib 0x0000000182669c24 pthread_kill + 292
5 libsystem_c.dylib 0x00000001825b1864 abort + 104
6 libc++abi.dylib 0x0000000182629cf8 __cxxabiv1::__aligned_malloc_with_fallback(unsigned long) + 0
7 libc++abi.dylib 0x000000018261ae4c demangling_unexpected_handler() + 0
8 libobjc.A.dylib 0x00000001825136d8 _objc_terminate() + 160
9 libc++abi.dylib 0x00000001826290e0 std::__terminate(void (*)()) + 20
10 libc++abi.dylib 0x000000018262beb0 __cxa_get_exception_ptr + 0
11 libc++abi.dylib 0x000000018262be5c __cxxabiv1::exception_cleanup_func(_Unwind_Reason_Code, _Unwind_Exception*) + 0
12 libc++abi.dylib 0x000000010224003c __cxxabiv1::exception_cleanup_func(_Unwind_Reason_Code, _Unwind_Exception*) + 18446744071557956064
13 lli 0x0000000100c8f680 llvm::MCJIT::runFunction(llvm::Function*, llvm::ArrayRef<llvm::GenericValue>) + 768
14 lli 0x0000000100bd574c llvm::ExecutionEngine::runFunctionAsMain(llvm::Function*, std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&, char const* const*) + 1464
15 lli 0x000000010070e4d8 main + 8576
16 libdyld.dylib 0x0000000182685f34 start + 4
zsh: abort ./bin/lli throw.ll
It could be that Apple changed something about their exception handling ABI for arm64, because it's not even possible to build my own libunwind to debug the unwinding as I did for ROOT-10703 - it complains about invalid entries. However, clang++ emits the right object files, at least when compiling to an executable directly. So maybe it's at runtime? I'll need to think about this...
> even LLVM upstream is unable to handle exceptions during JIT
I am not too surprised. It is the same code (+/- a few things ;)).
> So maybe it's at runtime?
That is what it was last time (in JIT code, the instruction space was being allocated in an unexpected order).
> > even LLVM upstream is unable to handle exceptions during JIT
> I am not too surprised. It is the same code (+/- a few things ;)).
True, but I had hoped that it was fixed in current trunk and it was just a matter of finding and backporting a change or two to make it work. Anyway my investigations aren't going anywhere right now, so I've filed a bug at https://bugs.llvm.org/show_bug.cgi?id=49692
Bad news for us, from Lang on https://bugs.llvm.org/show_bug.cgi?id=49692:
> This one is on me -- I'll look into supporting compact-unwind, but won't have time to get to it for a couple of weeks.
> How urgent is this for you, and are you using ORCv2? Ideally I'll just implement this in JITLink, but that won't help if you're on MCJIT or ORCv1.
I guess disabling exceptions for M1 isn't an option either. But we could prevent the cling throw from happening on M1, until this is fixed. Would that make sense? Won't help for RDF's exceptions, though...
After LLVM switched from Bugzilla to GitHub issues, here is the link to the migrated issue: https://github.com/llvm/llvm-project/issues/49036
@msneubauer ran some tests on OSX 13 and it seems this issue is fixed. Thanks a lot, Mark! Here is what he ran:
cat test_exceptions.C
void test_exceptions() {
  try {
    std::cout << "got here\n";
    throw 1;
  } catch (...) { }
}
wirelessprv-10-193-242-21:tmp msn$ root.exe -l -b -q -e '.x test_exceptions.C'
got here
This is based on a source build of the root_v6.26.06.source.tar.gz tarball.
cc: @hahnjo, @lhames
Hm, this is surprising because fixing libunwind was only part of the story; I thought there were at least two other missing points, as outlined in https://github.com/llvm/llvm-project/issues/49036. Could somebody with a setup of macOS 13 on Apple Silicon test whether the catch block is actually executed, by moving the printout there?
Our best chance is @msneubauer I think.
@hahnjo @vgvassilev
$ cat test_exceptions.C
void test_exceptions() {
  try {
    std::cout << "got here\n";
    throw 1;
  } catch (...) {
    std::cout << "got here too\n";
  }
}
$ root.exe -l -b -q -e '.x test_exceptions.C'
got here
got here too
Awesome, @hahnjo can we close this as resolved now?
Possibly for now. Though I wouldn't be too surprised if it breaks again on a future LLVM upgrade (maybe even llvm13) due to the missing things in LLVM and what Lang wrote in the upstream issue...
Before we close this I'd like to see the roottest / gtest tests re-enabled that were disabled because of this issue. I cannot find a registry of the changes we made because of this; do we need to grep for -i arm since M1 came out to find all occurrences? :-(
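One possible shape for such a re-enabled check, as a minimal sketch: instead of relying on printed output, the function reports whether the handler was reached, and the exception has to unwind through one extra frame first. The names `ThrowFromCallee` and `CatchBlockReached` are hypothetical and not taken from roottest or ROOT.

```cpp
// Hypothetical helpers for illustration; not existing ROOT or roottest code.

// Throwing from a callee forces the unwinder to walk at least one frame
// before it finds the handler.
static void ThrowFromCallee() { throw 1; }

// Returns true only if the catch block was actually executed.
bool CatchBlockReached() {
  bool reached = false;
  try {
    ThrowFromCallee();
  } catch (...) {
    reached = true;
  }
  return reached;
}
```

Compiled natively this only exercises the host unwinder; to be meaningful for this issue it would have to be run through the interpreter, so that the frames being unwound are JIT-compiled ones.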
This recent roottest failure looks related to this issue https://lcgapp-services.cern.ch/root-jenkins/job/roottest-pullrequests-build/13090/testReport/projectroot.roottest.python/regression/roottest_python_regression_regression/
(it's a test that was never run before, I am resurrecting it, and that's why the failure was triggered only now)
@hahnjo, @vepadulano, what's the status here? This looks like one of these issues that might have been resolved by the recent LLVM upgrade
I just built a fresh version of master and v6-26-00-patches on macphsft24 with macOS 14.2; the test posted in https://github.com/root-project/root/issues/7541#issuecomment-1316247162 still fails. It's not clear to me how it could work in one setup in the past, but it's certainly not working out of the box in the default build configuration.
I confirm it's still broken for llvm16 in root master.
Are you using the unw_add/remove_find_dynamic_unwind_sections sequence described in https://github.com/llvm/llvm-project/issues/49036#issuecomment-1437318801 ? That's required on macOS 14.0 and later.
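To make the shape of that sequence concrete, here is a small self-contained model of the idea: a JIT registers a lookup callback with the unwinder, the unwinder asks the callback for unwind sections when it meets an unknown address, and the callback is removed again on teardown. This is only a sketch under assumptions. It does not call libunwind, and every `Sketch*` name, the struct fields and the return-value convention are hypothetical stand-ins; the real declarations should be taken from the linked comment and the libunwind headers.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the section description a callback fills in.
struct SketchUnwindSections {
  std::uintptr_t dso_base = 0;
  std::uintptr_t compact_unwind_section = 0;
  std::size_t compact_unwind_section_length = 0;
};

// Hypothetical callback type: returns nonzero and fills `info` if `addr`
// belongs to code the callback knows about.
using SketchFindSections = int (*)(std::uintptr_t addr,
                                   SketchUnwindSections *info);

// Toy model of the unwinder side: a list of registered callbacks.
class SketchUnwinder {
public:
  // Returns 0 on success, 1 if the callback is already registered.
  int add(SketchFindSections cb) {
    if (std::find(callbacks_.begin(), callbacks_.end(), cb) != callbacks_.end())
      return 1;
    callbacks_.push_back(cb);
    return 0;
  }
  // Returns 0 on success, 1 if the callback was never registered.
  int remove(SketchFindSections cb) {
    auto it = std::find(callbacks_.begin(), callbacks_.end(), cb);
    if (it == callbacks_.end())
      return 1;
    callbacks_.erase(it);
    return 0;
  }
  // Asks each callback in turn whether it can describe `pc`.
  bool lookup(std::uintptr_t pc, SketchUnwindSections *info) const {
    assert(info != nullptr);
    for (SketchFindSections cb : callbacks_)
      if (cb(pc, info))
        return true;
    return false;
  }

private:
  std::vector<SketchFindSections> callbacks_;
};

// Toy model of the JIT side: address ranges of emitted code.
struct SketchJITRange {
  std::uintptr_t begin;
  std::uintptr_t end;  // one past the last byte
  SketchUnwindSections sections;
};

std::vector<SketchJITRange> &SketchJITRanges() {
  static std::vector<SketchJITRange> ranges;
  return ranges;
}

// The callback the JIT would register with the unwinder.
int SketchFindJITSections(std::uintptr_t addr, SketchUnwindSections *info) {
  for (const SketchJITRange &r : SketchJITRanges()) {
    if (addr >= r.begin && addr < r.end) {
      *info = r.sections;
      return 1;
    }
  }
  return 0;
}
```

In this model, an address inside a recorded range is only resolvable between `add(SketchFindJITSections)` and `remove(SketchFindJITSections)`, which is the reason the registration has to happen before any exception can propagate through JIT-compiled frames.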
Describe the bug
It's not possible to catch exceptions in the interactive root prompt on Apple Silicon, nor does TRint take care of uncaught exceptions.
Expected behavior
The user should be able to catch exceptions, or at least the fallback handler should prevent process termination.
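As an illustration of what such a fallback handler amounts to, here is a minimal sketch of a guard around one unit of user input. `RunGuarded` is a hypothetical helper and not the TRint implementation; the status strings are made up for the example.

```cpp
#include <exception>
#include <functional>
#include <stdexcept>
#include <string>

// Hypothetical helper: runs one piece of user input and turns any exception
// that escapes it into a status string instead of letting the process abort.
std::string RunGuarded(const std::function<void()> &input) {
  try {
    input();
    return "ok";
  } catch (const std::exception &e) {
    // Standard exceptions carry a message worth showing to the user.
    return std::string("caught std::exception: ") + e.what();
  } catch (...) {
    // Anything else (e.g. `throw 1;`) is still caught, just without details.
    return "caught unknown exception";
  }
}
```

A guard like this only helps if the unwinder can walk from the throwing frame back to the `try` block. When the throwing code is JIT-compiled and its unwind information is not visible to the unwinder, the exception is treated as uncaught and the process terminates regardless.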
To Reproduce
The most simple examples are
and
(which should be handled in TRint::HandleTermInput()).
Setup
ROOT 6.25/01 on macphsft25
Additional context
Plenty:
7100 fixed catching exceptions from constructors on macOS (and has tests, including unwinding through multiple JIT recursions).
Edit:
Same in 6.22/07, tested on macphsft24