Cannot catch exceptions on Apple M1

hahnjo commented 3 years ago

Describe the bug

It's not possible to catch exceptions in the interactive root prompt on Apple Silicon, nor does TRint take care of uncaught exceptions.

Expected behavior

The user should be able to catch exceptions, or at least the fallback handler should prevent process termination.

To Reproduce

The simplest examples are

root [0] try { throw 1; } catch (...) { }
libc++abi.dylib: terminating with uncaught exception of type int

and

root [0] throw 1;
libc++abi.dylib: terminating with uncaught exception of type int

(which should be handled in TRint::HandleTermInput()).
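For illustration, the kind of fallback meant here can be sketched in plain C++. This is not ROOT's implementation: `runGuarded` is a hypothetical helper, and it only works when the unwinder can walk every frame between the `throw` and the handler, which is exactly what fails for JITted frames in this report.

```cpp
#include <exception>
#include <functional>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of ROOT): runs one unit of user input and
// turns any escaping exception into a diagnostic string instead of letting
// it reach std::terminate.
std::string runGuarded(const std::function<void()> &input) {
  try {
    input();
    return "ok";
  } catch (const std::exception &e) {
    // Standard exceptions carry a message we can report.
    return std::string("caught std::exception: ") + e.what();
  } catch (...) {
    // Anything else (e.g. `throw 1;`) is reported generically.
    return "caught unknown exception";
  }
}
```

With a working unwinder, `runGuarded([] { throw 1; })` returns `"caught unknown exception"` rather than aborting the process.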

Setup

ROOT 6.25/01 on macphsft25

Additional context

Plenty:

7100 fixed catching exceptions from constructors on macOS (and has tests, including unwinding through multiple JIT recursions).
ROOT-10703 and ROOT-10962 for the original reports.

Edit:

Same in 6.22/07, tested on macphsft24

pcanal commented 3 years ago

This reminds me of https://sft.its.cern.ch/jira/browse/ROOT-8544 and https://sft.its.cern.ch/jira/browse/ROOT-8523, which in the end were fixed by a7b0b3e647409c7510b38198b08ff94fd079f857.

Most likely the way the interpreter sets up the stack frames does not match the expectation of the exception handler (usually implemented in (g)libc).

It is very plausible that this is a problem similar to the one leading to the "can not reallocate code" errors. So I see two plausible paths forward: (a) fix the reallocate code error and hope that it also fixes this, or (b) install a debug version of (g)libc and trace/debug the exception handler's handling of this case ...

hahnjo commented 3 years ago

Ugh, this one might become tricky, even LLVM upstream is unable to handle exceptions during JIT: I tested the most basic

int main() {
  try {
    throw 1;
  } catch (...) { }
}

compiled with ./bin/clang++ -S -emit-llvm throw.cc and interpreted using ./bin/lli throw.ll, resulting in

libc++abi.dylib: terminating with uncaught exception of type int
PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace.
Stack dump:
0.      Program arguments: ./bin/lli throw.ll
Stack dump without symbol names (ensure you have llvm-symbolizer in your PATH or set the environment var `LLVM_SYMBOLIZER_PATH` to point to it):
0  lli                      0x000000010104824c llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) + 56
1  lli                      0x00000001010471b4 llvm::sys::RunSignalHandlers() + 128
2  lli                      0x00000001010488c4 SignalHandler(int) + 304
3  libsystem_platform.dylib 0x00000001826b1c44 _sigtramp + 56
4  libsystem_pthread.dylib  0x0000000182669c24 pthread_kill + 292
5  libsystem_c.dylib        0x00000001825b1864 abort + 104
6  libc++abi.dylib          0x0000000182629cf8 __cxxabiv1::__aligned_malloc_with_fallback(unsigned long) + 0
7  libc++abi.dylib          0x000000018261ae4c demangling_unexpected_handler() + 0
8  libobjc.A.dylib          0x00000001825136d8 _objc_terminate() + 160
9  libc++abi.dylib          0x00000001826290e0 std::__terminate(void (*)()) + 20
10 libc++abi.dylib          0x000000018262beb0 __cxa_get_exception_ptr + 0
11 libc++abi.dylib          0x000000018262be5c __cxxabiv1::exception_cleanup_func(_Unwind_Reason_Code, _Unwind_Exception*) + 0
12 libc++abi.dylib          0x000000010224003c __cxxabiv1::exception_cleanup_func(_Unwind_Reason_Code, _Unwind_Exception*) + 18446744071557956064
13 lli                      0x0000000100c8f680 llvm::MCJIT::runFunction(llvm::Function*, llvm::ArrayRef<llvm::GenericValue>) + 768
14 lli                      0x0000000100bd574c llvm::ExecutionEngine::runFunctionAsMain(llvm::Function*, std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&, char const* const*) + 1464
15 lli                      0x000000010070e4d8 main + 8576
16 libdyld.dylib            0x0000000182685f34 start + 4
zsh: abort      ./bin/lli throw.ll

It could be that Apple changed something about their exception handling ABI for arm64 because it's not even possible to build my own libunwind to debug the unwinding as I did for ROOT-10703 - it complains about invalid entries. However clang++ emits the right object files, at least when compiling to an executable directly. So maybe it's at runtime? I'll need to think about this...

pcanal commented 3 years ago

> even LLVM upstream is unable to handle exceptions during JIT

I am not too surprised. It is the same code (+/- a few things ;)).

> So maybe it's at runtime? I

That is what it was last time: the instruction space for the JIT code was being allocated in an unexpected order.

hahnjo commented 3 years ago

>> even LLVM upstream is unable to handle exceptions during JIT

> I am not too surprised. it is the same code (+/- a few things ;)).

True, but I had hoped that it was fixed in current trunk and it was just a matter of finding and backporting a change or two to make it work. Anyway my investigations aren't going anywhere right now, so I've filed a bug at https://bugs.llvm.org/show_bug.cgi?id=49692

Axel-Naumann commented 3 years ago

Bad news for us, from Lang on https://bugs.llvm.org/show_bug.cgi?id=49692 :

> This one is on me -- I'll look into supporting compact-unwind, but won't have time to get to it for a couple of weeks.

> How urgent is this for you, and are you using ORCv2? Ideally I'll just implement this in JITLink, but that won't help if you're on MCJIT or ORCv1.

I guess disabling exceptions for M1 isn't an option either. But we could prevent the cling throw from happening on M1, until this is fixed. Would that make sense? Won't help for RDF's exceptions, though...

hahnjo commented 2 years ago

After LLVM switched from Bugzilla to GitHub issues, here is the link to the migrated issue: https://github.com/llvm/llvm-project/issues/49036

vgvassilev commented 1 year ago

@msneubauer ran some tests on OSX 13 and it seems this issue is fixed. Thanks a lot, Mark! Here is what he ran:

cat test_exceptions.C
void test_exceptions() {
  try {
    std::cout << "got here\n";
    throw 1;
  } catch (...) { }
}
wirelessprv-10-193-242-21:tmp msn$ root.exe -l -b -q -e '.x test_exceptions.C'

got here

This is based on a source build of the root_v6.26.06.source.tar.gz tarball.

cc: @hahnjo, @lhames

hahnjo commented 1 year ago

Hm, this is surprising because fixing libunwind was only part of the story; I thought there were at least two other missing pieces, as outlined in https://github.com/llvm/llvm-project/issues/49036. Could somebody with macOS 13 on Apple Silicon test whether the catch block is actually executed, by moving the printout there?
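One way to make that check unambiguous is to have the macro report whether the handler ran, instead of relying on the order of printouts. A minimal sketch, with `catch_block_runs` as a hypothetical name:

```cpp
// Hypothetical check: returns true only if the catch block really executed
// and received the value that was thrown.
bool catch_block_runs() {
  bool caught = false;
  try {
    throw 1;
  } catch (int value) {
    // Reached only if unwinding found this handler.
    caught = (value == 1);
  }
  return caught;
}
```

Saved as `catch_block_runs.C`, it could be run the same way as the macro above; on an affected setup the process would be expected to abort before the function returns.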

vgvassilev commented 1 year ago

Our best chance is @msneubauer I think.

msneubauer commented 1 year ago

@hahnjo @vgvassilev

$ cat test_exceptions.C
void test_exceptions() {
  try {
    std::cout << "got here\n";
    throw 1;
  } catch (...) {
    std::cout << "got here too\n";
  }
}

$ root.exe -l -b -q -e '.x test_exceptions.C'

got here
got here too

vgvassilev commented 1 year ago

Awesome, @hahnjo can we close this as resolved now?

hahnjo commented 1 year ago

Possibly for now. Though I wouldn't be too surprised if it breaks again on a future LLVM upgrade (maybe even llvm13) due to the missing things in LLVM and what Lang wrote in the upstream issue...

Axel-Naumann commented 1 year ago

Before we close this I'd like to see the roottest / gtest tests re-enabled that were disabled because of this issue. I cannot find a registry of the changes we made because of this; do we need to grep for -i arm since M1 came out to find all occurrences? :-(

vepadulano commented 1 year ago

This recent roottest failure looks related to this issue: https://lcgapp-services.cern.ch/root-jenkins/job/roottest-pullrequests-build/13090/testReport/projectroot.roottest.python/regression/roottest_python_regression_regression/

(it's a test that was never run before, I am resurrecting it, and that's why the failure was triggered only now)

guitargeek commented 9 months ago

@hahnjo, @vepadulano, what's the status here? This looks like one of those issues that might have been resolved by the recent LLVM upgrade.

hahnjo commented 9 months ago

I just built a fresh version of master and v6-26-00-patches on macphsft24 with macOS 14.2; the test posted in https://github.com/root-project/root/issues/7541#issuecomment-1316247162 still fails. It's not clear to me how it could work in one setup in the past, but it's certainly not working out of the box in the default build configuration.

dpiparo commented 5 months ago

I confirm it's still broken for llvm16 in root master.

lhames commented 5 months ago

Are you using the unw_add/remove_find_dynamic_unwind_sections sequence described in https://github.com/llvm/llvm-project/issues/49036#issuecomment-1437318801 ? That's required on macOS 14.0 and later.
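For readers unfamiliar with that sequence, here is a rough sketch of the registration step. It is based on assumptions, not on the linked comment: the symbol name `__unw_add_find_dynamic_unwind_sections`, the callback signature, and the struct layout are taken to match the Apple-only declarations in LLVM's libunwind and should be checked against the actual headers. `lookupRegistrar`, `findJITUnwindSections`, and `registerJITUnwindInfo` are hypothetical names.

```cpp
#include <dlfcn.h>

#include <cstddef>
#include <cstdint>

// ASSUMPTION: mirrors libunwind's Apple-only `unw_dynamic_unwind_sections`;
// verify field order and types against your libunwind headers.
struct DynamicUnwindSections {
  std::uintptr_t dso_base;
  std::uintptr_t dwarf_section;
  std::size_t dwarf_section_length;
  std::uintptr_t compact_unwind_section;
  std::size_t compact_unwind_section_length;
};

// ASSUMPTION: callback returns non-zero when it filled in `info` for `addr`.
using FindSectionsFn = int (*)(std::uintptr_t addr, DynamicUnwindSections *info);
using RegistrarFn = int (*)(FindSectionsFn callback);

// Looks a registration function up by name at run time; returns nullptr when
// the loaded unwinder does not export it (e.g. on non-Apple platforms).
RegistrarFn lookupRegistrar(const char *symbol) {
  return reinterpret_cast<RegistrarFn>(dlsym(RTLD_DEFAULT, symbol));
}

// Hypothetical callback: a real JIT would search its table of emitted code
// ranges for `addr` and describe the matching unwind sections in `info`.
int findJITUnwindSections(std::uintptr_t, DynamicUnwindSections *) {
  return 0;  // "not found": this sketch tracks no JIT code
}

// Registers the callback if the unwinder supports it; false otherwise.
// ASSUMPTION: a return value of 0 from the registrar means success.
bool registerJITUnwindInfo() {
  RegistrarFn add = lookupRegistrar("__unw_add_find_dynamic_unwind_sections");
  return add != nullptr && add(findJITUnwindSections) == 0;
}
```

Resolving the symbol through `dlsym` rather than linking against it keeps the same binary usable on systems whose unwinder lacks the hook; the matching removal call would be looked up the same way before the JIT's memory is released.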
