Open hahnjo opened 3 years ago
Hi Lang,
yes, that patch fixes one third of the problem, namely not processing FDEs on AArch64. https://reviews.llvm.org/D103052#2778324 mentions the second problem of forcing emission of .eh_frame sections that I hacked via
diff --git a/llvm/lib/MC/MCObjectFileInfo.cpp b/llvm/lib/MC/MCObjectFileInfo.cpp
index 1a448f040b3b..60f27aae36c6 100644
--- a/llvm/lib/MC/MCObjectFileInfo.cpp
+++ b/llvm/lib/MC/MCObjectFileInfo.cpp
@@ -57,9 +57,11 @@ void MCObjectFileInfo::initMachOMCObjectFileInfo(const Triple &T) {
MachO::S_ATTR_STRIP_STATIC_SYMS | MachO::S_ATTR_LIVE_SUPPORT,
SectionKind::getReadOnly());
+#if 0
if (T.isOSDarwin() &&
(T.getArch() == Triple::aarch64 || T.getArch() == Triple::aarch64_32))
SupportsCompactUnwindWithoutEHFrame = true;
+#endif
if (T.isWatchABI())
OmitDwarfIfHaveCompactUnwind = true;
However, the real blocker (as far as I can tell) is that libunwind distributed with macOS keeps crashing once you have all that in place. See comments 8 and 9 for more details, and as mentioned in comment 10 I'm still able to reproduce the issue. Unfortunately, as mentioned before, this is not something we can fix since Apple didn't even open-source the modifications to libunwind that provoke the crash...
Jonas
Hi Jonas,
I recently ok'd a patch that seems like it matches your suggested fix: https://reviews.llvm.org/D103052
It hasn't landed yet, but I expect to commit it in the next couple of days.
I'll be focusing more attention on arm64 support over the next few months, so I'm hoping we'll see significant improvements in exception handling on that architecture.
Quick reminder about this issue: I can still reproduce the original problem ("libc++abi: terminating with uncaught exception of type int") with latest main and some hacky changes make "lli --jit-kind=mcjit throw.ll" crash in libunwind as described before. I didn't test my hacky patch to libunwind again, but I would expect it to "work" as before.
diff --git a/llvm/lib/ExecutionEngine/RuntimeDyld/RuntimeDyldMachO.cpp b/llvm/lib/ExecutionEngine/RuntimeDyld/RuntimeDyldMachO.cpp
index 9ca76602ea18..50a3add0c207 100644
--- a/llvm/lib/ExecutionEngine/RuntimeDyld/RuntimeDyldMachO.cpp
+++ b/llvm/lib/ExecutionEngine/RuntimeDyld/RuntimeDyldMachO.cpp
@@ -324,6 +324,7 @@ void RuntimeDyldMachOCRTPBase<Impl>::registerEHFrames() {
continue;
SectionEntry *Text = &Sections[SectionInfo.TextSID];
SectionEntry *EHFrame = &Sections[SectionInfo.EHFrameSID];
+#if 0
SectionEntry *ExceptTab = nullptr;
if (SectionInfo.ExceptTabSID != RTDYLD_INVALID_SECTION_ID)
ExceptTab = &Sections[SectionInfo.ExceptTabSID];
@@ -338,6 +339,7 @@ void RuntimeDyldMachOCRTPBase<Impl>::registerEHFrames() {
while (P != End) {
P = processFDE(P, DeltaForText, DeltaForEH);
}
+#endif
MemMgr.registerEHFrames(EHFrame->getAddress(), EHFrame->getLoadAddress(),
EHFrame->getSize());
diff --git a/llvm/lib/MC/MCObjectFileInfo.cpp b/llvm/lib/MC/MCObjectFileInfo.cpp
index 1a448f040b3b..60f27aae36c6 100644
--- a/llvm/lib/MC/MCObjectFileInfo.cpp
+++ b/llvm/lib/MC/MCObjectFileInfo.cpp
@@ -57,9 +57,11 @@ void MCObjectFileInfo::initMachOMCObjectFileInfo(const Triple &T) {
MachO::S_ATTR_STRIP_STATIC_SYMS | MachO::S_ATTR_LIVE_SUPPORT,
SectionKind::getReadOnly());
+#if 0
if (T.isOSDarwin() &&
(T.getArch() == Triple::aarch64 || T.getArch() == Triple::aarch64_32))
SupportsCompactUnwindWithoutEHFrame = true;
+#endif
if (T.isWatchABI())
OmitDwarfIfHaveCompactUnwind = true;
I poked some more and I think I see where the crash in the system libunwind comes from: From the accesses I see it appears that unw_set_reg loads a flag or something from the beginning of the MachO image that contains the exception handler. This works fine for compiled executables, but is set to the NULL pointer for dynamically registered FDEs: https://github.com/llvm/llvm-project/blob/32bc9a9bc3147a86ed43827d29d8f7d7cf05eb4a/libunwind/src/UnwindCursor.hpp#L1981 (the last parameter is dso_base and written into the extra field, which is documented as "mach_header of mach-o image containing func").
As this doesn't happen in LLVM's libunwind nor in the version available on opensource.apple.com, I guess there are more modifications Apple did and there's not much we can do as "users" of the system. Lang, could you help us get this message to the right team at Apple so they can fix their library?
Thanks, Jonas
hacky patch for libunwind Some updates on this: I finally succeeded in catching exceptions (of a compiled program) with a custom libunwind. The missing piece was pointer authentication, which is (in my limited understanding) a tag added into the higher bits of a pointer value, for example in the link register and return addresses. For unknown reasons, this is not dealt with in available source code of libunwind (including the one from opensource.apple.com), but fairly easy to hack together for experimentation - I'm attaching a patch for future reference.
With that in place, I was able to run through the unwinding with lldb and found that the program addresses don't match the registered FDE. Disabling the call to RuntimeDyldMachOCRTPBase::processFDE in RuntimeDyldMachOCRTPBase::registerEHFrames fixes that and I'm able to catch exceptions thrown from JITed code - hooray! At least with my self-compiled libunwind, the system-provided library crashes somewhere in _Unwind_SetIP -> unw_set_reg before execution of the landing pad :-(
Ok, thanks for the pointers. As written in comment 1, forcing emission of __eh_frame doesn't help, and I haven't succeeded in catching exceptions using a custom version of libunwind either (even not the one from Apple). So there must have been some changes, and neither JITlink nor RuntimeDyld are currently prepared to handle them. Of course, we need support for compact unwinding in the long run to load arbitrary libraries, but being at least able to catch exceptions of our own libraries (where we can control the compilation options) would be great in the short term. I might have another look into the pages you linked, but I'm not super confident.
Jonas
Hi Jonas,
So darwinarm64 is compact unwinding only? Or would there be a chance to get __eh_frame working, even if not the default for the platform? Do you have technical information with respect to the "new" unwinding ABI?
I haven't had time to dig in to the details yet. My understanding is that arm64 prefers to emit compact unwind records (they're smaller than eh_frame records), but can not do so in all cases and falls back to eh_frame records where necessary. There is probably an option to force the compiler to emit eh_frame records for all functions, but I have not looked for it yet. It's also possible that JITLink's current eh-frame support is not sufficient for arm64 yet -- I'm not sure it uses the same pointer encodings as x86-64. In any case using compact_unwind is the eventual goal, because we would like to be able to load objects and archives that weren't compiled specifically for the JIT.
The best sources for compact_unwind that I'm aware of are the apple compact_unwind_encoding.h header [1], the source for libunwind [2], the source for ld64 [3].
-- Lang.
[1] https://opensource.apple.com/source/libunwind/libunwind-35.1/include/mach-o/compact_unwind_encoding.h.auto.html [2] https://github.com/llvm/llvm-project/tree/main/libunwind [3] https://opensource.apple.com/source/ld64/
Hi Lang,
This one is on me -- I'll look into supporting compact-unwind, but won't have time to get to it for a couple of weeks.
So darwinarm64 is compact unwinding only? Or would there be a chance to get __eh_frame working, even if not the default for the platform? Do you have technical information with respect to the "new" unwinding ABI?
Thanks, Jonas
Hey Lang,
See all of that is already a wealth of information! :-) Tells us that here we don't need to hunt our own bug, for once.
We're currently on llvm9's ORCv1; I hope we will be on ORCv2 by the end of the year.
The current state is pretty painful for M1 users... but it's not mission critical, our particles will continue to collide. And I can totally relate if you do this for ORCv2 only. If all fails on our side we might still be able to learn from what you do for JITLink and try to hack this into llvm9's ORCv1.
Thank you so much for your reply! Looking forward to your fixes in a couple of weeks, even if in JITLink!
Axel.
Hi Axel,
This one is on me -- I'll look into supporting compact-unwind, but won't have time to get to it for a couple of weeks.
How urgent is this for you, and are you using ORCv2? Ideally I'll just implement this in JITLink, but that won't help if you're on MCJIT or ORCv1.
@lhames any recommendation whom to contact here? The fact that providing a custom-built libunwind is a dead end and that lli doesn't get this right either means we really don't know what to do here... Do you know of EH ABI specialists for Apple's AArch64?
Part of the problem may be related to "T.isOSDarwin() && .getArch() == Triple::aarch64" defaulting to emit only compact unwinding info without __eh_frame. I tried forcing LLVM to generate that section with
diff --git a/llvm/lib/MC/MCObjectFileInfo.cpp b/llvm/lib/MC/MCObjectFileInfo.cpp
index 155104cddda2..6b045bafdf59 100644
--- a/llvm/lib/MC/MCObjectFileInfo.cpp
+++ b/llvm/lib/MC/MCObjectFileInfo.cpp
@@ -56,9 +56,11 @@ void MCObjectFileInfo::initMachOMCObjectFileInfo(const Triple &T) {
MachO::S_ATTR_STRIP_STATIC_SYMS | MachO::S_ATTR_LIVE_SUPPORT,
SectionKind::getReadOnly());
+#if 0
if (T.isOSDarwin() &&
(T.getArch() == Triple::aarch64 || T.getArch() == Triple::aarch64_32))
SupportsCompactUnwindWithoutEHFrame = true;
+#endif
if (T.isWatchABI())
OmitDwarfIfHaveCompactUnwind = true;
and afterwards I see the section being handed over to RTDyldMemoryManager::registerEHFramesInProcess, but it still fails to catch the exception.
Unfortunately, I haven't succeeded in building libunwind with debug symbols and trace through the unwinding to see what's going wrong: It didn't even work to inject a custom libunwind (neither from LLVM trunk nor the patched version from https://opensource.apple.com/source/libunwind/) into a compiled executable, that results in no exceptions being caught either. It appears Apple did some changes to the exception handling ABI compared to x86_64, we probably need some of their engineers to help us out here...
Still broken in the same way, the hacks mentioned in https://github.com/llvm/llvm-project/issues/49036#issuecomment-981040215 still provoke the crash in libunwind
's unw_set_reg
.
@lhames any updates on this?
Hi @hahnjo,
It seems that Apple has released libunwind source code (https://github.com/apple-oss-distributions/libunwind/). Would this codebase help with this issue?
Thanks!
Hi @inumanag, thanks for sharing. Unfortunately, these are the same versions as already available from https://opensource.apple.com/source/libunwind/, which, as mentioned in a previous comment, does not have all modifications that Apple actually ships. Most importantly, it doesn't have the code in unw_set_reg
that crashes.
Thanks for your work on this @hahnjo -- I was wondering if you ever found a satisfactory solution with a modified libunwind
? I tried combining your patches from https://github.com/llvm/llvm-project/issues/49036#issuecomment-981040217 and https://github.com/llvm/llvm-project/issues/49036#issuecomment-981040215, but to no avail, at least on LLVM 15. Were there any other modifications that had to be made?
Hi @arshajii, never "satisfactory" but I remember that at some point I managed to catch an exception in lli
(https://github.com/llvm/llvm-project/issues/49036#issuecomment-981040217). This requires both the "hacky" patch to libunwind and disabling some parts in LLVM. I haven't tested this recently, so you're saying it doesn't work anymore? If so, I may take a look but probably not this week...
Hi @hahnjo: I managed to get it working with JITLink (whereas before I was testing with RuntimeDyld), although I'm still testing to make sure everything works. I couldn't get it working with RuntimeDyld. The steps for JITLink, at least with LLVM 15, are:
SupportsCompactUnwindWithoutEHFrame
as you mentioned in https://github.com/llvm/llvm-project/issues/49036#issuecomment-981040226@msneubauer ran some tests on OSX 13 and it seems this issue is fixed. Thanks a lot, Mark! Here is what he ran:
cat test_exceptions.C
void test_exceptions() {
try {
std::cout << "got here\n";
throw 1;
} catch (...) { }
}
wirelessprv-10-193-242-21:tmp msn$ root.exe -l -b -q -e '.x test_exceptions.C'
got here
This is based on a source build of the root_v6.26.06.source.tar.gz tarball.
cc: @hahnjo, @lhames
I can confirm on my end too that OSX 13 is working (on LLVM 15 w/ JIT Link). 🥳
Before we break out the party hats I should warn that the current "everything just works" state is temporary: We plan to introduce a new registration function in libunwind that will be required in future updates in order for JIT'd exception handling to work. This registration function will be used by libunwind to determine the MachO subarchitecture of JIT'd frames. I'll aim to provide an update soon on the details.
It's taken a while but I have a review for a proposed long-term solution up at https://reviews.llvm.org/D142176.
If/when that change is accepted I'll update this issue with instructions on how to add registration API calls to your app to ensure that JIT'd exceptions work in future releases.
If/when that change is accepted I'll update this issue with instructions on how to add registration API calls to your app to ensure that JIT'd exceptions work in future releases.
Just to be sure: The callback should be implemented by the JIT infrastructure, which should also call the registration API, right? Why would we do that from the app?
Just to be sure: The callback should be implemented by the JIT infrastructure, which should also call the registration API, right? Why would we do that from the app?
I'll include exact details once the libunwind patch is accepted, but in short: The callback and registration support will be added to the ORC runtime (a prototype is already in the LLVM mainline), but clients will need to manually register a callback until/unless they pick up a new ORC runtime.
Registration will be required in the near future, so the safe path will be to implement the manual callback immediately, and then remove it once you're able to pick up an ORC runtime that includes the fix.
Ok https://reviews.llvm.org/D142176 has landed. If you're using LLVM top-of-tree (LLVM 17) and the ORC runtime then DWARF exceptions should Just Work, and compact-unwind is on the way. In that case you should not use the code below, as it will shadow registration code that's already in the ORC runtime.
If you're using MCJIT, ORC without the ORC runtime, or an earlier LLVM version then you'll want to add the following code to your app/library.
// Define a minimal mach header for JIT'd code.
static MachO::mach_header_64 fake_mach_header = {
.magic = MachO::MH_MAGIC_64,
.cputype = MachO::CPU_TYPE_ARM64,
.cpusubtype = MachO::CPU_SUBTYPE_ARM64_ALL,
.filetype = MachO::MH_DYLIB,
.ncmds = 0,
.sizeofcmds = 0,
.flags = 0,
.reserved = 0
};
// Declare libunwind SPI types and functions.
struct unw_dynamic_unwind_sections {
uintptr_t dso_base;
uintptr_t dwarf_section;
size_t dwarf_section_length;
uintptr_t compact_unwind_section;
size_t compact_unwind_section_length;
};
int find_dynamic_unwind_sections(
uintptr_t addr, unw_dynamic_unwind_sections *info) {
info->dso_base = (uintptr_t)&fake_mach_header;
info->dwarf_section = 0;
info->dwarf_section_length = 0;
info->compact_unwind_section = 0;
info->compact_unwind_section_length = 0;
return 1;
}
// Typedef for callback above.
typedef int (*unw_find_dynamic_unwind_sections)(
uintptr_t addr, struct unw_dynamic_unwind_sections *info);
with those declarations in place you can use the following code to check for and use the new registration functions:
// At program / library load time:
if (auto *unw_add_find_dynamic_unwind_sections =
(int (*)(unw_find_dynamic_unwind_sections find_dynamic_unwind_sections))
dlsym(RTLD_DEFAULT, "__unw_add_find_dynamic_unwind_sections"))
unw_add_find_dynamic_unwind_sections(find_dynamic_unwind_sections);
// At program / library unload time:
if (auto *unw_remove_find_dynamic_unwind_sections =
(int (*)(unw_find_dynamic_unwind_sections find_dynamic_unwind_sections))
dlsym(RTLD_DEFAULT, "__unw_remove_find_dynamic_unwind_sections"))
unw_remove_find_dynamic_unwind_sections(find_dynamic_unwind_sections);
Libunwind will call the find_dynamic_unwind_sections
callback, which unconditionally returns fake_mach_header
as the header for all JIT'd code. Libunwind will query the cpu subtype of field of the header to determine how pointer-signing should be handled for JIT'd frames.
@williamhCode This bug is about JIT'd exception handling only.
Ok https://reviews.llvm.org/D142176 has landed. If you're using LLVM top-of-tree (LLVM 17) and the ORC runtime then DWARF exceptions should Just Work, and compact-unwind is on the way. In that case you should not use the code below, as it will shadow registration code that's already in the ORC runtime.
If you're using MCJIT, ORC without the ORC runtime, or an earlier LLVM version then you'll want to add the following code to your app/library.
// Define a minimal mach header for JIT'd code. static MachO::mach_header_64 fake_mach_header = { .magic = MachO::MH_MAGIC_64, .cputype = MachO::CPU_TYPE_ARM64, .cpusubtype = MachO::CPU_SUBTYPE_ARM64_ALL, .filetype = MachO::MH_DYLIB, .ncmds = 0, .sizeofcmds = 0, .flags = 0, .reserved = 0 }; // Declare libunwind SPI types and functions. struct unw_dynamic_unwind_sections { uintptr_t dso_base; uintptr_t dwarf_section; size_t dwarf_section_length; uintptr_t compact_unwind_section; size_t compact_unwind_section_length; }; int find_dynamic_unwind_sections( uintptr_t addr, unw_dynamic_unwind_sections *info) { info->dso_base = (uintptr_t)&fake_mach_header; info->dwarf_section = 0; info->dwarf_section_length = 0; info->compact_unwind_section = 0; info->compact_unwind_section_length = 0; return 1; } // Typedef for callback above. typedef int (*unw_find_dynamic_unwind_sections)( uintptr_t addr, struct unw_dynamic_unwind_sections *info);
with those declarations in place you can use the following code to check for and use the new registration functions:
// At program / library load time: if (auto *unw_add_find_dynamic_unwind_sections = (int (*)(unw_find_dynamic_unwind_sections find_dynamic_unwind_sections)) dlsym(RTLD_DEFAULT, "__unw_add_find_dynamic_unwind_sections")) unw_add_find_dynamic_unwind_sections(find_dynamic_unwind_sections); // At program / library unload time: if (auto *unw_remove_find_dynamic_unwind_sections = (int (*)(unw_find_dynamic_unwind_sections find_dynamic_unwind_sections)) dlsym(RTLD_DEFAULT, "__unw_remove_find_dynamic_unwind_sections")) unw_remove_find_dynamic_unwind_sections(find_dynamic_unwind_sections);
Libunwind will call the
find_dynamic_unwind_sections
callback, which unconditionally returnsfake_mach_header
as the header for all JIT'd code. Libunwind will query the cpu subtype of field of the header to determine how pointer-signing should be handled for JIT'd frames.
@lhames, would this be appropriate to live in clang/lib/Interpreter, for example? Or generally in the OrcJit infrastructure so that we do not need to put the burden on all clients?
@llvm/issue-subscribers-orcjit
Author: Jonas Hahnfeld (hahnjo)
@lhames, would this be appropriate to live in clang/lib/Interpreter, for example? Or generally in the OrcJit infrastructure so that we do not need to put the burden on all clients?
It depends on registration order -- if we embed a copy in clang/lib/Interpreter
we want to make sure that it's visited after any user-installed find_dynamic_unwind_sections
callbacks, otherwise it will shadow them.
Maybe we could have it run on first construction of an Interpreter instance?
Extended Description
On arm64-apple-darwin20.3.0, consider the following code:
When compiled into an executable (by either Apple clang or Clang trunk), the program exits normally:
However, the same code crashes when compiling into LLVM IR and executing via lli:
I've tried all combinations of --jit-kind and --jit-link, none of them worked.