Segmentation fault caused by likely LLVM bug that results in incorrect exception handling on aarch64

SeanTAllen commented 3 years ago

While getting pony working on 64-bit Arm, we found a bug where the test-stdlib-debug tests would segfault. Everything worked fine with release builds, but debug builds would crash.

After investigation, we've found that in codegen_machine in host.cc if the opt_level is get to either CodeGenOpt::Default or CodeGenOpt::Aggressive instead of CodeGenOpt::None the problem goes away.

It seems likely that this is an LLVM bug, but that hasn't been established. We've committed a fix in host.cc that when the target is arm (which is 64-bit arm) that we use CodeGenOpt::Default. This is far from ideal, but gets us working compilations until we can investigate further.

The end result of this should either be a patch that we upstream to LLVM or possibly, the discovery that we are doing something wrong with our Arm code generation that optimization manages to fix.

SeanTAllen commented 3 years ago

The host.cc change makes no difference when we are building using musl rather than glibc. We still get segfaults in the debug build on Arm-64 when using musl.

SeanTAllen commented 2 years ago

The tests that fail seem to be repeatable from run to run. I've been working through the builtin tests commenting out the ones that crash and then another crashes but it is always one that wasn't run previously. I'm building LLVM on a 64 bit raspberry pi 4 to see if it is reproducible there or if we are in for a tricky time getting more info.

The changes are on the branch 3874.

SeanTAllen commented 2 years ago

Ok this is reproducible with the same tests failing on Raspberry PI 4 64 bit.

SeanTAllen commented 2 years ago

Well this is fascinating...

(gdb) bt
#0  0x00000055559e98b0 in pony_test_TestHelper_val__format_loc_oo (this=0x7fe6d016a0, loc=0x0)
    at /home/pi/code/ponylang/ponyc/packages/pony_test/test_helper.pony:362
#1  0x00000055559e4fc4 in pony_test_TestHelper_val_assert_error_ooob (this=0x7fddce1c40, test=0x555640a108, msg=0x5556410380, loc=0x55564077c0)
    at /home/pi/code/ponylang/ponyc/packages/pony_test/test_helper.pony:95
#2  0x0000005555b49434 in builtin_test__TestArrayInsert_ref_apply_oo (this=0x7fe65001e0, h=0x7fddce1c40)
    at /home/pi/code/ponylang/ponyc/packages/builtin_test/_test.pony:1512
#3  0x00000055557bef9c in pony_test__TestRunner_tag_run_o (this=0x7fe649dc00)
    at /home/pi/code/ponylang/ponyc/packages/pony_test/_test_runner.pony:63
#4  0x0000005555764728 in pony_test__TestRunner_Dispatch ()
#5  0x0000005555c6993c in ponyint_actor_run ()
#6  0x0000005555c73d28 in run_thread ()
#7  0x0000007ff7fa67e4 in start_thread (arg=0x7ffffff16f) at pthread_create.c:486
#8  0x0000007ff7dd6adc in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:78

SeanTAllen commented 2 years ago

@jemc this is one for you and i to talk about.

ergl commented 2 years ago

The backtrace looks identical to what we were getting on the M1. I can't remember the root cause, but maybe @jemc can remember.

Edit: some of these threads might help.

Although on second thought, it seems like the problem with the backtrace above is that the location object is null. I'd look at the IR to check if we're getting any undef references for that.

SeanTAllen commented 2 years ago

So it looks like:

#0  0x00000055559e98b0 in pony_test_TestHelper_val__format_loc_oo (this=0x7fe6d016a0, loc=0x0)
    at /home/pi/code/ponylang/ponyc/packages/pony_test/test_helper.pony:362
#1  0x00000055559e4fc4 in pony_test_TestHelper_val_assert_error_ooob (this=0x7fddce1c40, test=0x555640a108, msg=0x5556410380, loc=0x55564077c0)
    at /home/pi/code/ponylang/ponyc/packages/pony_test/test_helper.pony:95

We have the location in frame 1. But its null in frame 2.

  fun assert_error(test: ITest box, msg: String = "", loc: SourceLoc = __loc)
    : Bool
  =>
    """
    Assert that the given test function throws an error when run.
    """
    try
      test()?
      fail(_format_loc(loc) + "Assert error failed. " + msg)
      false
    else
      log(_format_loc(loc) + "Assert error passed. " + msg, true)
      true
    end

Its a straight passthrough. I suspect that the reason that having some optimization causes the crash to go away is that we inline the very small _format_loc method and so whatever our weirdness is for loc being null goes away. But I have no actual evidence of that yet, its purely a guess. Really thought the important question is "how the hell does loc end up null".

Here's _format_loc for edification:

  fun _format_loc(loc: SourceLoc): String =>
    loc.file() + ":" + loc.line().string() + ": "

SeanTAllen commented 2 years ago

I have determined that the failure happens with a more minimal example as well:

use "pony_test"
use "collections"

actor \nodoc\ Main is TestList
  new create(env: Env) => PonyTest(env, this)
  new make() => None

  fun tag tests(test: PonyTest) =>
    test(_TestArrayFind)

class \nodoc\ iso _TestArrayFind is UnitTest
  fun name(): String => "builtin/Array.find"

  fun apply(h: TestHelper) =>
    let a = [as ISize: 0; 1; 2; 3; 4; 1]
    h.assert_error({() ? => a.find(6)? })

And changing so that a.find won't error doesn't result in a crash:

use "pony_test"
use "collections"

actor \nodoc\ Main is TestList
  new create(env: Env) => PonyTest(env, this)
  new make() => None

  fun tag tests(test: PonyTest) =>
    test(_TestArrayFind)

class \nodoc\ iso _TestArrayFind is UnitTest
  fun name(): String => "builtin/Array.find"

  fun apply(h: TestHelper) =>
    let a = [as ISize: 0; 1; 2; 3; 4; 1]
    h.assert_error({() ? => a.find(1)? })

SeanTAllen commented 2 years ago

Linting the llvm with --lint-llvm results in:

Pessimization: Static alloca outside of entry block
  %22 = alloca i8*, align 8
Pessimization: Static alloca outside of entry block
  %22 = alloca i8*, align 8
Pessimization: Static alloca outside of entry block
  %22 = alloca i8*, align 8
Pessimization: Static alloca outside of entry block
  %22 = alloca i8*, align 8
Pessimization: Static alloca outside of entry block
  %22 = alloca i8*, align 8
Pessimization: Static alloca outside of entry block
  %22 = alloca i8*, align 8
Pessimization: Static alloca outside of entry block
  %22 = alloca i8*, align 8
Pessimization: Static alloca outside of entry block
  %22 = alloca i8*, align 8
Pessimization: Static alloca outside of entry block
  %22 = alloca i8*, align 8

but this happens on x86 where there is no issue as well.

--verify reports no issues.

SeanTAllen commented 2 years ago

Here's a more minimal example:

actor Main
  new create(env: Env) =>
    foo("hello")

  fun foo(s: String) =>
    try
      one(s)
      error
    else
      one(s)
    end

  fun one(s: String) =>
    s.clone()

SeanTAllen commented 2 years ago

@ergl could you see if this fails on an M1? The key seems to be to use a release runtime but do -d when compiling the program.

ergl commented 2 years ago

@SeanTAllen No errors for me. No combination of release/debug runtime and setting -d or leaving it off had the issue for the two examples you posted.

SeanTAllen commented 2 years ago

Interesting so this might be aarch64 + linux issue.

jemc commented 2 years ago

Note that if it's like the M1 issue, then the problem goes away if you link the program against the runtime bitcode (which I still think should be the default build flow where available).

SeanTAllen commented 2 years ago

@ergl can you try on m1 with the host.cc change removed? so that debug is doing no optimization?

jemc commented 2 years ago

@SeanTAllen and I think this is related to us needing to write some ARM64-specific code in posix_except.c. We currently have ARM32-specific code, and then catch-all code for everything else.

Here's a PDF that includes some info on the ARM64 Exception Handling ABI: https://documentation-service.arm.com/static/5fa190a7b1a7c5445f2904d6?token=

jemc commented 2 years ago

That PDF links to this doc, which seems to have the more relevant information: https://itanium-cxx-abi.github.io/cxx-abi/abi-eh.html

SeanTAllen commented 2 years ago

@sylvanc says he is believes that yes, the current usage of catch all code is probably wrong for arm exceptions.

but when i told him that aarch64 docs say that it is the same as itanium. he said itanium is the same as x86 except perhaps some low level register setting.

jemc commented 2 years ago

@SeanTAllen - try this diff:

diff --git a/src/libponyc/codegen/codegen.c b/src/libponyc/codegen/codegen.c
index 6a972f88..147435b5 100644
--- a/src/libponyc/codegen/codegen.c
+++ b/src/libponyc/codegen/codegen.c
@@ -265,6 +265,7 @@ static void init_runtime(compile_t* c)

   unsigned int align_value = target_is_ilp32(c->opt->triple) ? 4 : 8;

+  LLVM_DECLARE_ATTRIBUTEREF(uwtable_attr, uwtable, 0);
   LLVM_DECLARE_ATTRIBUTEREF(nounwind_attr, nounwind, 0);
   LLVM_DECLARE_ATTRIBUTEREF(readnone_attr, readnone, 0);
   LLVM_DECLARE_ATTRIBUTEREF(readonly_attr, readonly, 0);
@@ -606,7 +607,9 @@ static void init_runtime(compile_t* c)

   // i32 ponyint_personality_v0(...)
   type = LLVMFunctionType(c->i32, NULL, 0, true);
-  c->personality = LLVMAddFunction(c->module, "ponyint_personality_v0", type);
+  value = LLVMAddFunction(c->module, "ponyint_personality_v0", type);
+  c->personality = value;
+  LLVMAddAttributeAtIndex(value, LLVMAttributeFunctionIndex, uwtable_attr);

   // i32 memcmp(i8*, i8*, intptr)
   params[0] = c->void_ptr;

SeanTAllen commented 2 years ago

@jemc same boom

SeanTAllen commented 2 years ago

Interestingly, this also segfaults because of the message send that originates in the else of the try:

actor Main
  let _out: OutStream

  new create(env: Env) =>
    _out = env.out
    foo("hello")

  fun foo(s: String) =>
    try
      one("h")
      error
    else
      one("e")
    end

  fun one(s: String) =>
    _out.print(s)

but this does not:

actor Main
  let _out: OutStream

  new create(env: Env) =>
    _out = env.out
    foo("hello")

  fun foo(s: String) =>
    try
      one("h")
      error
    else
      one("e")
    end

  fun one(s: String) =>
    s.clone()

SeanTAllen commented 2 years ago

I discussed this with @sylvanc. He took a look at the pony_try llvm ir that we use and said it looked correct and the personality function looked correct as well. He suggested looking at a C++ aarch64 personality function to see if it was different in a way we weren't aware of.

I'll be taking a look at the one from the LLVM project:

https://github.com/llvm/llvm-project/blob/main/libcxxabi/src/cxa_personality.cpp

SeanTAllen commented 2 years ago

I've found a difference in our personality function from the c++ abi.

When we set registers, we do not do what the C++ does:

  _Unwind_SetGR(context, __builtin_eh_return_data_regno(1),
                static_cast<uintptr_t>(results.ttypeIndex));

It is setting context to the type index found from the lsda scan.

Whereas we do:

  _Unwind_SetGR(context, __builtin_eh_return_data_regno(1), 0);

Additionally there are differences in ordering in the personality functions which might be because we need the scan results for use elsewhere or simply different way of writing the code. The pony code is clearer, but maybe not correct in addition to the register issue.

SeanTAllen commented 2 years ago

So we get the ttypeIndex from:

/// Read a sleb128 encoded value and advance pointer
/// See Variable Length Data Appendix C in:
/// @link http://dwarfstd.org/Dwarf4.pdf @unlink
/// @param data reference variable holding memory pointer to decode from
/// @returns decoded value
static
intptr_t
readSLEB128(const uint8_t** data)

plus there's additional logic for deciding it to actually set that in results that I haven't fully worked through.

SeanTAllen commented 2 years ago

I changed this to "triggers release" as this is actually incorrect exception handling on aarch64.

jemc commented 2 years ago

In case anything here is helpful: https://llvm.org/docs/ExceptionHandling.html#exception-handling-support-on-the-target

SeanTAllen commented 2 years ago

I talked through this with Sylvan and we came to the conclusion that this is a probably a bug in LLVM like I initially thought. In particular a bug in the aarch64 code that happens to get fixed by some optimization. Because the LLVM ir is the same for x86 debug and aarch64 debug besides the platform things you'd expect to be different and we aren't doing anything special for aarch64 vs x86 so it isn't a bug in what we are doing AND as joe has seen, if you create bitcode and link with the llvm tools, the bug also goes away so... bug somewhere in llvm for aarch64.

SeanTAllen commented 1 year ago

I've been thinking about this issue and something new occurs to me.

Why does this only happen with a release/optimized version of the runtime? Why is it fine with a debug runtime + debug pony binary? That seems very odd. I have a feeling that perhaps there's something there that will help us find the issue.

SeanTAllen commented 1 year ago

I retested this and it looks like it effects both debug and release versions of the runtime, which was either overlooked when i first opened this issue or is new. Either way, it makes me feel a little better that it isn't only happening with release versions of the runtime.

ponylang / ponyc

Segmentation fault caused by likely LLVM bug that results in incorrect exception handling on aarch64 #3874