llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
26.82k stars 10.99k forks source link

Some functions with outlined prologs fail to catch exceptions #96950

Open ellishg opened 1 week ago

ellishg commented 1 week ago

We have been investigating a bug in an iOS app where a function is failing to catch an exception, leading to a crash. The related functions had their prologs outlined using -homogeneous-prolog-epilog, and disabling that optimization fixed the crash. We also found that reverting https://github.com/llvm/llvm-project/commit/6e54fccede402c9ed0e8038aa258a99c5a2773e5 (https://reviews.llvm.org/D153098) also fixed crash when the optimization is enabled.

Since I don't know a lot about CFI instructions, I don't have a small reproducer, but I am happy to provide IR for a few functions if that helps.

@igorkudrin @MaskRay @smeenai Do any of you have some idea of what's going on? Can this diff be safely reverted? Or can we fix somehow?

efriedma-quic commented 1 week ago

If you can provide the asm for a function with/without homogeneous-prolog-epilog that's causing an issue, it might be something obvious, like we're accidentally dropping a CFI directive somewhere.

ellishg commented 1 week ago

I'll try to bisect the functions to identify which one is broken. If I find it, I'll share the assembly.

ellishg commented 1 week ago

We've isolated the problematic function (which was in the middle of the call stack between the function that threw and the function that should have caught the exception) and dumped the MIR before and after the revert. The only difference is the location of these CFI instructions.

--- a/before-revert.txt
+++ b/after-revert.txt
bb.0 (%ir-block.2):
   successors: %bb.2(0x30000000), %bb.1(0x50000000); %bb.2(37.50%), %bb.1(62.50%)
   liveins: $x0, $x8, $lr, $x19, $x20
   frame-setup HOM_Prolog $lr, $fp, $x19, $x20, 16
-  $sp = frame-setup SUBXri $sp, 16, 0
   frame-setup CFI_INSTRUCTION def_cfa $w29, 16
   frame-setup CFI_INSTRUCTION offset $w30, -8
   frame-setup CFI_INSTRUCTION offset $w29, -16
   frame-setup CFI_INSTRUCTION offset $w19, -24
   frame-setup CFI_INSTRUCTION offset $w20, -32
+  $sp = frame-setup SUBXri $sp, 16, 0
   renamable $x19 = COPY $x8
   DBG_VALUE 0, $noreg, !"manifest_block", !DIExpression()
   DBG_VALUE $x19, $noreg, !"manifest_store", !DIExpression(DW_OP_deref)
ellishg commented 6 days ago

I was able to find a reduced test that shows that .cfi_def_cfa_offset is missing.

; RUN: llc -mtriple=aarch64-linux-gnu < %s | FileCheck %s

target datalayout = "e-m:o-i64:64-i128:128-n32:64-S128"
target triple = "arm64-apple-ios13.4.0"

%"class.std::__1::vector" = type { ptr, ptr, %"class.std::__1::__compressed_pair.1" }
%"class.std::__1::__compressed_pair.1" = type { %"struct.std::__1::__compressed_pair_elem.2" }
%"struct.std::__1::__compressed_pair_elem.2" = type { ptr }

define void @foo(ptr sret(%"class.std::__1::vector") %0) #0 {
; CHECK-LABEL: foo:
; CHECK:       // %bb.0:
; CHECK-NEXT:     sub   sp, sp, #48
; The following CFI instruction is missing after 6e54fcc
; CHECK-NEXT:     .cfi_def_cfa_offset 48
; CHECK-NEXT:     stp   x29, x30, [sp, #16]             // 16-byte Folded Spill
; CHECK-NEXT:     str   x19, [sp, #32]                  // 8-byte Folded Spill
; CHECK-NEXT:     add   x29, sp, #16
; CHECK-NEXT:     .cfi_def_cfa w29, 32
; CHECK-NEXT:     .cfi_offset w19, -16
; CHECK-NEXT:     .cfi_offset w30, -24
; CHECK-NEXT:     .cfi_offset w29, -32
; CHECK-NEXT:     mov   x19, x8

  %2 = alloca ptr, i32 0, align 8
  %3 = alloca i64, i32 0, align 8
  call void @bar()
  call void @llvm.memset.p0.i64(ptr %0, i8 0, i64 1, i1 false)
  ret void
}

declare void @bar()

; Function Attrs: nocallback nofree nounwind willreturn memory(argmem: write)
declare void @llvm.memset.p0.i64(ptr nocapture writeonly, i8, i64, i1 immarg) #1

attributes #0 = { "frame-pointer"="non-leaf" }
attributes #1 = { nocallback nofree nounwind willreturn memory(argmem: write) }

I found these comments to be suspicious: https://reviews.llvm.org/D153098#4462424

In D153098#4462255, @MaskRay wrote: Have you checked all the lost .cfi_def_cfa_offset and verified that they are all non-important?

The patch preserves the last .cfi_def_cfa or .cfi_def_cfa_offset instruction in the prolog, which is enough for synchronous unwind tables.

But for this test, there is no .cfi_def_cfa_offset instruction in the prolog at all. You can also see in the diff that these instructions are completely removed from foo in llvm/test/CodeGen/AArch64/arm64-shrink-wrapping.ll.

https://github.com/llvm/llvm-project/blob/7840c0066837797cdeb62aab63044b964aa7f372/llvm/test/CodeGen/AArch64/arm64-shrink-wrapping.ll#L9-L20

efriedma-quic commented 6 days ago

The "CFA" is a fixed position on the stack, the value of the stack pointer on entry to the function. cfi_def_cfa_offset/cfi_def_cfa provide different ways to define it: relative to the current sp, or relative to some other register. So the directive is redundant, and your "reduced" testcase is doesn't show a bug, I think.

Can you show llvm-dwarfdump output for the function that's causing the issue before/after the patch?

ellishg commented 6 days ago

Here are the dwarfdump outputs: before-revert.txt after-revert.txt

efriedma-quic commented 6 days ago

That doesn't have what I'm looking for... I guess llvm-dwarfdump can't see the unwind. Maybe try llvm-objdump --unwind-info

ellishg commented 6 days ago

I'm not seeing any unwind flag in llvm-dwarfdump :(

efriedma-quic commented 6 days ago

I said llvm-objdump?

ellishg commented 6 days ago

Oh sorry, here is the output of llvm-objdump -u <object>: after-revert.txt before-revert.txt

Add here are the addresses of the problematic function:

$ nm after-revert | grep _foo
000000000015b4e4 t __foo
$ nm before-revert | grep _foo
00000000001590cc t __foo
efriedma-quic commented 6 days ago

So before-revert, the compact unwind encoding is 0x0000000. That seems suspicious. Compact-unwind is MachO specific, though; I'm not really familiar with how it's supposed to work.

ellishg commented 6 days ago

I did notice that, but I'm not familiar with the compact unwind format either. @thevinster @int3 do you have an idea of what's going on?

drodriguez commented 6 days ago

The original change https://reviews.llvm.org/D153098 recreates a set of variables in emitDefineCFAWithFP:

https://github.com/llvm/llvm-project/blob/43b9888214234363e3468ffda5bcd599e9608938/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp#L1650-L1653

However, the variable AFI in emitPrologue was further customized with calls setRedZone in https://github.com/llvm/llvm-project/blob/43b9888214234363e3468ffda5bcd599e9608938/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp#L1744 , setTaggedPointerBase in https://github.com/llvm/llvm-project/blob/43b9888214234363e3468ffda5bcd599e9608938/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp#L1825-L1827 , setLocalStackSize in https://github.com/llvm/llvm-project/blob/43b9888214234363e3468ffda5bcd599e9608938/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp#L1844 , and maybe others.

I don't know if any of those calls are important for the code of emitDefineCFAWithFP, but if they are, and since both calls of emitDefineCFAWithFP are part of emitPrologue, why those variables were not passed as arguments, so their setup is exactly the same as it was in emitPrologue? If one of the calls (the original change seems to add the second one) needs a cleaner AFI, it can still generate one and pass it to the function.

llvmbot commented 6 days ago

@llvm/issue-subscribers-backend-aarch64

Author: Ellis Hoag (ellishg)

We have been investigating a bug in an iOS app where a function is failing to catch an exception, leading to a crash. The related functions had their prologs outlined using `-homogeneous-prolog-epilog`, and disabling that optimization fixed the crash. We also found that reverting https://github.com/llvm/llvm-project/commit/6e54fccede402c9ed0e8038aa258a99c5a2773e5 (https://reviews.llvm.org/D153098) also fixed crash when the optimization is enabled. Since I don't know a lot about CFI instructions, I don't have a small reproducer, but I am happy to provide IR for a few functions if that helps. @igorkudrin @MaskRay @smeenai Do any of you have some idea of what's going on? Can this diff be safely reverted? Or can we fix somehow?
efriedma-quic commented 6 days ago

MF.getInfo<AArch64FunctionInfo>() is a pointer to data associated with the function; it doesn't get recreated each time you call it.

igorkudrin commented 5 days ago

The original change https://reviews.llvm.org/D153098 recreates a set of variables in emitDefineCFAWithFP: ... However, the variable AFI in emitPrologue was further customized with calls setRedZone in

emitDefineCFAWithFP() reads only CalleeSaveBaseToFrameRecordOffset and CalleeSavedStackSize which are not changed in emitPrologue().

igorkudrin commented 5 days ago

The patch D153098 rearranges the instructions, as seen in https://github.com/llvm/llvm-project/issues/96950#issuecomment-2200689758, but I believe they are functionally equivalent. My guess for now is that the unwind information generator for MachO may not recognize the new pattern correctly. However, I can't find an example that generates an incorrect binary file for the patched version.

> cat test.ll
define i32 @foo(i32 %b, i32 %a) uwtable minsize {
  %s = alloca i32
  %1 = tail call i32 @bar(i32 %b)
  %2 = tail call i32 @bar(i32 %a)
  %3 = add i32 %a, %b
  %4 = add i32 %3, %1
  %5 = add i32 %4, %2
  %6 = tail call i32 @bar(i32 %5)
  ret i32 %6
}

declare i32 @bar(i32);
> llc-15ef9b26 test.ll -mtriple=arm64-applie-ios13.4.0 -homogeneous-prolog-epilog -stop-after=prologepilog -o out-15ef9b26.mir
> llc-6e54fcce test.ll -mtriple=arm64-applie-ios13.4.0 -homogeneous-prolog-epilog -stop-after=prologepilog -o out-6e54fcce.mir
> diff -u out-15ef9b26.mir out-6e54fcce.mir
--- out-15ef9b26.mir    2024-07-02 23:21:17.490322000 -0700
+++ out-6e54fcce.mir    2024-07-02 23:21:17.508311700 -0700
@@ -100,14 +100,14 @@
     liveins: $w0, $w1, $lr, $x19, $x20, $x21, $x22

     frame-setup HOM_Prolog $lr, $fp, $x19, $x20, $x21, $x22
+    $sp = frame-setup SUBXri $sp, 16, 0
+    frame-setup CFI_INSTRUCTION def_cfa_offset 64
     frame-setup CFI_INSTRUCTION offset $w30, -8
     frame-setup CFI_INSTRUCTION offset $w29, -16
     frame-setup CFI_INSTRUCTION offset $w19, -24
     frame-setup CFI_INSTRUCTION offset $w20, -32
     frame-setup CFI_INSTRUCTION offset $w21, -40
     frame-setup CFI_INSTRUCTION offset $w22, -48
-    $sp = frame-setup SUBXri $sp, 16, 0
-    frame-setup CFI_INSTRUCTION def_cfa_offset 64
     renamable $w19 = COPY $w1
     renamable $w20 = COPY $w0
     BL @bar, csr_darwin_aarch64_aapcs, implicit-def dead $lr, implicit $sp, implicit $w0, implicit-def $sp, implicit-def $w0
> llc-15ef9b26 test.ll -mtriple=arm64-applie-ios13.4.0 -homogeneous-prolog-epilog -filetype=obj -o out-15ef9b26.o
> llc-6e54fcce test.ll -mtriple=arm64-applie-ios13.4.0 -homogeneous-prolog-epilog -filetype=obj -o out-6e54fcce.o
> llvm-objdump -u out-15ef9b26.o > out-15ef9b26.unwind.dump
> llvm-objdump -u out-6e54fcce.o > out-6e54fcce.unwind.dump
> diff -u out-15ef9b26.unwind.dump out-6e54fcce.unwind.dump
--- out-15ef9b26.unwind.dump    2024-07-02 23:30:18.368070400 -0700
+++ out-6e54fcce.unwind.dump    2024-07-02 23:30:18.376070600 -0700
@@ -1,5 +1,5 @@

-out-15ef9b26.o:    file format mach-o arm64
+out-6e54fcce.o:    file format mach-o arm64
 Unwind info:

 Contents of __compact_unwind section:
igorkudrin commented 2 days ago

@ellishg, can you check that for the same input, revision 15ef9b26, which is immediately before the patch, generates a non-zero encoding for the suspected function, while 6e54fcce emits zero? The difference should be visible after the compilation, you don't need to link the whole application. After that, maybe you will be able to reduce the sample code to keep this difference. The goal is to have some source code you can share, which results in different values in the __compact_unwind section.