Open ellishg opened 1 week ago
If you can provide the asm for a function with/without homogeneous-prolog-epilog that's causing an issue, it might be something obvious, like we're accidentally dropping a CFI directive somewhere.
I'll try to bisect the functions to identify which one is broken. If I find it, I'll share the assembly.
We've isolated the problematic function (which was in the middle of the call stack between the function that threw and the function that should have caught the exception) and dumped the MIR before and after the revert. The only difference is the location of these CFI instructions.
--- a/before-revert.txt
+++ b/after-revert.txt
bb.0 (%ir-block.2):
successors: %bb.2(0x30000000), %bb.1(0x50000000); %bb.2(37.50%), %bb.1(62.50%)
liveins: $x0, $x8, $lr, $x19, $x20
frame-setup HOM_Prolog $lr, $fp, $x19, $x20, 16
- $sp = frame-setup SUBXri $sp, 16, 0
frame-setup CFI_INSTRUCTION def_cfa $w29, 16
frame-setup CFI_INSTRUCTION offset $w30, -8
frame-setup CFI_INSTRUCTION offset $w29, -16
frame-setup CFI_INSTRUCTION offset $w19, -24
frame-setup CFI_INSTRUCTION offset $w20, -32
+ $sp = frame-setup SUBXri $sp, 16, 0
renamable $x19 = COPY $x8
DBG_VALUE 0, $noreg, !"manifest_block", !DIExpression()
DBG_VALUE $x19, $noreg, !"manifest_store", !DIExpression(DW_OP_deref)
I was able to find a reduced test that shows that .cfi_def_cfa_offset
is missing.
; RUN: llc -mtriple=aarch64-linux-gnu < %s | FileCheck %s
target datalayout = "e-m:o-i64:64-i128:128-n32:64-S128"
target triple = "arm64-apple-ios13.4.0"
%"class.std::__1::vector" = type { ptr, ptr, %"class.std::__1::__compressed_pair.1" }
%"class.std::__1::__compressed_pair.1" = type { %"struct.std::__1::__compressed_pair_elem.2" }
%"struct.std::__1::__compressed_pair_elem.2" = type { ptr }
define void @foo(ptr sret(%"class.std::__1::vector") %0) #0 {
; CHECK-LABEL: foo:
; CHECK: // %bb.0:
; CHECK-NEXT: sub sp, sp, #48
; The following CFI instruction is missing after 6e54fcc
; CHECK-NEXT: .cfi_def_cfa_offset 48
; CHECK-NEXT: stp x29, x30, [sp, #16] // 16-byte Folded Spill
; CHECK-NEXT: str x19, [sp, #32] // 8-byte Folded Spill
; CHECK-NEXT: add x29, sp, #16
; CHECK-NEXT: .cfi_def_cfa w29, 32
; CHECK-NEXT: .cfi_offset w19, -16
; CHECK-NEXT: .cfi_offset w30, -24
; CHECK-NEXT: .cfi_offset w29, -32
; CHECK-NEXT: mov x19, x8
%2 = alloca ptr, i32 0, align 8
%3 = alloca i64, i32 0, align 8
call void @bar()
call void @llvm.memset.p0.i64(ptr %0, i8 0, i64 1, i1 false)
ret void
}
declare void @bar()
; Function Attrs: nocallback nofree nounwind willreturn memory(argmem: write)
declare void @llvm.memset.p0.i64(ptr nocapture writeonly, i8, i64, i1 immarg) #1
attributes #0 = { "frame-pointer"="non-leaf" }
attributes #1 = { nocallback nofree nounwind willreturn memory(argmem: write) }
I found these comments to be suspicious: https://reviews.llvm.org/D153098#4462424
In D153098#4462255, @MaskRay wrote: Have you checked all the lost .cfi_def_cfa_offset and verified that they are all non-important?
The patch preserves the last .cfi_def_cfa or .cfi_def_cfa_offset instruction in the prolog, which is enough for synchronous unwind tables.
But for this test, there is no .cfi_def_cfa_offset
instruction in the prolog at all. You can also see in the diff that these instructions are completely removed from foo
in llvm/test/CodeGen/AArch64/arm64-shrink-wrapping.ll
.
The "CFA" is a fixed position on the stack, the value of the stack pointer on entry to the function. cfi_def_cfa_offset/cfi_def_cfa provide different ways to define it: relative to the current sp, or relative to some other register. So the directive is redundant, and your "reduced" testcase is doesn't show a bug, I think.
Can you show llvm-dwarfdump output for the function that's causing the issue before/after the patch?
Here are the dwarfdump
outputs:
before-revert.txt
after-revert.txt
That doesn't have what I'm looking for... I guess llvm-dwarfdump can't see the unwind. Maybe try llvm-objdump --unwind-info
I'm not seeing any unwind flag in llvm-dwarfdump
:(
I said llvm-objdump
?
Oh sorry, here is the output of llvm-objdump -u <object>
:
after-revert.txt
before-revert.txt
Add here are the addresses of the problematic function:
$ nm after-revert | grep _foo
000000000015b4e4 t __foo
$ nm before-revert | grep _foo
00000000001590cc t __foo
So before-revert, the compact unwind encoding is 0x0000000. That seems suspicious. Compact-unwind is MachO specific, though; I'm not really familiar with how it's supposed to work.
I did notice that, but I'm not familiar with the compact unwind format either. @thevinster @int3 do you have an idea of what's going on?
The original change https://reviews.llvm.org/D153098 recreates a set of variables in emitDefineCFAWithFP
:
However, the variable AFI
in emitPrologue
was further customized with calls setRedZone
in https://github.com/llvm/llvm-project/blob/43b9888214234363e3468ffda5bcd599e9608938/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp#L1744 , setTaggedPointerBase
in https://github.com/llvm/llvm-project/blob/43b9888214234363e3468ffda5bcd599e9608938/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp#L1825-L1827 , setLocalStackSize
in https://github.com/llvm/llvm-project/blob/43b9888214234363e3468ffda5bcd599e9608938/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp#L1844 , and maybe others.
I don't know if any of those calls are important for the code of emitDefineCFAWithFP
, but if they are, and since both calls of emitDefineCFAWithFP
are part of emitPrologue
, why those variables were not passed as arguments, so their setup is exactly the same as it was in emitPrologue
? If one of the calls (the original change seems to add the second one) needs a cleaner AFI
, it can still generate one and pass it to the function.
@llvm/issue-subscribers-backend-aarch64
Author: Ellis Hoag (ellishg)
MF.getInfo<AArch64FunctionInfo>()
is a pointer to data associated with the function; it doesn't get recreated each time you call it.
The original change https://reviews.llvm.org/D153098 recreates a set of variables in
emitDefineCFAWithFP
: ... However, the variableAFI
inemitPrologue
was further customized with callssetRedZone
in
emitDefineCFAWithFP()
reads only CalleeSaveBaseToFrameRecordOffset
and CalleeSavedStackSize
which are not changed in emitPrologue()
.
The patch D153098 rearranges the instructions, as seen in https://github.com/llvm/llvm-project/issues/96950#issuecomment-2200689758, but I believe they are functionally equivalent. My guess for now is that the unwind information generator for MachO may not recognize the new pattern correctly. However, I can't find an example that generates an incorrect binary file for the patched version.
> cat test.ll
define i32 @foo(i32 %b, i32 %a) uwtable minsize {
%s = alloca i32
%1 = tail call i32 @bar(i32 %b)
%2 = tail call i32 @bar(i32 %a)
%3 = add i32 %a, %b
%4 = add i32 %3, %1
%5 = add i32 %4, %2
%6 = tail call i32 @bar(i32 %5)
ret i32 %6
}
declare i32 @bar(i32);
> llc-15ef9b26 test.ll -mtriple=arm64-applie-ios13.4.0 -homogeneous-prolog-epilog -stop-after=prologepilog -o out-15ef9b26.mir
> llc-6e54fcce test.ll -mtriple=arm64-applie-ios13.4.0 -homogeneous-prolog-epilog -stop-after=prologepilog -o out-6e54fcce.mir
> diff -u out-15ef9b26.mir out-6e54fcce.mir
--- out-15ef9b26.mir 2024-07-02 23:21:17.490322000 -0700
+++ out-6e54fcce.mir 2024-07-02 23:21:17.508311700 -0700
@@ -100,14 +100,14 @@
liveins: $w0, $w1, $lr, $x19, $x20, $x21, $x22
frame-setup HOM_Prolog $lr, $fp, $x19, $x20, $x21, $x22
+ $sp = frame-setup SUBXri $sp, 16, 0
+ frame-setup CFI_INSTRUCTION def_cfa_offset 64
frame-setup CFI_INSTRUCTION offset $w30, -8
frame-setup CFI_INSTRUCTION offset $w29, -16
frame-setup CFI_INSTRUCTION offset $w19, -24
frame-setup CFI_INSTRUCTION offset $w20, -32
frame-setup CFI_INSTRUCTION offset $w21, -40
frame-setup CFI_INSTRUCTION offset $w22, -48
- $sp = frame-setup SUBXri $sp, 16, 0
- frame-setup CFI_INSTRUCTION def_cfa_offset 64
renamable $w19 = COPY $w1
renamable $w20 = COPY $w0
BL @bar, csr_darwin_aarch64_aapcs, implicit-def dead $lr, implicit $sp, implicit $w0, implicit-def $sp, implicit-def $w0
> llc-15ef9b26 test.ll -mtriple=arm64-applie-ios13.4.0 -homogeneous-prolog-epilog -filetype=obj -o out-15ef9b26.o
> llc-6e54fcce test.ll -mtriple=arm64-applie-ios13.4.0 -homogeneous-prolog-epilog -filetype=obj -o out-6e54fcce.o
> llvm-objdump -u out-15ef9b26.o > out-15ef9b26.unwind.dump
> llvm-objdump -u out-6e54fcce.o > out-6e54fcce.unwind.dump
> diff -u out-15ef9b26.unwind.dump out-6e54fcce.unwind.dump
--- out-15ef9b26.unwind.dump 2024-07-02 23:30:18.368070400 -0700
+++ out-6e54fcce.unwind.dump 2024-07-02 23:30:18.376070600 -0700
@@ -1,5 +1,5 @@
-out-15ef9b26.o: file format mach-o arm64
+out-6e54fcce.o: file format mach-o arm64
Unwind info:
Contents of __compact_unwind section:
@ellishg, can you check that for the same input, revision 15ef9b26, which is immediately before the patch, generates a non-zero encoding for the suspected function, while 6e54fcce emits zero? The difference should be visible after the compilation, you don't need to link the whole application. After that, maybe you will be able to reduce the sample code to keep this difference. The goal is to have some source code you can share, which results in different values in the __compact_unwind
section.
We have been investigating a bug in an iOS app where a function is failing to catch an exception, leading to a crash. The related functions had their prologs outlined using
-homogeneous-prolog-epilog
, and disabling that optimization fixed the crash. We also found that reverting https://github.com/llvm/llvm-project/commit/6e54fccede402c9ed0e8038aa258a99c5a2773e5 (https://reviews.llvm.org/D153098) also fixed crash when the optimization is enabled.Since I don't know a lot about CFI instructions, I don't have a small reproducer, but I am happy to provide IR for a few functions if that helps.
@igorkudrin @MaskRay @smeenai Do any of you have some idea of what's going on? Can this diff be safely reverted? Or can we fix somehow?