rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org
Other
94.7k stars 12.2k forks source link

`RUST_BACKTRACE=full` loop with `-Cpanic=abort` on `aarch64-unknown-linux-gnu` #123733

Open cuviper opened 2 months ago

cuviper commented 2 months ago

When compiled with -Cpanic=abort, a program as simple as fn main() { panic!(); } gets into a loop with RUST_BACKTRACE=full on aarch64-unknown-linux-gnu:

$ echo 'fn main() { panic!() }' | rustc - -Cpanic=abort && RUST_BACKTRACE=full ./rust_out
thread 'main' panicked at <anon>:1:13:
explicit panic
stack backtrace:
   0:     0xaaaad44ee198 - std::backtrace_rs::backtrace::libunwind::trace::hfa92c98074c33c6e
                               at /rustc/8b2459c1f21187f9792d99310171a15e64feb9cf/library/std/src/../../backtrace/src/backtrace/libunwind.rs:105:5
   1:     0xaaaad44ee198 - std::backtrace_rs::backtrace::trace_unsynchronized::h4fca41224fd18d13
                               at /rustc/8b2459c1f21187f9792d99310171a15e64feb9cf/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
   2:     0xaaaad44ee198 - std::sys_common::backtrace::_print_fmt::hbb1869018263cc3e
                               at /rustc/8b2459c1f21187f9792d99310171a15e64feb9cf/library/std/src/sys_common/backtrace.rs:68:5
   3:     0xaaaad44ee198 - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::hfa121140e91bdb74
                               at /rustc/8b2459c1f21187f9792d99310171a15e64feb9cf/library/std/src/sys_common/backtrace.rs:44:22
   4:     0xaaaad45074b8 - core::fmt::rt::Argument::fmt::h0cc54cf87d70c86b
                               at /rustc/8b2459c1f21187f9792d99310171a15e64feb9cf/library/core/src/fmt/rt.rs:142:9
   5:     0xaaaad45074b8 - core::fmt::write::hb74970315812417b
                               at /rustc/8b2459c1f21187f9792d99310171a15e64feb9cf/library/core/src/fmt/mod.rs:1153:17
   6:     0xaaaad44ec398 - std::io::Write::write_fmt::h4028ee7f77ba6e46
                               at /rustc/8b2459c1f21187f9792d99310171a15e64feb9cf/library/std/src/io/mod.rs:1843:15
   7:     0xaaaad44edfe0 - std::sys_common::backtrace::_print::ha41833e087a85e12
                               at /rustc/8b2459c1f21187f9792d99310171a15e64feb9cf/library/std/src/sys_common/backtrace.rs:47:5
   8:     0xaaaad44edfe0 - std::sys_common::backtrace::print::h32ae40bc3201cfc9
                               at /rustc/8b2459c1f21187f9792d99310171a15e64feb9cf/library/std/src/sys_common/backtrace.rs:34:9
   9:     0xaaaad44ef1e8 - std::panicking::default_hook::{{closure}}::h46d22ee496490b96
  10:     0xaaaad44eee50 - std::panicking::default_hook::h8a05bff9b9ebb786
                               at /rustc/8b2459c1f21187f9792d99310171a15e64feb9cf/library/std/src/panicking.rs:291:9
  11:     0xaaaad44ef628 - std::panicking::rust_panic_with_hook::h72b4980bd319f2cc
                               at /rustc/8b2459c1f21187f9792d99310171a15e64feb9cf/library/std/src/panicking.rs:788:13
  12:     0xaaaad44d6bb4 - std::panicking::begin_panic::{{closure}}::h6b7a2e6dc54500da
  13:     0xaaaad44d6bb4 - std::panicking::begin_panic::{{closure}}::h6b7a2e6dc54500da
  14:     0xaaaad44d6bb4 - std::panicking::begin_panic::{{closure}}::h6b7a2e6dc54500da
  15:     0xaaaad44d6bb4 - std::panicking::begin_panic::{{closure}}::h6b7a2e6dc54500da
...

With RUST_BACKTRACE=1, it has no backtrace at all:

thread 'main' panicked at <anon>:1:13:
explicit panic
stack backtrace:
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
Aborted (core dumped)

Current toolchain:

rustc 1.79.0-nightly (8b2459c1f 2024-04-09)
binary: rustc
commit-hash: 8b2459c1f21187f9792d99310171a15e64feb9cf
commit-date: 2024-04-09
host: aarch64-unknown-linux-gnu
release: 1.79.0-nightly
LLVM version: 18.1.3

Using cargo bisect-rustc, this problem seems to be quite old:

********************************************************************************
Regression in nightly-2022-07-03
********************************************************************************

fetching https://static.rust-lang.org/dist/2022-07-02/channel-rust-nightly-git-commit-hash.txt
nightly manifest 2022-07-02: 40 B / 40 B [============================================================] 100.00 % 229.04 KB/s converted 2022-07-02 to 46b8c23f3eb5e4d0e0aa27eb3f20d5b8fc3ed51f
fetching https://static.rust-lang.org/dist/2022-07-03/channel-rust-nightly-git-commit-hash.txt
nightly manifest 2022-07-03: 40 B / 40 B [==============================================================] 100.00 % 1.88 MB/s converted 2022-07-03 to f2d93935ffba3ab9d7ccb5300771a2d29b4c8bf3
looking for regression commit between 2022-07-02 and 2022-07-03
fetching (via remote github) commits from max(46b8c23f3eb5e4d0e0aa27eb3f20d5b8fc3ed51f, 2022-06-30) to f2d93935ffba3ab9d7ccb5300771a2d29b4c8bf3
ending github query because we found starting sha: 46b8c23f3eb5e4d0e0aa27eb3f20d5b8fc3ed51f
get_commits_between returning commits, len: 8
  commit[0] 2022-07-01: Auto merge of #93967 - cjgillot:short-struct-span, r=petrochenkov
  commit[1] 2022-07-01: Auto merge of #98781 - GuillaumeGomez:rollup-798kb8u, r=GuillaumeGomez
  commit[2] 2022-07-02: Auto merge of #98791 - cuviper:rogue-binary, r=compiler-errors
  commit[3] 2022-07-02: Auto merge of #98802 - Dylan-DPC:rollup-u6mwx27, r=Dylan-DPC
  commit[4] 2022-07-02: Auto merge of #91743 - cjgillot:enable_mir_inlining_inline_all, r=oli-obk
  commit[5] 2022-07-02: Auto merge of #97235 - nbdd0121:unwind, r=Amanieu
  commit[6] 2022-07-02: Auto merge of #97585 - lqd:const-alloc-intern, r=RalfJung
  commit[7] 2022-07-02: Auto merge of #98820 - RalfJung:rollup-i3mip9a, r=RalfJung
ERROR: no CI builds available between 46b8c23f3eb5e4d0e0aa27eb3f20d5b8fc3ed51f and f2d93935ffba3ab9d7ccb5300771a2d29b4c8bf3 within last 167 days

Comparison: https://github.com/rust-lang/rust/compare/46b8c23f3eb5e4d0e0aa27eb3f20d5b8fc3ed51f...f2d93935ffba3ab9d7ccb5300771a2d29b4c8bf3

97235 looks like the most obvious candidate, cc @nbdd0121 @Amanieu

The reason I'm noticing this now is because there's a new cargo test panic_abort_doc_tests from rust-lang/cargo#13388, which I found hanging on Fedora aarch64 in a scratch build of 1.78-beta. After about 2 hours, rustdoc got a SIGKILL, which I assume is an OOM from capturing all that backtrace output. When reproducing this hang, I found that running its RUST_BACKTRACE=full /tmp/rustdoctestGHjZyP/rust_out myself showed the backtrace loop.


This is similar to #123686, but that proposed fix only applies to Windows.

saethlin commented 2 months ago

I think this was already reported? https://github.com/rust-lang/rust/issues/123686

cuviper commented 2 months ago

@saethlin I mentioned that one at the end:

This is similar to #123686, but that proposed fix only applies to Windows.

and AFAICS that report doesn't depend on -Cpanic=abort either.

cuviper commented 2 months ago

It might be a duplicate of #121817, cc @wesleywiser

nbdd0121 commented 2 months ago

97235 adds a lint (which isn't even executed with panic=abort), and should have no codegen changes.

The current behaviour doesn't surprise me since we neither force frame pointers, nor force unwind tables. To generate the stack trace usually what happens is that eh_frame to unwind the frames is used if it exists, otherwise frame pointers are used to continue unwind the frames. We build the standard library with -Cpanic=unwind so the eh_frame is available, but past begin_panic nothing is available anymore.

Adding -Cforce-frame-pointers=yes allows termination (still no traces beyond begin_panic, and adding -Cforce-unwind-tables=yes produces a property stack trace.

cuviper commented 2 months ago

97235 adds a lint (which isn't even executed with panic=abort), and should have no codegen changes.

OK, I don't know. The bisected nightly works fine when I build it myself, which implies that it's sensitive to the particular configuration, but that makes it hard to try and narrow this further. Or if the bug only arose there because of happenstance code changes, the root cause might not be in that range at all.

saethlin commented 2 months ago

I can reproduce the infinite backtrace with 2020-03-01 and the target is gone in 2020-01-01 so I suspect that this bug has been in the target since we started distributing artifacts for it.

I cannot reproduce the infinite backtrace printing with -Zbuild-std. I suspect the problem has something to do with the fact that the standard library is precompiled with -Cpanic=unwind then we're linking to it with -Cpanic=abort, and I don't know how to imitate that with -Zbuild-std.

The behavior of the MIR inliner is sensitive to the stable crate hash, so if the behavior of the MIR inliner can impact whether the fatal optimization happens, it may not be possible to bisect this.

saethlin commented 2 months ago

The current behaviour doesn't surprise me since we neither force frame pointers, nor force unwind tables. To generate the stack trace usually what happens is that eh_frame to unwind the frames is used if it exists, otherwise frame pointers are used to continue unwind the frames. We build the standard library with -Cpanic=unwind so the eh_frame is available, but past begin_panic nothing is available anymore.

To be clear, the behavior seems wrong to me even using -Zbuild-std:

ubuntu@ip-172-31-13-98:~/demo$ RUSTFLAGS=-Cpanic=abort RUST_BACKTRACE=full cargo +nightly r -Zbuild-std=std,panic_abort --target=aarch64-unknown-linux-gnu
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.04s
     Running `target/aarch64-unknown-linux-gnu/debug/demo`
thread 'main' panicked at src/main.rs:1:13:
explicit panic
stack backtrace:
   0:     0xaaaabefe12a8 - std::backtrace_rs::backtrace::libunwind::trace::h3eb3677c648e214e
                               at /home/ubuntu/.rustup/toolchains/nightly-aarch64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/../../backtrace/src/backtrace/libunwind.rs:105:5
   1:     0xaaaabefe12a8 - std::backtrace_rs::backtrace::trace_unsynchronized::h6b663bf10ab98a13
                               at /home/ubuntu/.rustup/toolchains/nightly-aarch64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
Aborted (core dumped)

Buggy in a different way, but still buggy, so maybe it's not worth splitting hairs.

Amanieu commented 2 months ago

The current behaviour doesn't surprise me since we neither force frame pointers, nor force unwind tables. To generate the stack trace usually what happens is that eh_frame to unwind the frames is used if it exists, otherwise frame pointers are used to continue unwind the frames. We build the standard library with -Cpanic=unwind so the eh_frame is available, but past begin_panic nothing is available anymore.

Shouldn't the unwinder just give up if it reaches a frame with no unwinding information? This seems to be the issue here since the loop start when the unwinder reaches the first frame compiled with panic=abort.

This target uses libgcc as the unwinder, this might be worth looking more deeply into.

A truncated backtrace is normal if everything is compiled with panic=abort since we don't emit eh_frame by default so the unwinder doesn't know how to unwind the stack. What shouldn't be happening is that the unwinder ends up in an infinite loop since that indicates invalid unwind information.

Also note that the call into panic is a non-returning tail call. This has caused issues in the past on the ARM target (#69231) since LLVM would assume the LR value is dead and leave it dangling. If this happens to hold a function pointer to std::panicking::begin_panic then this could lead to the infinite loop we are seeing.

workingjubilee commented 2 months ago

Huh? I thought AArch64 code was being compiled with frame pointers?

apiraino commented 2 months ago

WG-prioritization assigning priority (Zulip discussion).

@rustbot label -I-prioritize +P-high

nbdd0121 commented 2 months ago

Shouldn't the unwinder just give up if it reaches a frame with no unwinding information?

For backtracing, libgcc does have a mechanism to use a fallback mechanism if unwind info is not available. Although I just checked that it is implemented for PPC only, so this wasn't the case for aarch64.

This unfortunately makes me unable to explain why the looping behaviour disappears when we force frame pointer though.

Huh? I thought AArch64 code was being compiled with frame pointers?

Doesn't look like it. https://github.com/rust-lang/rust/blob/master/compiler/rustc_target/src/spec/targets/aarch64_unknown_linux_gnu.rs

Apparently we have frame pointers enabled for non-leaf functions on Apple platforms, but not Linux.

Amanieu commented 2 months ago

Another interesting data point. When building for aarch64-unknown-linux-musl, I get this output:

stack backtrace:
   0:           0x419218 - std::backtrace_rs::backtrace::libunwind::trace::h41b2803078d25074
                               at /rustc/c67326b063bd27ed04f306ba2e372cd92e0a8751/library/std/src/../../backtrace/src/backtrace/libunwind.rs:105:5
   1:           0x419218 - std::backtrace_rs::backtrace::trace_unsynchronized::hb38469496ae402aa
                               at /rustc/c67326b063bd27ed04f306ba2e372cd92e0a8751/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
   2:           0x419218 - std::sys_common::backtrace::_print_fmt::h0cf53d01d01865f9
                               at /rustc/c67326b063bd27ed04f306ba2e372cd92e0a8751/library/std/src/sys_common/backtrace.rs:68:5
   3:           0x419218 - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::h2016a34b1ca538fd
                               at /rustc/c67326b063bd27ed04f306ba2e372cd92e0a8751/library/std/src/sys_common/backtrace.rs:44:22
   4:           0x44476c - core::fmt::rt::Argument::fmt::h96d6bfc4fa1065a9
                               at /rustc/c67326b063bd27ed04f306ba2e372cd92e0a8751/library/core/src/fmt/rt.rs:142:9
   5:           0x44476c - core::fmt::write::h3a1e5be1e1cfe579
                               at /rustc/c67326b063bd27ed04f306ba2e372cd92e0a8751/library/core/src/fmt/mod.rs:1153:17
   6:           0x4173e8 - std::io::Write::write_fmt::hcc53d7d63a73e855
                               at /rustc/c67326b063bd27ed04f306ba2e372cd92e0a8751/library/std/src/io/mod.rs:1843:15
   7:           0x419024 - std::sys_common::backtrace::_print::hae3e089c5d3e39b2
                               at /rustc/c67326b063bd27ed04f306ba2e372cd92e0a8751/library/std/src/sys_common/backtrace.rs:47:5
   8:           0x419024 - std::sys_common::backtrace::print::h1182a252cb3edf9c
                               at /rustc/c67326b063bd27ed04f306ba2e372cd92e0a8751/library/std/src/sys_common/backtrace.rs:34:9
   9:           0x41a474 - std::panicking::default_hook::{{closure}}::he5c426f80ff73911
  10:           0x41a114 - std::panicking::default_hook::h5274ad032677b37f
                               at /rustc/c67326b063bd27ed04f306ba2e372cd92e0a8751/library/std/src/panicking.rs:292:9
  11:           0x41a990 - std::panicking::rust_panic_with_hook::h45e7e3752affffd6
                               at /rustc/c67326b063bd27ed04f306ba2e372cd92e0a8751/library/std/src/panicking.rs:779:13
  12:           0x4022c8 - std::panicking::begin_panic::{{closure}}::h275b05efe23611e5
  13:           0x430254 - _Unwind_Backtrace

Note that the last line is clearly incorrect: _Unwind_Backtrace does not call begin_panic.

Amanieu commented 2 months ago

More clues:

What should have happened is that we stop the backtrace at std::panicking::begin_panic::{{closure}} because it has no unwind info to continue further. But for some reason we attempt to unwind despite the lack of unwind info, which causes this problem. Is the backtrace unwinder automatically falling back to frame pointers?

eggyal commented 2 months ago

https://github.com/libunwind/libunwind/blob/05afdabf38d3fa461b7a9de80c64a6513a564d81/src/aarch64/Gstep.c#L635-L638

    /* Procedure Call Standard for the ARM 64-bit Architecture (AArch64)
     * specifies that the end of the frame record chain is indicated by
     * the address zero in the address for the previous frame.
     */
Amanieu commented 2 months ago

OK I figured out the root cause of the problem. It's because LLVM is clobbering the link register on the tail call:

0000000000006924 <_ZN3std9panicking11begin_panic28_$u7b$$u7b$closure$u7d$$u7d$17h3fd05c3e3e7919b0E>:
    6924:   d100c3ff    sub sp, sp, #0x30
    6928:   aa0003e8    mov x8, x0
    692c:   f940010a    ldr x10, [x8]
    6930:   f9400509    ldr x9, [x8, #8]
    6934:   f9000bea    str x10, [sp, #16]
    6938:   f9000fe9    str x9, [sp, #24]
    693c:   f9400bea    ldr x10, [sp, #16]
    6940:   f9400fe9    ldr x9, [sp, #24]
    6944:   910003e0    mov x0, sp
    6948:   f90003ea    str x10, [sp]
    694c:   f90007e9    str x9, [sp, #8]
    6950:   f9400903    ldr x3, [x8, #16]
    6954:   f00002a1    adrp    x1, 5d000 <GCC_except_table144+0x159b4>
    6958:   91248021    add x1, x1, #0x920
    695c:   aa1f03e2    mov x2, xzr
    6960:   52800028    mov w8, #0x1                    // #1
    6964:   12000104    and w4, w8, #0x1
    6968:   2a1f03e8    mov w8, wzr
    696c:   12000105    and w5, w8, #0x1
    6970:   940060d9    bl  1ecd4 <_ZN3std9panicking20rust_panic_with_hook17h06b7ef7eb6d8944bE>

Note how the incoming LR value is not saved anywhere and is later clobbered by the BL instruction.

Also, my earlier statement was incorrect, _ZN3std9panicking11begin_panic28_$u7b$$u7b$closure$u7d$$u7d$17h3fd05c3e3e7919b0E does have unwinding information:

00000088 0000000000000010 00000000 CIE
  Version:               1
  Augmentation:          "zR"
  Code alignment factor: 1
  Data alignment factor: -4
  Return address column: 30
  Augmentation data:     1b
  DW_CFA_def_cfa: r31 (sp) ofs 0

0000009c 0000000000000010 00000018 FDE cie=00000088 pc=0000000000006924..0000000000006974
  DW_CFA_advance_loc: 4 to 0000000000006928
  DW_CFA_def_cfa_offset: 48

This unwind info is claiming that the return address is present in the link register (this is the default if not overridden), but this is clearly incorrect in this case since the link register has been clobbered.

So the unwinder is behaving correctly, it's just that LLVM is emitting incorrect unwind info (or that it should be preserving the caller's link register).

Amanieu commented 2 months ago

Here is the LLVM IR for this function. I think the root of the problem is that we are assigning a personality function to this function, which forces LLVM to emit DWARF frame metadata for this function. However this is a nounwind function that only calls other nounwind functions, so LLVM clobbering the link register is perfectly valid.

I see 2 ways of fixing this:

; ModuleID = '<stdin>'
source_filename = "rust_out.7d368e479ffa2ca5-cgu.0"
target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
target triple = "aarch64-unknown-linux-musl"

%"std::panicking::begin_panic::Payload<&str>" = type { %"core::option::Option<&str>" }
%"core::option::Option<&str>" = type { ptr, [1 x i64] }

@vtable.1 = external hidden unnamed_addr constant <{ ptr, [16 x i8], ptr, ptr }>, align 8

; Function Attrs: inlinehint noreturn nounwind
define hidden fastcc void @"_ZN3std9panicking11begin_panic28_$u7b$$u7b$closure$u7d$$u7d$17h275b05efe23611e5E"(ptr noalias nocapture noundef readonly align 8 dereferenceable(24) %_1) unnamed_addr #0 personality ptr @rust_eh_personality {
start:
  %_4 = alloca %"std::panicking::begin_panic::Payload<&str>", align 8
  call void @llvm.lifetime.start.p0(i64 16, ptr nonnull %_4)
  %inner.0 = load ptr, ptr %_1, align 8, !nonnull !3, !align !4, !noundef !3
  %0 = getelementptr inbounds i8, ptr %_1, i64 8
  %inner.1 = load i64, ptr %0, align 8, !noundef !3
  store ptr %inner.0, ptr %_4, align 8
  %1 = getelementptr inbounds i8, ptr %_4, i64 8
  store i64 %inner.1, ptr %1, align 8
  %2 = getelementptr inbounds i8, ptr %_1, i64 16
  %_6 = load ptr, ptr %2, align 8, !nonnull !3, !align !5, !noundef !3
  call void @_ZN3std9panicking20rust_panic_with_hook17h45e7e3752affffd6E(ptr noundef nonnull align 1 %_4, ptr noalias noundef nonnull readonly align 8 dereferenceable(24) @vtable.1, ptr noalias noundef readonly align 8 dereferenceable_or_null(48) null, ptr noalias noundef nonnull readonly align 8 dereferenceable(24) %_6, i1 noundef zeroext true, i1 noundef zeroext false) #4
  unreachable
}

; Function Attrs: nounwind
declare noundef i32 @rust_eh_personality(i32 noundef, i32 noundef, i64 noundef, ptr noundef, ptr noundef) unnamed_addr #1

; Function Attrs: noreturn nounwind
declare void @_ZN3std9panicking20rust_panic_with_hook17h45e7e3752affffd6E(ptr noundef nonnull align 1, ptr noalias noundef readonly align 8 dereferenceable(24), ptr noalias noundef readonly align 8 dereferenceable_or_null(48), ptr noalias noundef readonly align 8 dereferenceable(24), i1 noundef zeroext, i1 noundef zeroext) unnamed_addr #2

; Function Attrs: nocallback nofree nosync nounwind willreturn memory(argmem: readwrite)
declare void @llvm.lifetime.start.p0(i64 immarg, ptr nocapture) #3

attributes #0 = { inlinehint noreturn nounwind "probe-stack"="inline-asm" "target-cpu"="generic" "target-features"="+v8a" }
attributes #1 = { nounwind "probe-stack"="inline-asm" "target-cpu"="generic" "target-features"="+v8a" }
attributes #2 = { noreturn nounwind "probe-stack"="inline-asm" "target-cpu"="generic" "target-features"="+v8a" }
attributes #3 = { nocallback nofree nosync nounwind willreturn memory(argmem: readwrite) }
attributes #4 = { noreturn nounwind }

!llvm.module.flags = !{!0, !1}
!llvm.ident = !{!2}

!0 = !{i32 8, !"PIC Level", i32 2}
!1 = !{i32 7, !"PIE Level", i32 2}
!2 = !{!"rustc version 1.78.0-nightly (c67326b06 2024-03-15)"}
!3 = !{}
!4 = !{i64 1}
!5 = !{i64 8}
nbdd0121 commented 2 months ago

I suppose we can also just emit the uwtable attribute whenever we attach a personality function?

workingjubilee commented 2 months ago

Huh? I thought AArch64 code was being compiled with frame pointers?

Doesn't look like it. https://github.com/rust-lang/rust/blob/master/compiler/rustc_target/src/spec/targets/aarch64_unknown_linux_gnu.rs

Apparently we have frame pointers enabled for non-leaf functions on Apple platforms, but not Linux.

Mmm. We probably ought to be adopting frame pointers across the board for AAPCS64 reasons. It's technically acceptable for a platform to deviate, but it should be explicit.

davidlt commented 1 month ago

I think this also affects riscv64 in Fedora/RISCV. Looking at the builds logs it seems to be panic_abort_doc_tests. It just runs forever slowly consuming the memory until it gets killed.

decathorpe commented 1 month ago

Huh? I thought AArch64 code was being compiled with frame pointers?

To clarify, Fedora does force-enable frame pointers for x86_64 and aarch64, so this might be affecting Rust builds on Fedora.

cuviper commented 1 month ago

I reproduced the problem using nightlies for this report, so I don't think Fedora's configuration matters.

Reflexe commented 1 month ago

@Amanieu I reproduced it. In my case, I saw that the LLVM IR output was:

; std::panicking::begin_panic::{{closure}}
; Function Attrs: inlinehint noreturn nounwind
define internal void @"_ZN3std9panicking11begin_panic28_$u7b$$u7b$closure$u7d$$u7d$17h4691076880adeaeeE"(ptr align 8 %_1) unnamed_addr #5 personality ptr @rust_eh_personality {
start: 
  %0 = alloca [16 x i8], align 8
  %_7 = alloca [16 x i8], align 8
  %_4 = alloca [16 x i8], align 8
  %inner.0 = load ptr, ptr %_1, align 8
  %1 = getelementptr inbounds i8, ptr %_1, i64 8
  %inner.1 = load i64, ptr %1, align 8
  store ptr %inner.0, ptr %_7, align 8            
  %2 = getelementptr inbounds i8, ptr %_7, i64 8
  store i64 %inner.1, ptr %2, align 8
  %3 = load ptr, ptr %_7, align 8
  %4 = getelementptr inbounds i8, ptr %_7, i64 8
  %5 = load i64, ptr %4, align 8
  store ptr %3, ptr %_4, align 8
  %6 = getelementptr inbounds i8, ptr %_4, i64 8
  store i64 %5, ptr %6, align 8
  %7 = getelementptr inbounds i8, ptr %_1, i64 16
  %_6 = load ptr, ptr %7, align 8
; call std::panicking::rust_panic_with_hook
  call void @_ZN3std9panicking20rust_panic_with_hook17h9e3397264ef8c828E(ptr align 1 %_4, ptr align 8 @vtable.1, ptr align 8 null, ptr align 8 %_6, i1 zeroext true, i1 zeroext false) #13
  unreachable

bb1:                                              ; No predecessors!
  %8 = load ptr, ptr %0, align 8
  %9 = getelementptr inbounds i8, ptr %0, i64 8
  %10 = load i32, ptr %9, align 8
  %11 = insertvalue { ptr, i32 } poison, ptr %8, 0
  %12 = insertvalue { ptr, i32 } %11, i32 %10, 1
  resume { ptr, i32 } %12
}

Due to the resume (which for some reason does not exist in your code), we need to add a personality to this function. Otherwise, we would get an error from LLVM.

In any case, if I understand correctlly, the condition that sets the personality function in this case is: https://github.com/rust-lang/rust/blob/master/compiler/rustc_codegen_ssa/src/mir/mod.rs#L182 which I don't fully understand TBH.

Also for everyone without access to a aarch64 machine and want to reproduce (paths are maybe a bit different):

echo 'fn main() { panic!() }' | rustc - -Cpanic=abort --target aarch64-unknown-linux-gnu  -C "linker=aarch64-linux-gnu-gcc" && qemu-aarch64  -E LD_LIBRARY_PATH="/usr/aarch64-linux-gnu/lib64" -E RUST_BACKTRACE=full -L /usr/aarch64-linux-gnu/   ./rust_out
Amanieu commented 1 month ago

@Reflexe Essentially, whenever we add a personality function we must ensure that we also add the uwtable attribute to the function, otherwise LLVM generates invalid unwinding information.

Reflexe commented 1 month ago

@Amanieu Thanks! this was pretty easy to implement, I created https://github.com/rust-lang/rust/pull/125844.

There are two issues still

  1. I don't fully understand why adding uwtable has any effect here: LLVM already generates an unwinding table entry for this function, is it correct to say that it generates an incorrect unwinding table entry?
  2. The issue you saw at the beginning still exists:
    > echo 'fn main() { panic!() }' | rustc - -Cpanic=abort --target aarch64-unknown-linux-gnu  -C "linker=aarch64-linux-gnu-gcc" && qemu-aarch64   -E LD_LIBRARY_PATH="/usr/aarch64-linux-gnu/lib64" -E RUST_BACKTRACE=full -L /usr/aarch64-linux-gnu/   ./rust_out
    thread 'main' panicked at <anon>:1:13:
    explicit panic
    stack backtrace:
    0:     0x7cd4114333c8 - trace
                               at /home/reflexe/rust/rust/library/std/src/../../backtrace/src/backtrace/libunwind.rs:116:5
    1:     0x7cd4114333c8 - trace_unsynchronized<std::sys_common::backtrace::_print_fmt::{closure_env#1}>
                               at /home/reflexe/rust/rust/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
    2:     0x7cd4114333c8 - _print_fmt
                               at /home/reflexe/rust/rust/library/std/src/sys_common/backtrace.rs:68:5
    3:     0x7cd4114333c8 - fmt
                               at /home/reflexe/rust/rust/library/std/src/sys_common/backtrace.rs:44:22
    4:     0x7cd411448b80 - fmt
                               at /home/reflexe/rust/rust/library/core/src/fmt/rt.rs:165:63
    5:     0x7cd411448b80 - write
                               at /home/reflexe/rust/rust/library/core/src/fmt/mod.rs:1168:21
    6:     0x7cd41141a65c - write_fmt<std::sys::pal::unix::stdio::Stderr>
                               at /home/reflexe/rust/rust/library/std/src/io/mod.rs:1835:15
    7:     0x7cd41143324c - _print
                               at /home/reflexe/rust/rust/library/std/src/sys_common/backtrace.rs:47:5
    8:     0x7cd41143324c - print
                               at /home/reflexe/rust/rust/library/std/src/sys_common/backtrace.rs:34:9
    9:     0x7cd41142e548 - {closure#1}
    10:     0x7cd41142e22c - default_hook
                               at /home/reflexe/rust/rust/library/std/src/panicking.rs:298:9
    11:     0x7cd41142e964 - rust_panic_with_hook
                               at /home/reflexe/rust/rust/library/std/src/panicking.rs:795:13
    12:     0x7cd4114123f0 - std::panicking::begin_panic::{{closure}}::h4691076880adeaee
    13:     0x7cd4114122f0 - std::sys_common::backtrace::__rust_end_short_backtrace::h8c00d8b5eac77590

    The last entry is incorrect here. Not sure if this is related or we should open another issue for this one. Thanks for the help.

Amanieu commented 3 weeks ago
  1. I don't fully understand why adding uwtable has any effect here: LLVM already generates an unwinding table entry for this function, is it correct to say that it generates an incorrect unwinding table entry?

Yes, it generates an incorrect entry if the uwtable attribute is absent. Specifically, it fails to preserve some registers while the unwinding tables indicate that they are actually preserved.

  1. The issue you saw at the beginning still exists:

No, the last entry here is correct: the backtrace properly ends at __rust_end_short_backtrace which comes from user code and therefore has no unwinding metadata.