Open prestontimmons opened 1 year ago
I can't reproduce this segfault inside an arm64 docker container on an x86_64 host, so this seems to require a real machine and doesn't work under QEMU.
Linux 8ae19ce495c5 5.10.102.1-microsoft-standard-WSL2 #1 SMP Wed Mar 2 00:30:59 UTC 2022 aarch64 aarch64 aarch64 GNU/Linux
Thus far I cannot reproduce this issue. Perhaps because I'm on
Linux alarm 5.19.8-1-aarch64-ARCH #1 SMP PREEMPT Thu Sep 8 18:20:33 MDT 2022 aarch64 GNU/Linux
Is the above output from a graviton2 instance?
Are you using `mold` as your linker by any chance? This seems somewhat similar to https://github.com/rust-lang/rust/issues/101247.
Thanks for looking into this.
1) I also have not been able to reproduce this on x86_64 or in an emulated docker running on x86_64.
2) Yes, it is a graviton2 instance using `5.10.135-122.509.amzn2.aarch64`.
3) No, this is using the default linker. `mold` has not been added.
I did some more testing and found an interesting result. When using `cargo run` directly on the host, the segmentation fault does not occur, but I see it consistently in the docker runner that runs on the host (this is part of our CI). The docker image is based on `rust:1.62-slim-bullseye`.
I'll dig deeper and find a more specific setup that reproduces it.
I ran into a similar issue on csky-arch when using `println!`.
Printing a small u128 gives the wrong result:
let a = 0_u128;
println!("{a}"); //14082568811966739713
let a = 1_u128;
println!("{a}"); //14082568811966739714
let a = 10_u128.pow(18);
println!("{a}"); //15082568811966739713
let a = 10_u128.pow(19);
println!("{a}"); //140825688119667397140000000421709631291
Printing a large u128 causes a segmentation fault:
let a = 2_u128.pow(84);
println!("{a}"); //segmentation fault
The u128 arithmetic itself is correct, and other format types print correctly:
let a = 2_u128.pow(84) ;
println!("{:b}", a); //1000000000000000000000000000000000000000000000000000000000000000000000000000000000000
let a = (0_u128 + 1_u128 ) as u64;
println!("{:b}", a); //1
Actually, I'm working on migrating code to the csky arch, which is a niche arch. I introduced the csky arch to Rust in https://github.com/rust-lang/rust/pull/113658 and to libc in https://github.com/rust-lang/libc/pull/3301 .
I'm not sure what caused this issue, but it looks similar to yours. It confuses me, and I don't know whether it comes from my own ignorance in my PR or from an error in some other code.
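The observations above (decimal output wrong, arithmetic and `{:b}` output fine) can be turned into a small self-check. The following is only a sketch: `u128_to_decimal` and `mismatches` are hypothetical helper names, and the reference formatter relies on u128 division working, as reported above. On an affected target the `format!` call may crash instead of returning a wrong string.

```rust
/// Reference decimal formatter that bypasses the `Display` impl for u128:
/// it peels off digits with plain `%` and `/` on u128.
fn u128_to_decimal(mut v: u128) -> String {
    if v == 0 {
        return "0".to_string();
    }
    let mut digits = Vec::new();
    while v > 0 {
        digits.push(b'0' + (v % 10) as u8);
        v /= 10;
    }
    digits.reverse();
    String::from_utf8(digits).expect("ASCII digits are valid UTF-8")
}

/// Returns the values whose `Display` output disagrees with the reference.
fn mismatches(values: &[u128]) -> Vec<u128> {
    values
        .iter()
        .copied()
        .filter(|&v| format!("{v}") != u128_to_decimal(v))
        .collect()
}

fn main() {
    let values = [0, 1, 10_u128.pow(18), 10_u128.pow(19), 2_u128.pow(84), u128::MAX];
    let bad = mismatches(&values);
    if bad.is_empty() {
        println!("u128 Display matches the reference for all test values");
    } else {
        // Report in binary, since `{:b}` was reported to work on the affected target.
        for v in bad {
            println!("mismatch for {:b}", v);
        }
    }
}
```

On a correctly working target, `mismatches` returns an empty list for every input.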
Nobody ever came up with a reproducer of the original report. I just spun up a few graviton instances and tried again, and I couldn't reproduce the originally-reported crash.
I'm sure we could help out if you can come up with a reproducer that doesn't require owning some niche hardware. Is there an emulator people can run?
Failing that, I'd try reporting this problem to your local expert on your arch. I strongly suspect that whatever is going on here is not too Rust-specific. This is probably an LLVM or linker problem, so anyone who can reproduce the problem and is experienced with low-level debugging could really help us out here by identifying what has gone wrong with the codegen. If this happens without optimizations, it's probably fairly localized. For example, it would help if someone could point out "The instructions look good up until this one, at which point it makes no sense. The executable should contain these instructions instead."
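One way to make that kind of inspection easier is to put the failing call behind a small, non-inlined function so it is easy to find in a disassembly or in the output of `rustc --emit asm`. This is only a sketch; `format_u128` is a hypothetical wrapper, not something from the report.

```rust
use std::fmt::Write;

/// Kept out of line so the call into the formatting machinery shows up
/// as its own symbol in the generated assembly.
#[inline(never)]
pub fn format_u128(out: &mut String, v: u128) {
    write!(out, "{v}").expect("writing to a String cannot fail");
}

fn main() {
    let mut out = String::new();
    format_u128(&mut out, 2_u128.pow(84));
    println!("{out}");
}
```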
Fixed it by https://github.com/llvm/llvm-project/pull/69732 .
Triage: Relabeling issues which don't have a runnable reproduction (as opposed to having a non-minimized one) to the new label S-needs-repro. @rustbot label +S-needs-repro -E-needs-mcve
Hello, we've noticed segmentation faults when running Rust binaries compiled on aarch64 GNU/Linux. We've seen this occur in multiple libraries that format or print `SystemTime`.
Architecture:
uname -a
Reproducible example:
The segmentation fault occurs when fmt_u128 is called.
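The reporter's program is not preserved in this text. As a rough illustration only, and not the original reproducer, a program along these lines exercises the same path: it takes a `SystemTime`, converts it to nanoseconds (a `u128`), and prints it in decimal, which goes through `core::fmt::num::fmt_u128`. The helper name `nanos_since_epoch` is hypothetical.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Nanoseconds since the Unix epoch as a u128, as returned by `Duration::as_nanos`.
fn nanos_since_epoch(t: SystemTime) -> u128 {
    t.duration_since(UNIX_EPOCH)
        .expect("time is before the Unix epoch")
        .as_nanos()
}

fn main() {
    let nanos = nanos_since_epoch(SystemTime::now());
    // Decimal `Display` of a u128 is the call that appears in the backtrace.
    println!("{nanos}");
}
```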
I tested this on 1.62.0 and nightly:
The segmentation fault does not occur in release mode:
It also does not occur if opt-level is set to greater than 0:
It also does not occur on Darwin aarch64:
uname -a
Meta
Valgrind traceback:
Backtrace
```
# valgrind target/debug/scratch
==5157== Memcheck, a memory error detector
==5157== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==5157== Using Valgrind-3.16.1 and LibVEX; rerun with -h for copyright info
==5157== Command: target/debug/scratch
==5157==
==5157== Invalid read of size 4
==5157== at 0x112474: alternate (mod.rs:1893)
==5157== by 0x112474: core::fmt::Formatter::pad_integral (mod.rs:1366)
==5157== by 0x111BBB: core::fmt::num::fmt_u128 (num.rs:641)
==5157== by 0x112347: core::fmt::write (mod.rs:1202)
==5157== by 0x15D5FB: write_fmt (mod.rs:1679)
==5157== by 0x15D5FB: <&std::io::stdio::Stdout as std::io::Write>::write_fmt (stdio.rs:715)
==5157== by 0x15E133: write_fmt (stdio.rs:689)
==5157== by 0x15E133: print_to (stdio.rs:1017)
==5157== by 0x15E133: std::io::stdio::_print (stdio.rs:1030)
==5157== by 0x10CDCB: scratch::main (main.rs:3)
==5157== by 0x10CEA3: core::ops::function::FnOnce::call_once (function.rs:251)
==5157== by 0x11B3AB: std::sys_common::backtrace::__rust_begin_short_backtrace (backtrace.rs:122)
==5157== by 0x17925F: std::rt::lang_start::{{closure}} (rt.rs:166)
==5157== by 0x15B38B: call_once<(), (dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe)> (function.rs:286)
==5157== by 0x15B38B: do_call<&(dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe), i32> (panicking.rs:464)
==5157== by 0x15B38B: try + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe)> (panicking.rs:428)
==5157== by 0x15B38B: catch_unwind<&(dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe), i32> (panic.rs:137)
==5157== by 0x15B38B: {closure#2} (rt.rs:148)
==5157== by 0x15B38B: do_call (panicking.rs:464)
==5157== by 0x15B38B: try (panicking.rs:428)
==5157== by 0x15B38B: catch_unwind (panic.rs:137)
==5157== by 0x15B38B: std::rt::lang_start_internal (rt.rs:148)
==5157== by 0x17922B: std::rt::lang_start (rt.rs:165)
==5157== by 0x10CE07: main (in /builds/scratch/target/debug/scratch)
==5157== Address 0x31 is not stack'd, malloc'd or (recently) free'd
==5157==
==5157==
==5157== Process terminating with default action of signal 11 (SIGSEGV): dumping core
==5157== Access not within mapped region at address 0x57
==5157== at 0x112474: alternate (mod.rs:1893)
==5157== by 0x112474: core::fmt::Formatter::pad_integral (mod.rs:1366)
==5157== by 0x111BBB: core::fmt::num::fmt_u128 (num.rs:641)
==5157== by 0x112347: core::fmt::write (mod.rs:1202)
==5157== by 0x15D5FB: write_fmt (mod.rs:1679)
==5157== by 0x15D5FB: <&std::io::stdio::Stdout as std::io::Write>::write_fmt (stdio.rs:715)
==5157== by 0x15E133: write_fmt (stdio.rs:689)
==5157== by 0x15E133: print_to (stdio.rs:1017)
==5157== by 0x15E133: std::io::stdio::_print (stdio.rs:1030)
==5157== by 0x10CDCB: scratch::main (main.rs:3)
==5157== by 0x10CEA3: core::ops::function::FnOnce::call_once (function.rs:251)
==5157== by 0x11B3AB: std::sys_common::backtrace::__rust_begin_short_backtrace (backtrace.rs:122)
==5157== by 0x17925F: std::rt::lang_start::{{closure}} (rt.rs:166)
==5157== by 0x15B38B: call_once<(), (dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe)> (function.rs:286)
==5157== by 0x15B38B: do_call<&(dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe), i32> (panicking.rs:464)
==5157== by 0x15B38B: try + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe)> (panicking.rs:428)
==5157== by 0x15B38B: catch_unwind<&(dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe), i32> (panic.rs:137)
==5157== by 0x15B38B: {closure#2} (rt.rs:148)
==5157== by 0x15B38B: do_call (panicking.rs:464)
==5157== by 0x15B38B: try (panicking.rs:428)
==5157== by 0x15B38B: catch_unwind (panic.rs:137)
==5157== by 0x15B38B: std::rt::lang_start_internal (rt.rs:148)
==5157== by 0x17922B: std::rt::lang_start (rt.rs:165)
==5157== by 0x10CE07: main (in /builds/scratch/target/debug/scratch)
==5157== If you believe this happened as a result of a stack
==5157== overflow in your program's main thread (unlikely but
==5157== possible), you can try to increase the size of the
==5157== main thread stack using the --main-stacksize= flag.
==5157== The main thread stack size used in this run was 10485760.
==5157==
==5157== HEAP SUMMARY:
==5157== in use at exit: 1,109 bytes in 4 blocks
==5157== total heap usage: 9 allocs, 5 frees, 2,997 bytes allocated
==5157==
==5157== LEAK SUMMARY:
==5157== definitely lost: 0 bytes in 0 blocks
==5157== indirectly lost: 0 bytes in 0 blocks
==5157== possibly lost: 0 bytes in 0 blocks
==5157== still reachable: 1,109 bytes in 4 blocks
==5157== suppressed: 0 bytes in 0 blocks
==5157== Rerun with --leak-check=full to see details of leaked memory
==5157==
==5157== For lists of detected and suppressed errors, rerun with: -s
==5157== ERROR SUMMARY: 2 errors from 1 contexts (suppressed: 0 from 0)
Segmentation fault
```