rust-lang / stdarch

Rust's standard library vendor-specific APIs and run-time feature detection
https://doc.rust-lang.org/stable/core/arch/
Apache License 2.0

rustc crashes when trying to bench upcoming neon-support in RustFFT with latest stdarch. #1227

Closed HEnquist closed 9 months ago

HEnquist commented 3 years ago

I'm working on adding neon support to RustFFT, and wanted to try the vld* and vst* intrinsics added here: https://github.com/rust-lang/stdarch/pull/1224

First results were promising, but now I'm having a hard time running benchmarks because rustc crashes when building the benches. It crashes quite hard, without giving any useful error message. I'm using rust commit d14731c (simply master from yesterday, have also tried with a version from a couple of days ago with the same result), with stdarch updated to commit 931cdfb.

I would like to investigate this and try to at least help solve it, but I have no idea where to start. Any advice?

I'm trying to bench this branch: https://github.com/HEnquist/RustFFT/tree/vldx

I have tried on both a raspberry pi, and on an Oracle Ampere VM, with the same results.

Error:

pi@raspberrypi:~/RustFFT $ cargo bench --features neon neon_
   Compiling rustfft v6.0.1 (/home/pi/RustFFT)
/mount/ssd/rustc/lib/librustc_driver-b200c9ef89c7aa6a.so(+0x826e48)[0x7f919b2e48]
linux-vdso.so.1(__kernel_rt_sigreturn+0x0)[0x7f96f1a788]
/mount/ssd/rustc/lib/librustc_driver-b200c9ef89c7aa6a.so(+0x11b83a4)[0x7f923443a4]
/mount/ssd/rustc/lib/librustc_driver-b200c9ef89c7aa6a.so(+0x16b1810)[0x7f9283d810]
/mount/ssd/rustc/lib/librustc_driver-b200c9ef89c7aa6a.so(+0x246e1f4)[0x7f935fa1f4]
/mount/ssd/rustc/lib/librustc_driver-b200c9ef89c7aa6a.so(+0x246e350)[0x7f935fa350]
/mount/ssd/rustc/lib/librustc_driver-b200c9ef89c7aa6a.so(+0x246f570)[0x7f935fb570]
/mount/ssd/rustc/lib/librustc_driver-b200c9ef89c7aa6a.so(+0xb5d674)[0x7f91ce9674]
/mount/ssd/rustc/lib/librustc_driver-b200c9ef89c7aa6a.so(+0xb48ba4)[0x7f91cd4ba4]
/mount/ssd/rustc/lib/librustc_driver-b200c9ef89c7aa6a.so(+0xb4cfe8)[0x7f91cd8fe8]
/mount/ssd/rustc/lib/librustc_driver-b200c9ef89c7aa6a.so(+0xa24380)[0x7f91bb0380]
/mount/ssd/rustc/lib/librustc_driver-b200c9ef89c7aa6a.so(+0xa1eb14)[0x7f91baab14]
/mount/ssd/rustc/lib/librustc_driver-b200c9ef89c7aa6a.so(+0xa95048)[0x7f91c21048]
/mount/ssd/rustc/lib/librustc_driver-b200c9ef89c7aa6a.so(+0xaf9320)[0x7f91c85320]
/mount/ssd/rustc/lib/libstd-ba7780e7efc39bcb.so(rust_metadata_std_2d103c436cd22770+0x8c380)[0x7f9108b380]
/lib/aarch64-linux-gnu/libpthread.so.0(+0x77e4)[0x7f90e3d7e4]
/lib/aarch64-linux-gnu/libc.so.6(+0xcfadc)[0x7f90f48adc]
error: could not compile `rustfft`

Caused by:
  process didn't exit successfully: `rustc --crate-name rustfft --edition=2018 src/lib.rs --error-format=json --json=diagnostic-rendered-ansi --emit=dep-info,link -C opt-level=3 -C embed-bitcode=no --test --cfg 'feature="avx"' --cfg 'feature="default"' --cfg 'feature="neon"' --cfg 'feature="sse"' -C metadata=541f3979c27e3f0a -C extra-filename=-541f3979c27e3f0a --out-dir /home/pi/RustFFT/target/release/deps -L dependency=/home/pi/RustFFT/target/release/deps --extern num_complex=/home/pi/RustFFT/target/release/deps/libnum_complex-d7ededc7dd339a27.rlib --extern num_integer=/home/pi/RustFFT/target/release/deps/libnum_integer-0edf0ad8b3f42ac1.rlib --extern num_traits=/home/pi/RustFFT/target/release/deps/libnum_traits-7542682cf91f65c6.rlib --extern paste=/home/pi/RustFFT/target/release/deps/libpaste-49b243a423c645cd.so --extern primal_check=/home/pi/RustFFT/target/release/deps/libprimal_check-d5a43e363432ab49.rlib --extern rand=/home/pi/RustFFT/target/release/deps/librand-5cd04db47872812e.rlib --extern strength_reduce=/home/pi/RustFFT/target/release/deps/libstrength_reduce-1c2f1a65415e918e.rlib --extern transpose=/home/pi/RustFFT/target/release/deps/libtranspose-226cfc662b715315.rlib` (signal: 11, SIGSEGV: invalid memory reference)
warning: build failed, waiting for other jobs to finish...
/mount/ssd/rustc/lib/librustc_driver-b200c9ef89c7aa6a.so(+0x826e48)[0x7f7f990e48]
linux-vdso.so.1(__kernel_rt_sigreturn+0x0)[0x7f84ef8788]
/mount/ssd/rustc/lib/librustc_driver-b200c9ef89c7aa6a.so(+0x11b83a4)[0x7f803223a4]
/mount/ssd/rustc/lib/librustc_driver-b200c9ef89c7aa6a.so(+0x16b1810)[0x7f8081b810]
/mount/ssd/rustc/lib/librustc_driver-b200c9ef89c7aa6a.so(+0x246e1f4)[0x7f815d81f4]
/mount/ssd/rustc/lib/librustc_driver-b200c9ef89c7aa6a.so(+0x246e350)[0x7f815d8350]
/mount/ssd/rustc/lib/librustc_driver-b200c9ef89c7aa6a.so(+0x246f570)[0x7f815d9570]
/mount/ssd/rustc/lib/librustc_driver-b200c9ef89c7aa6a.so(+0xb5d674)[0x7f7fcc7674]
/mount/ssd/rustc/lib/librustc_driver-b200c9ef89c7aa6a.so(+0xb48ba4)[0x7f7fcb2ba4]
/mount/ssd/rustc/lib/librustc_driver-b200c9ef89c7aa6a.so(+0xb4cfe8)[0x7f7fcb6fe8]
/mount/ssd/rustc/lib/librustc_driver-b200c9ef89c7aa6a.so(+0xa24380)[0x7f7fb8e380]
/mount/ssd/rustc/lib/librustc_driver-b200c9ef89c7aa6a.so(+0xa1eb14)[0x7f7fb88b14]
/mount/ssd/rustc/lib/librustc_driver-b200c9ef89c7aa6a.so(+0xa95048)[0x7f7fbff048]
/mount/ssd/rustc/lib/librustc_driver-b200c9ef89c7aa6a.so(+0xaf9320)[0x7f7fc63320]
/mount/ssd/rustc/lib/libstd-ba7780e7efc39bcb.so(rust_metadata_std_2d103c436cd22770+0x8c380)[0x7f7f069380]
/lib/aarch64-linux-gnu/libpthread.so.0(+0x77e4)[0x7f7ee1b7e4]
/lib/aarch64-linux-gnu/libc.so.6(+0xcfadc)[0x7f7ef26adc]
/mount/ssd/rustc/lib/librustc_driver-b200c9ef89c7aa6a.so(+0x826e48)[0x7fafd07e48]
linux-vdso.so.1(__kernel_rt_sigreturn+0x0)[0x7fb526f788]
/mount/ssd/rustc/lib/librustc_driver-b200c9ef89c7aa6a.so(+0x11b83a4)[0x7fb06993a4]
/mount/ssd/rustc/lib/librustc_driver-b200c9ef89c7aa6a.so(+0x16b1810)[0x7fb0b92810]
/mount/ssd/rustc/lib/librustc_driver-b200c9ef89c7aa6a.so(+0x246e1f4)[0x7fb194f1f4]
/mount/ssd/rustc/lib/librustc_driver-b200c9ef89c7aa6a.so(+0x246e350)[0x7fb194f350]
/mount/ssd/rustc/lib/librustc_driver-b200c9ef89c7aa6a.so(+0x246f570)[0x7fb1950570]
/mount/ssd/rustc/lib/librustc_driver-b200c9ef89c7aa6a.so(+0xb5d674)[0x7fb003e674]
/mount/ssd/rustc/lib/librustc_driver-b200c9ef89c7aa6a.so(+0xb48ba4)[0x7fb0029ba4]
/mount/ssd/rustc/lib/librustc_driver-b200c9ef89c7aa6a.so(+0xb4cfe8)[0x7fb002dfe8]
/mount/ssd/rustc/lib/librustc_driver-b200c9ef89c7aa6a.so(+0xa24380)[0x7faff05380]
/mount/ssd/rustc/lib/librustc_driver-b200c9ef89c7aa6a.so(+0xa1eb14)[0x7fafeffb14]
/mount/ssd/rustc/lib/librustc_driver-b200c9ef89c7aa6a.so(+0xa95048)[0x7faff76048]
/mount/ssd/rustc/lib/librustc_driver-b200c9ef89c7aa6a.so(+0xaf9320)[0x7faffda320]
/mount/ssd/rustc/lib/libstd-ba7780e7efc39bcb.so(rust_metadata_std_2d103c436cd22770+0x8c380)[0x7faf3e0380]
/lib/aarch64-linux-gnu/libpthread.so.0(+0x77e4)[0x7faf1927e4]
/lib/aarch64-linux-gnu/libc.so.6(+0xcfadc)[0x7faf29dadc]
/mount/ssd/rustc/lib/librustc_driver-b200c9ef89c7aa6a.so(+0x826e48)[0x7faa5c3e48]
linux-vdso.so.1(__kernel_rt_sigreturn+0x0)[0x7fafb2b788]
/mount/ssd/rustc/lib/librustc_driver-b200c9ef89c7aa6a.so(+0x11b83a4)[0x7faaf553a4]
/mount/ssd/rustc/lib/librustc_driver-b200c9ef89c7aa6a.so(+0x16b1810)[0x7fab44e810]
/mount/ssd/rustc/lib/librustc_driver-b200c9ef89c7aa6a.so(+0x246e1f4)[0x7fac20b1f4]
/mount/ssd/rustc/lib/librustc_driver-b200c9ef89c7aa6a.so(+0x246e350)[0x7fac20b350]
/mount/ssd/rustc/lib/librustc_driver-b200c9ef89c7aa6a.so(+0x246f570)[0x7fac20c570]
/mount/ssd/rustc/lib/librustc_driver-b200c9ef89c7aa6a.so(+0xb5d674)[0x7faa8fa674]
/mount/ssd/rustc/lib/librustc_driver-b200c9ef89c7aa6a.so(+0xb48ba4)[0x7faa8e5ba4]
/mount/ssd/rustc/lib/librustc_driver-b200c9ef89c7aa6a.so(+0xb4cfe8)[0x7faa8e9fe8]
/mount/ssd/rustc/lib/librustc_driver-b200c9ef89c7aa6a.so(+0xa24380)[0x7faa7c1380]
/mount/ssd/rustc/lib/librustc_driver-b200c9ef89c7aa6a.so(+0xa1eb14)[0x7faa7bbb14]
/mount/ssd/rustc/lib/librustc_driver-b200c9ef89c7aa6a.so(+0xa95048)[0x7faa832048]
/mount/ssd/rustc/lib/librustc_driver-b200c9ef89c7aa6a.so(+0xaf9320)[0x7faa896320]
/mount/ssd/rustc/lib/libstd-ba7780e7efc39bcb.so(rust_metadata_std_2d103c436cd22770+0x8c380)[0x7fa9c9c380]
/lib/aarch64-linux-gnu/libpthread.so.0(+0x77e4)[0x7fa9a4e7e4]
/lib/aarch64-linux-gnu/libc.so.6(+0xcfadc)[0x7fa9b59adc]
error: build failed
workingjubilee commented 3 years ago

Please include rustc --version --verbose.

HEnquist commented 3 years ago

Sure! It doesn't say that much unfortunately.

pi@raspberrypi:~ $ /mount/ssd/rustc/bin/rustc --version --verbose
rustc 1.57.0-dev
binary: rustc
commit-hash: unknown
commit-date: unknown
host: aarch64-unknown-linux-gnu
release: 1.57.0-dev
LLVM version: 13.0.0
SparrowLii commented 3 years ago

Can you find out which lines of code in the bench caused the crash? This might help to find the root cause.

workingjubilee commented 3 years ago

LLVM version: 13.0.0

This is what I wanted to make sure of. The last "official" LLVM 13.0.0 (Rust was merging the "release candidate" versions to be able to get a head start on testing) was pulled in shortly after you posted this, so it may be a good idea to try today's rustc.

HEnquist commented 3 years ago

It doesn't seem to matter much what my benches contain; it fails no matter what. I just started building rustc from today and will try it as soon as it's ready (tomorrow probably, it takes some time on a Raspberry Pi).

workingjubilee commented 3 years ago

aarch64-unknown-linux-gnu is a tier 1 target: It should be possible to download the latest nightly via rustup, no? No need to recompile it.

HEnquist commented 3 years ago

I need a newer stdarch than in the latest nightly, with all the vld and vst intrinsics.

HEnquist commented 3 years ago

The updated LLVM unfortunately made no difference. If I go back to the RustFFT version from just before I started using the vld and vst intrinsics, it builds and benches fine. I'll try to figure out exactly which change triggers the crash. Unfortunately I'm a bit short on time these days, so it may take a while.

HEnquist commented 3 years ago

The crash seems to come when I compile a benchmark if my FFTs use this function: https://github.com/HEnquist/RustFFT/blob/vldx/src/neon/neon_vector.rs#L156 It only fails with cargo bench; with cargo test it's all good.

SparrowLii commented 3 years ago

We can replace vld2q_f64 with a fn with equivalent behavior and see if the crash will still happen:

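use core::arch::aarch64::float64x2x2_t;
use core::mem::transmute;

// Stand-in for vld2q_f64: read four f64 values and de-interleave them.
pub unsafe fn vld2q_f64_fake(a: *const f64) -> float64x2x2_t {
    let x: [f64; 4] = core::ptr::read_unaligned(a.cast());
    transmute([x[0], x[2], x[1], x[3]])
}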
HEnquist commented 3 years ago

Using the vld2q_f64_fake instead of vld2q_f64 makes the benches build and run fine!

HEnquist commented 3 years ago

By the way, vld3q_f64 and vld4q_f64 cause no problems. No need for fake-versions of those to make the benches ok.
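
For readers unfamiliar with the vld2/vld3/vld4 family, here is a minimal scalar sketch of the de-interleaving these loads perform, following Arm's description of the intrinsics. The function name `deinterleave` and its const parameters `WAYS` and `LANES` are hypothetical and exist only for illustration; this models the lane layout and is not how stdarch or LLVM implement the loads.

```rust
/// Scalar model of an N-way de-interleaving load.
/// `WAYS` is the number of output vectors (2 for vld2, 3 for vld3, 4 for vld4)
/// and `LANES` is the number of elements per vector (2 for the `q_f64` forms).
/// Hypothetical helper for illustration only, not part of stdarch.
pub fn deinterleave<const WAYS: usize, const LANES: usize>(
    src: &[f64],
) -> [[f64; LANES]; WAYS] {
    assert_eq!(src.len(), WAYS * LANES, "source must hold WAYS * LANES elements");
    let mut out = [[0.0f64; LANES]; WAYS];
    for (i, &value) in src.iter().enumerate() {
        // Element i goes to vector (i mod WAYS), lane (i div WAYS).
        out[i % WAYS][i / WAYS] = value;
    }
    out
}
```

With `WAYS = 2` and `LANES = 2`, the input `[x0, x1, x2, x3]` becomes `[[x0, x2], [x1, x3]]`, which is the same element order that vld2q_f64_fake builds by hand.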

SparrowLii commented 3 years ago

That is interesting. I think vld2q_f64 may have special alignment requirements. Checking this requires a specific analysis of LLVM's implementation of vld2. Unfortunately I am not good at this part.

workingjubilee commented 3 years ago

Can you show the assembly emitted for each of those intrinsics, as it looks in the final bench binary, @HEnquist? This will likely require a disassembly tool rather than relying on --emit=asm or anything. It also likely requires some surrounding assembly for context, hopefully not everything, just each bench test.

hkratz commented 3 years ago

I have looked at this a bit and can reproduce this on a Mac M1.

Rustc crashes in LLVM codegen:

Process 26362 stopped
* thread #7, name = 'LTO bench_rustfft_neon.f5e027c6-cgu.1', stop reason = EXC_BAD_ACCESS (code=1, address=0x400010a125b10)
    frame #0: 0x0000000100530534 librustc_driver-69ff7149a4f34321.dylib`llvm::CallInst::Create(llvm::FunctionType*, llvm::Value*, llvm::ArrayRef<llvm::Value*>, llvm::ArrayRef<llvm::OperandBundleDefT<llvm::Value*> >, llvm::Twine const&, llvm::Instruction*) + 320
librustc_driver-69ff7149a4f34321.dylib`llvm::CallInst::Create:
->  0x100530534 <+320>: ldr    x8, [x25, #0x10]
    0x100530538 <+324>: ldr    x1, [x8]
    0x10053053c <+328>: cbz    x20, 0x100530560          ; <+364>
    0x100530540 <+332>: mov    w8, #0x30

Full backtrace:

(lldb) bt
* thread #7, name = 'LTO bench_rustfft_neon.f5e027c6-cgu.1', stop reason = EXC_BAD_ACCESS (code=1, address=0x400010a125b10)
  * frame #0: 0x0000000100530534 librustc_driver-69ff7149a4f34321.dylib`llvm::CallInst::Create(llvm::FunctionType*, llvm::Value*, llvm::ArrayRef<llvm::Value*>, llvm::ArrayRef<llvm::OperandBundleDefT<llvm::Value*> >, llvm::Twine const&, llvm::Instruction*) + 320
    frame #1: 0x0000000100530294 librustc_driver-69ff7149a4f34321.dylib`llvm::IRBuilderBase::CreateCall(llvm::FunctionType*, llvm::Value*, llvm::ArrayRef<llvm::Value*>, llvm::Twine const&, llvm::MDNode*) + 80
    frame #2: 0x0000000101186b50 librustc_driver-69ff7149a4f34321.dylib`llvm::AArch64TargetLowering::lowerInterleavedLoad(llvm::LoadInst*, llvm::ArrayRef<llvm::ShuffleVectorInst*>, llvm::ArrayRef<unsigned int>, unsigned int) const + 852
    frame #3: 0x0000000101538c6c librustc_driver-69ff7149a4f34321.dylib`(anonymous namespace)::InterleavedAccess::runOnFunction(llvm::Function&) + 4868
    frame #4: 0x0000000101f42030 librustc_driver-69ff7149a4f34321.dylib`llvm::FPPassManager::runOnFunction(llvm::Function&) + 672
    frame #5: 0x0000000101f477c0 librustc_driver-69ff7149a4f34321.dylib`llvm::FPPassManager::runOnModule(llvm::Module&) + 52
    frame #6: 0x0000000101f42528 librustc_driver-69ff7149a4f34321.dylib`llvm::legacy::PassManagerImpl::run(llvm::Module&) + 856
    frame #7: 0x00000001003d8a40 librustc_driver-69ff7149a4f34321.dylib`LLVMRustWriteOutputFile + 692
    frame #8: 0x00000001002e16a4 librustc_driver-69ff7149a4f34321.dylib`rustc_codegen_llvm::back::write::write_output_file::h8c4897ade22bc53c + 204
    frame #9: 0x000000010034fff0 librustc_driver-69ff7149a4f34321.dylib`rustc_codegen_llvm::back::write::codegen::with_codegen::ha82c7a362395cd34 + 116
    frame #10: 0x00000001002e4bd4 librustc_driver-69ff7149a4f34321.dylib`rustc_codegen_llvm::back::write::codegen::h8d756782e432dc6c + 2524
    frame #11: 0x00000001003140e0 librustc_driver-69ff7149a4f34321.dylib`rustc_codegen_ssa::back::write::finish_intra_module_work::h079cbdb2f84c889e + 184
    frame #12: 0x000000010030f890 librustc_driver-69ff7149a4f34321.dylib`rustc_codegen_ssa::back::write::execute_work_item::hfb8dd85525a92ee7 + 780
    frame #13: 0x00000001003bf5f4 librustc_driver-69ff7149a4f34321.dylib`std::sys_common::backtrace::__rust_begin_short_backtrace::h086be9b8ac7cc110 + 176
    frame #14: 0x000000010032eea0 librustc_driver-69ff7149a4f34321.dylib`std::panicking::try::hb23e946ef2c82654 + 52
    frame #15: 0x000000010039114c librustc_driver-69ff7149a4f34321.dylib`core::ops::function::FnOnce::call_once$u7b$$u7b$vtable.shim$u7d$$u7d$::h7660a4fef4f4ca66 + 128
    frame #16: 0x0000000107747fb0 libstd-5be8030cf9a973ad.dylib`_$LT$alloc..boxed..Box$LT$F$C$A$GT$$u20$as$u20$core..ops..function..FnOnce$LT$Args$GT$$GT$::call_once::h6f4298f91d78694f + 36
    frame #17: 0x000000010775b14c libstd-5be8030cf9a973ad.dylib`std::sys::unix::thread::Thread::new::thread_start::h947b820fbfb10caa + 36
    frame #18: 0x00000001884a7878 libsystem_pthread.dylib`_pthread_start + 320

Trying to narrow it down:

  1. It can also be reproduced by building the tests with cargo +stage1 build --features neon --release --tests.
  2. It also happens when building the asmtest.rs example with that example switched to neon.
  3. If the modified asmtest.rs is added to a module in the library sources itself, forcing monomorphization, building the library in release mode fails as well.

With --emit=llvm-ir I can already get a file which reproduces the LLVM codegen crash with llc but it is too big and it might still be bad IR that rustc emits. I will look into it further when I have time.

HEnquist commented 2 years ago

I didn't have any time to continue on this (and I think that I probably know too little about this stuff to be useful anyway). Did anyone else make any progress?

Amanieu commented 2 years ago

Rust recently upgraded to LLVM 14, can you try this on the latest nightly to see if it is still an issue?

HEnquist commented 2 years ago

I'll try asap and report back!

HEnquist commented 2 years ago

Rust recently upgraded to LLVM 14, can you try this on the latest nightly to see if it is still an issue?

I just tried this, and unfortunately the newer LLVM doesn't seem to make any difference.

Nugine commented 1 year ago

I think rustc generates correct instructions.

https://developer.arm.com/architectures/instruction-sets/intrinsics/vld2q_f64

use core::arch::aarch64::*;

#[inline(never)]
pub unsafe fn vld2q_f64_real(p: *const f64) -> float64x2x2_t {
    vld2q_f64(p)
}

#[inline(never)]
pub unsafe fn vld2q_f64_fake(a: *const f64) -> float64x2x2_t {
    let x: [float64x1_t; 4] = core::ptr::read_unaligned(a.cast());
    core::mem::transmute([x[0], x[2], x[1], x[3]])
}
example::vld2q_f64_real:
        ld2     { v0.2d, v1.2d }, [x0]
        stp     q0, q1, [x8]
        ret

example::vld2q_f64_fake:
        ldp     d0, d2, [x0]
        ldp     d1, d3, [x0, #16]
        str     d0, [x8]
        str     d2, [x8, #16]
        str     d1, [x8, #8]
        str     d3, [x8, #24]
        ret

It may be related to a recent issue. The latest nightly has upgraded to LLVM 15.0.4.
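
One way to confirm on real hardware that the intrinsic and a scalar stand-in produce the same values is a small check like the one below. It is a sketch under assumptions: the helper names `deinterleave2_scalar` and `deinterleave2` are made up for illustration, the NEON path is only compiled on AArch64, and every other target falls back to the scalar version so the snippet still builds there.

```rust
/// Scalar reference: even-indexed elements first, then odd-indexed ones.
/// Hypothetical helper, not part of stdarch.
pub fn deinterleave2_scalar(x: &[f64; 4]) -> [f64; 4] {
    [x[0], x[2], x[1], x[3]]
}

/// AArch64 path: uses the real `vld2q_f64` intrinsic and stores both
/// result vectors back to memory with `vst1q_f64`.
#[cfg(target_arch = "aarch64")]
pub fn deinterleave2(x: &[f64; 4]) -> [f64; 4] {
    use core::arch::aarch64::{vld2q_f64, vst1q_f64};
    let mut out = [0.0f64; 4];
    // SAFETY: `x` points to four readable f64 values and `out` has room for
    // two 2-lane stores. Assumes NEON is enabled for the target, which is the
    // default for the common aarch64 Linux and macOS targets.
    unsafe {
        let v = vld2q_f64(x.as_ptr());
        vst1q_f64(out.as_mut_ptr(), v.0);
        vst1q_f64(out.as_mut_ptr().add(2), v.1);
    }
    out
}

/// Fallback for every other architecture: reuse the scalar reference.
#[cfg(not(target_arch = "aarch64"))]
pub fn deinterleave2(x: &[f64; 4]) -> [f64; 4] {
    deinterleave2_scalar(x)
}
```

Comparing `deinterleave2(&x)` with `deinterleave2_scalar(&x)` for a few inputs only checks the loaded values. It says nothing about the compiler crash itself, which the backtrace above places in LLVM's InterleavedAccess pass.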

HEnquist commented 9 months ago

I just got back to this after a small break :) Things are working just fine on recent rustc versions.