rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org

random SIGSEGV on build #132334

Closed francesco-gaglione closed 2 days ago

francesco-gaglione commented 3 weeks ago

Code

https://github.com/francesco-gaglione/money_manager/tree/b2121d47702466c5a6b1b6a57dd6fb29351d2b82

Meta

rustc --version --verbose:

rustc 1.82.0 (f6e511eec 2024-10-15)
binary: rustc
commit-hash: f6e511eec7342f59a25f7c0534f1dbea00d01b14
commit-date: 2024-10-15
host: x86_64-unknown-linux-gnu
release: 1.82.0
LLVM version: 19.1.1

Error output

error: rustc interrupted by SIGSEGV, printing backtrace

/home/kekko/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/librustc_driver-2d2db9d3929202cf.so(+0x33b6d83) [0x7f29b61b6d83]
/lib64/libc.so.6(+0x19dc0) [0x7f29b9b95dc0]
/home/kekko/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/libLLVM.so.19.1-rust-1.82.0-stable(_ZN4llvm22RecomputeGlobalsAAPass3runERNS_6ModuleERNS_15AnalysisManagerIS1_JEEE+0x228) [0x7f29b15844ba]
/home/kekko/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/libLLVM.so.19.1-rust-1.82.0-stable(+0x678428d) [0x7f29b158428d]
/home/kekko/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/libLLVM.so.19.1-rust-1.82.0-stable(_ZN4llvm11PassManagerINS_6ModuleENS_15AnalysisManagerIS1_JEEEJEE3runERS1_RS3_+0x229) [0x7f29b1585ea9]
/home/kekko/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/librustc_driver-2d2db9d3929202cf.so(LLVMRustOptimize+0x84c) [0x7f29b85b6098]
/home/kekko/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/librustc_driver-2d2db9d3929202cf.so(+0x57b7f2f) [0x7f29b85b7f2f]
/home/kekko/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/librustc_driver-2d2db9d3929202cf.so(+0x57b7a56) [0x7f29b85b7a56]
/home/kekko/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/librustc_driver-2d2db9d3929202cf.so(_RNvXs1_Cs1oxnbOlTNyI_18rustc_codegen_llvmNtB5_18LlvmCodegenBackendNtNtNtCs6rvTwnzI2jg_17rustc_codegen_ssa6traits5write19WriteBackendMethods13optimize_thin+0x61d) [0x7f29b8374d33]
/home/kekko/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/librustc_driver-2d2db9d3929202cf.so(+0x5879236) [0x7f29b8679236]
/home/kekko/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/librustc_driver-2d2db9d3929202cf.so(+0x5878821) [0x7f29b8678821]
/home/kekko/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/librustc_driver-2d2db9d3929202cf.so(+0x5877d2b) [0x7f29b8677d2b]
/lib64/libc.so.6(+0x70797) [0x7f29b9bec797]
/lib64/libc.so.6(+0xf478c) [0x7f29b9c7078c]

note: we would appreciate a report at https://github.com/rust-lang/rust
help: you can increase rustc's stack size by setting RUST_MIN_STACK=16777216
   Compiling accesskit_atspi_common v0.9.0 (https://github.com/wash2/accesskit?tag=iced-xdg-surface-0.13#95695534)
   Compiling smithay-client-toolkit v0.19.2
   Compiling atspi-connection v0.3.0
   Compiling cosmic-protocols v0.1.0 (https://github.com/pop-os/cosmic-protocols?rev=c8d3a1c#c8d3a1c3)
   Compiling atspi v0.19.0
   Compiling accesskit_unix v0.12.0 (https://github.com/wash2/accesskit?tag=iced-xdg-surface-0.13#95695534)
   Compiling smithay-clipboard v0.8.0 (https://github.com/pop-os/smithay-clipboard?tag=pop-dnd-5#5a3007de)
error: could not compile `wgpu-hal` (lib)

Caused by:
  process didn't exit successfully: `/home/kekko/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/rustc --crate-name wgpu_hal --edition=2021 /home/kekko/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-hal-22.0.0/src/lib.rs --error-format=json --json=diagnostic-rendered-ansi,artifacts,future-incompat --diagnostic-width=357 --crate-type lib --emit=dep-info,metadata,link -C opt-level=3 -C embed-bitcode=no --cfg 'feature="dx12"' --cfg 'feature="gles"' --cfg 'feature="metal"' --cfg 'feature="renderdoc"' --cfg 'feature="vulkan"' --check-cfg 'cfg(docsrs)' --check-cfg 'cfg(feature, values("device_lost_panic", "dx12", "dxc_shader_compiler", "fragile-send-sync-non-atomic-wasm", "gles", "internal_error_panic", "metal", "oom_panic", "renderdoc", "vulkan", "windows_rs"))' -C metadata=1f25da354abb5b97 -C extra-filename=-1f25da354abb5b97 --out-dir /home/kekko/projects/personal/money_manager/target/release/deps -C strip=debuginfo -L dependency=/home/kekko/projects/personal/money_manager/target/release/deps --extern arrayvec=/home/kekko/projects/personal/money_manager/target/release/deps/libarrayvec-4b967a007dfda66e.rmeta --extern ash=/home/kekko/projects/personal/money_manager/target/release/deps/libash-070ee76552c53c5c.rmeta --extern bitflags=/home/kekko/projects/personal/money_manager/target/release/deps/libbitflags-991c06e3b4b8b37c.rmeta --extern glow=/home/kekko/projects/personal/money_manager/target/release/deps/libglow-111a8d7e356ecb5e.rmeta --extern gpu_alloc=/home/kekko/projects/personal/money_manager/target/release/deps/libgpu_alloc-c6196c5be1d75d97.rmeta --extern gpu_descriptor=/home/kekko/projects/personal/money_manager/target/release/deps/libgpu_descriptor-f63400af2e42d601.rmeta --extern khronos_egl=/home/kekko/projects/personal/money_manager/target/release/deps/libkhronos_egl-954f06aa7f126e79.rmeta --extern libc=/home/kekko/projects/personal/money_manager/target/release/deps/liblibc-c3273371a9c15a1d.rmeta --extern 
libloading=/home/kekko/projects/personal/money_manager/target/release/deps/liblibloading-81990520f7411faa.rmeta --extern log=/home/kekko/projects/personal/money_manager/target/release/deps/liblog-36a1ab59c74198f2.rmeta --extern naga=/home/kekko/projects/personal/money_manager/target/release/deps/libnaga-99097091cd405998.rmeta --extern once_cell=/home/kekko/projects/personal/money_manager/target/release/deps/libonce_cell-762fe6b706ce0b6f.rmeta --extern parking_lot=/home/kekko/projects/personal/money_manager/target/release/deps/libparking_lot-84672c4c4c32eca9.rmeta --extern profiling=/home/kekko/projects/personal/money_manager/target/release/deps/libprofiling-4ed26ed941d9f6b8.rmeta --extern raw_window_handle=/home/kekko/projects/personal/money_manager/target/release/deps/libraw_window_handle-06bdcc0542324944.rmeta --extern renderdoc_sys=/home/kekko/projects/personal/money_manager/target/release/deps/librenderdoc_sys-47ed154b4921fb63.rmeta --extern rustc_hash=/home/kekko/projects/personal/money_manager/target/release/deps/librustc_hash-f53025c7e99ea139.rmeta --extern smallvec=/home/kekko/projects/personal/money_manager/target/release/deps/libsmallvec-00336ba1c0dcc13f.rmeta --extern thiserror=/home/kekko/projects/personal/money_manager/target/release/deps/libthiserror-a8ca2c7d239a232a.rmeta --extern wgt=/home/kekko/projects/personal/money_manager/target/release/deps/libwgpu_types-804044821598ae1a.rmeta --cap-lints allow --cfg native --cfg send_sync --cfg gles --cfg vulkan` (signal: 11, SIGSEGV: invalid memory reference)
warning: build failed, waiting for other jobs to finish...
error: Recipe `run` failed on line 65 with exit code 101

saethlin commented 3 weeks ago

@francesco-gaglione Why did you say this is "random"? Does the build sometimes succeed?

francesco-gaglione commented 3 weeks ago

@saethlin No, I mean it generates this error while building a random dependency. One note, in case it helps: I tried cleaning everything (the cache, the build directory, and anything else I could clear) and the build passed; then I built again and it failed. I repeated the clean, but it no longer passes. It is random; I would say it works once in 50 attempts. One last note: yesterday rust-analyzer caused some crashes because it ate all 64 GB of RAM on my PC. Today I did a dependency update and it no longer eats all the RAM, but the build still fails.

saethlin commented 3 weeks ago

So it crashes on a different dependency each time?

Or do you always get

error: could not compile wgpu-hal (lib)

francesco-gaglione commented 3 weeks ago

@saethlin Yes, correct. It crashes on a different dependency each time.

francesco-gaglione commented 3 weeks ago

I think you can easily reproduce it: clone the repo and run `just run`. If you are lucky and it works, run it again and it will never work again. A very important update: I tried running the build many times, about 20, and it worked. I don't want to say anything wrong, but it behaves a bit like a buffer: it fills the buffer and breaks, then you start again and it resumes from where it stopped until it fills up again. So if you keep retrying, sooner or later it finishes. In fact, a flatpak build doesn't work even after 50 attempts, because it always starts from zero.
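One possible reading of the "buffer" behaviour: Cargo keeps the artifacts of crates that already compiled in the target directory, so a retried build only has to get through the crates that are still missing, while a from-scratch build has to get through all of them in one go. The sketch below puts rough numbers on that idea. It assumes each crate compile crashes independently with the same probability and that a build stops at the first crash (real Cargo builds run jobs in parallel, so this is a simplification). The function names, the crate count of 400, and the 5% crash rate are hypothetical and are not measured from this project.

```python
def scratch_success_probability(crates, p_fail):
    """Chance that a single from-scratch build compiles every crate without a crash."""
    return (1.0 - p_fail) ** crates


def expected_attempts_with_cache(crates, p_fail):
    """Expected number of build invocations when finished crates stay cached.

    Each crate contributes p_fail / (1 - p_fail) crashed attempts on average
    (geometric distribution), and one final attempt completes the build.
    """
    return 1.0 + crates * p_fail / (1.0 - p_fail)


# Hypothetical numbers, chosen only for illustration.
CRATES = 400
P_FAIL = 0.05

print(scratch_success_probability(CRATES, P_FAIL))
print(expected_attempts_with_cache(CRATES, P_FAIL))
```

With these made-up inputs, the cached build is expected to finish after about 22 invocations, while a single from-scratch build almost never succeeds. That would match a build that eventually completes after many retries and a from-scratch build that keeps failing, but it does not say anything about the cause of the crashes.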

saethlin commented 3 weeks ago

I've done `cargo clean` then `just run` a handful of times and no segfaults. `just run` pulls up an interactive app which I don't want to figure out how to kill from a script, so I'm just going to `cargo clean` then `cargo build --release` in a script a hundred times.
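A repeated clean-build loop like the one described above could look roughly like this. It is a sketch, not the script that was actually used in this thread: the function names `run_clean_build` and `tally` are invented here, and detecting a crash by searching Cargo's stderr for the string "SIGSEGV" is an assumption based on the error output in the first post.

```python
import subprocess


def run_clean_build(cargo="cargo"):
    """Run `cargo clean` then `cargo build --release`; return (exit_code, stderr)."""
    subprocess.run([cargo, "clean"], check=True)
    result = subprocess.run(
        [cargo, "build", "--release"],
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stderr


def tally(runs, build=run_clean_build):
    """Repeat the build `runs` times; return (failed_builds, builds_reporting_sigsegv)."""
    failures = 0
    segfaults = 0
    for _ in range(runs):
        code, stderr = build()
        if code != 0:
            failures += 1
            # Assumption: rustc's crash report reaches Cargo's stderr, as in the log above.
            if "SIGSEGV" in stderr:
                segfaults += 1
    return failures, segfaults
```

Calling `tally(100)` from the project directory would repeat the experiment a hundred times. The `build` parameter exists only so the counting logic can be exercised without invoking Cargo.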

But at this point I do not think I am going to find a crash. Since you're getting sporadic and different segfaults, I suspect this is defective hardware, not a compiler bug. You can find all the other issues we root-caused to defective hardware by searching for the label: https://github.com/rust-lang/rust/issues?q=is%3Aissue%20state%3Aclosed%20label%3AC-defective-hardware

What CPU model do you have?

francesco-gaglione commented 3 weeks ago

Hmm, strange: last week it worked, then the problem started when I began configuring flatpak. Anyway, I have a Ryzen 9 7950X.

Architecture:             x86_64
  CPU op-mode(s):         32-bit, 64-bit
  Address sizes:          48 bits physical, 48 bits virtual
  Byte Order:             Little Endian
CPU(s):                   32
  On-line CPU(s) list:    0-31
Vendor ID:                AuthenticAMD
  Model name:             AMD Ryzen 9 7950X 16-Core Processor
    CPU family:           25
    Model:                97
    Thread(s) per core:   2
    Core(s) per socket:   16
    Socket(s):            1
    Stepping:             2
    Frequency boost:      enabled
    CPU(s) scaling MHz:   42%
    CPU max MHz:          5881,0000
    CPU min MHz:          400,0000
    BogoMIPS:             8982,68
    Flags:                fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good amd_lbr_v2 nopl xtopology nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba perfmon_v2 ibrs ibpb stibp ibrs_enhanced vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local user_shstk avx512_bf16 clzero irperf xsaveerptr rdpru wbnoinvd cppc arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif x2avic v_spec_ctrl vnmi avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq rdpid overflow_recov succor smca fsrm flush_l1d
Virtualization features:
  Virtualization:         AMD-V
Caches (sum of all):
  L1d:                    512 KiB (16 instances)
  L1i:                    512 KiB (16 instances)
  L2:                     16 MiB (16 instances)
  L3:                     64 MiB (2 instances)
NUMA:
  NUMA node(s):           1
  NUMA node0 CPU(s):      0-31
Vulnerabilities:
  Gather data sampling:   Not affected
  Itlb multihit:          Not affected
  L1tf:                   Not affected
  Mds:                    Not affected
  Meltdown:               Not affected
  Mmio stale data:        Not affected
  Reg file data sampling: Not affected
  Retbleed:               Not affected
  Spec rstack overflow:   Vulnerable: Safe RET, no microcode
  Spec store bypass:      Mitigation; Speculative Store Bypass disabled via prctl
  Spectre v1:             Mitigation; usercopy/swapgs barriers and __user pointer sanitization
  Spectre v2:             Mitigation; Enhanced / Automatic IBRS; IBPB conditional; STIBP always-on; RSB filling; PBRSB-eIBRS Not affected; BHI Not affected
  Srbds:                  Not affected
  Tsx async abort:        Not affected

francesco-gaglione commented 3 weeks ago

I tried switching to the 1.79 toolchain with rustup, which used to work without any problems, and even that version no longer works. It seems I can't do Rust builds anymore. It's really limiting, because I can't work on any projects. If it is a hardware issue, is there a workaround?

DianQK commented 3 weeks ago

I'm using a 7950X too, but I cannot reproduce this. When I use NixOS, this kind of issue often occurs because of mismatched libc versions after updates. This might help you.

francesco-gaglione commented 3 weeks ago

@DianQK I used Fedora 40 and Fedora 41, and now I'm trying Pop!_OS, but I get the same error on all of them. Any tips on how to check and fix the libc version?

bjorn3 commented 3 weeks ago

Can you try running memtest86+ to check for bad RAM?

DianQK commented 3 weeks ago

> @DianQK I used Fedora 40 and Fedora 41, and now I'm trying Pop!_OS, but I get the same error on all of them. Any tips on how to check and fix the libc version?

I don't know. :(

francesco-gaglione commented 3 weeks ago

@bjorn3 I checked, and everything is OK apart from one small report. I tried removing all 4 sticks of RAM and reinstalling them in different slots, and magically everything seems to be working now... I'll keep you updated, but everything seems to be working. It's a kind of magic.

francesco-gaglione commented 3 weeks ago

I went back to Fedora and the issue is there again... I really don't know what is happening here.

francesco-gaglione commented 3 weeks ago

sudo memtester 1024 5

memtester version 4.3.0 (64-bit)
Copyright (C) 2001-2012 Charles Cazabon.
Licensed under the GNU General Public License version 2 (only).

pagesize is 4096
pagesizemask is 0xfffffffffffff000
want 1024MB (1073741824 bytes)
got  1024MB (1073741824 bytes), trying mlock ...locked.
Loop 1/5:
  Stuck Address       : ok         
  Random Value        : ok
  Compare XOR         : ok
  Compare SUB         : ok
  Compare MUL         : ok
  Compare DIV         : ok
  Compare OR          : ok
  Compare AND         : ok
  Sequential Increment: ok
  Solid Bits          : ok         
  Block Sequential    : testing   2FAILURE: 0x202020202020202 != 0x212020202020202 at offset 0x1d6f9b08.
  Checkerboard        : ok         
  Bit Spread          : ok         
  Bit Flip            : testing 188FAILURE: 0x00800000 != 0x10000000800000 at offset 0x1b207f88.
  Walking Ones        : testing   8FAILURE: 0xffeffffffffffeff != 0xfffffffffffffeff at offset 0x17536bd0.
  Walking Zeroes      : testing   0FAILURE: 0x00000001 != 0x10000000000001 at offset 0x0a0fbd88.
  8-bit Writes        : ok
  16-bit Writes       : ok

Loop 2/5:
  Stuck Address       : ok         
  Random Value        : ok
  Compare XOR         : ok
  Compare SUB         : ok
  Compare MUL         : ok
FAILURE: 0x00000001 != 0x10000000000001 at offset 0x1c54c1c8.
  Compare DIV         : FAILURE: 0x78a54b3673ff813d != 0x78b54b3673ff813d at offset 0x1c54c1c8.
  Compare OR          :   Compare AND         : ok
  Sequential Increment: ok
  Solid Bits          : ok         
  Block Sequential    : testing   2FAILURE: 0x212020202020202 != 0x202020202020202 at offset 0x0fe8b710.
  Checkerboard        : ok         
  Bit Spread          : ok         
  Bit Flip            : ok         
  Walking Ones        : testing   2FAILURE: 0xfffffffffffffffb != 0xffeffffffffffffb at offset 0x1f642108.
  Walking Zeroes      : testing   5FAILURE: 0x10000000000020 != 0x00000020 at offset 0x19e08110.
  8-bit Writes        : ok
  16-bit Writes       : -FAILURE: 0x65fb07a07f5f5272 != 0x65eb07a07f5f5272 at offset 0x1c220748.

Loop 3/5:
  Stuck Address       : ok         
  Random Value        : ok
  Compare XOR         : ok
  Compare SUB         : ok
  Compare MUL         : ok
  Compare DIV         : ok
  Compare OR          : ok
  Compare AND         : ok
  Sequential Increment: ok
  Solid Bits          : ok         
  Block Sequential    : testing  11FAILURE: 0xb1b0b0b0b0b0b0b != 0xb0b0b0b0b0b0b0b at offset 0x0c4dc590.
  Checkerboard        : ok         
  Bit Spread          : ok         
  Bit Flip            : ok         
  Walking Ones        : testing   8FAILURE: 0xffeffffffffffeff != 0xfffffffffffffeff at offset 0x1ae083d0.
  Walking Zeroes      : testing   5FAILURE: 0x10000000000020 != 0x00000020 at offset 0x02a1e3d0.
  8-bit Writes        : -FAILURE: 0xfe4dd7dbcbf37038 != 0xfe5dd7dbcbf37038 at offset 0x17511550.
  16-bit Writes       : ok

bjorn3 commented 3 weeks ago

The problem is almost certainly caused by defective RAM, given the memtest result you showed. As for why it seemed like it was fixed, it is possible that the RAM only misbehaves when it is, for example, above a certain temperature. When you reseated your RAM, it may have had a chance to cool down just enough to work for a bit.
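The memtester failures above can also be compared mechanically. The sketch below is not part of memtester: it XORs the two values from each FAILURE line in the log and lists the bit positions where they differ. The names `FAILURE_PAIRS` and `flipped_bits` are invented for this illustration, and the values are copied in the order they appear in the output.

```python
# The two values from each memtester FAILURE line above, in log order.
FAILURE_PAIRS = [
    (0x202020202020202, 0x212020202020202),
    (0x00800000, 0x10000000800000),
    (0xffeffffffffffeff, 0xfffffffffffffeff),
    (0x00000001, 0x10000000000001),
    (0x00000001, 0x10000000000001),
    (0x78a54b3673ff813d, 0x78b54b3673ff813d),
    (0x212020202020202, 0x202020202020202),
    (0xfffffffffffffffb, 0xffeffffffffffffb),
    (0x10000000000020, 0x00000020),
    (0x65fb07a07f5f5272, 0x65eb07a07f5f5272),
    (0xb1b0b0b0b0b0b0b, 0xb0b0b0b0b0b0b0b),
    (0xffeffffffffffeff, 0xfffffffffffffeff),
    (0x10000000000020, 0x00000020),
    (0xfe4dd7dbcbf37038, 0xfe5dd7dbcbf37038),
]


def flipped_bits(left, right):
    """Return the bit positions (0 = least significant) where two 64-bit words differ."""
    diff = left ^ right
    return [bit for bit in range(64) if (diff >> bit) & 1]


for left, right in FAILURE_PAIRS:
    print(f"{left:#018x} vs {right:#018x}: differing bits {flipped_bits(left, right)}")
```

Every pair differs in exactly one bit, and it is always bit 52, across all three loops and at different offsets. A single recurring bit position like this is consistent with a hardware fault rather than with a compiler bug.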

skibon02 commented 2 days ago

I can confirm another instance of this behaviour. It was also unpredictable: I was getting segfaults, and sometimes SIGILL, on random dependencies during the build. I tried switching to Windows and it worked fine; all 5-10 clean rebuilds finished successfully. Then I ran memtest a few times, which did not show any issues with the RAM.

Finally, I switched back to Linux, and it no longer failed to build the project. This was about 2 weeks ago, on a stable Rust version.

Some observations: 1) non-clean rebuilds almost never failed; 2) clean rebuilds failed with a probability of about 90%; 3) unexpected crashes happened frequently on this hardware, mostly in games while the GPU was loaded.

CPU: Intel i5-12400F

saethlin commented 2 days ago

I'm closing this because I believe the problem reported here has been diagnosed as defective hardware, based on the memtest failure above.