Closed francesco-gaglione closed 2 days ago
@francesco-gaglione Why did you say this is "random"? Does the build sometimes succeed?
@saethlin No, I mean the error occurs while building a random dependency. A note in case it helps: I cleaned everything I could (cache, build directory, and so on) and the build passed; then I built again and it failed. I repeated the clean but it did not pass again. It is random; I'd say it works about once in 50 attempts. One last note: yesterday rust-analyzer caused some crashes because it ate all 64 GB of RAM on my PC. Today I did a dependency update and it no longer eats all the RAM, but the build still fails.
So it crashes on a different dependency each time?
Or do you always get "error: could not compile `wgpu-hal` (lib)"?
@saethlin Yes, correct. It crashes on a different dependency each time.
I think you can easily reproduce it: clone the repo and run `just run`. If you are lucky and it works, run it again and it will never work again.
Ah, a very important update: I ran the build many times, about 20, and it eventually worked. I don't want to say anything wrong, but it behaves a bit like a buffer: it fills up and breaks, then you start again and it resumes from where it stopped until it fills up again. So if you keep retrying, sooner or later it finishes. In fact, a flatpak build doesn't work even after 50 attempts, because it always starts from zero.
I've done `cargo clean` then `just run` a handful of times and no segfaults. `just run` pulls up an interactive app which I don't want to figure out how to kill from a script, so I'm just going to `cargo clean` then `cargo build --release` in a script a hundred times.
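A minimal sketch of such a loop, in Python rather than whatever script was actually used. The function names (`run_command`, `stress_clean_builds`) and the injectable `run` parameter are assumptions made for illustration; only the two cargo commands come from the comment above.

```python
import subprocess
from typing import Callable, List, Sequence


def run_command(cmd: Sequence[str], cwd: str) -> int:
    """Run one command in `cwd` and return its exit code."""
    return subprocess.run(cmd, cwd=cwd).returncode


def stress_clean_builds(
    project_dir: str,
    attempts: int = 100,
    run: Callable[[Sequence[str], str], int] = run_command,
) -> List[int]:
    """Return the 1-based attempt numbers whose release build failed."""
    failed = []
    for attempt in range(1, attempts + 1):
        # Start every attempt from scratch so cached artifacts cannot
        # let a build skip the crate that crashed last time.
        run(["cargo", "clean"], project_dir)
        if run(["cargo", "build", "--release"], project_dir) != 0:
            failed.append(attempt)
    return failed
```

Calling `stress_clean_builds(".", attempts=100)` from the project root would return the list of failed attempts. An empty list on one machine and frequent failures on another points away from the compiler and toward the environment or hardware.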
But at this point I do not think I am going to find a crash. Since you're getting sporadic segfaults in a different place each time, I suspect this is defective hardware, not a compiler bug. You can find all the other issues we root-caused to defective hardware by searching for the label: https://github.com/rust-lang/rust/issues?q=is%3Aissue%20state%3Aclosed%20label%3AC-defective-hardware
What CPU model do you have?
Hmm, strange: last week it worked, then the problem started when I began configuring flatpak. Anyway, I have a Ryzen 9 7950X.
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 48 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 32
On-line CPU(s) list: 0-31
Vendor ID: AuthenticAMD
Model name: AMD Ryzen 9 7950X 16-Core Processor
CPU family: 25
Model: 97
Thread(s) per core: 2
Core(s) per socket: 16
Socket(s): 1
Stepping: 2
Frequency boost: enabled
CPU(s) scaling MHz: 42%
CPU max MHz: 5881,0000
CPU min MHz: 400,0000
BogoMIPS: 8982,68
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good amd_lbr_v2 nopl xtopology nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba perfmon_v2 ibrs ibpb stibp ibrs_enhanced vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local user_shstk avx512_bf16 clzero irperf xsaveerptr rdpru wbnoinvd cppc arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif x2avic v_spec_ctrl vnmi avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq rdpid overflow_recov succor smca fsrm flush_l1d
Virtualization features:
Virtualization: AMD-V
Caches (sum of all):
L1d: 512 KiB (16 instances)
L1i: 512 KiB (16 instances)
L2: 16 MiB (16 instances)
L3: 64 MiB (2 instances)
NUMA:
NUMA node(s): 1
NUMA node0 CPU(s): 0-31
Vulnerabilities:
Gather data sampling: Not affected
Itlb multihit: Not affected
L1tf: Not affected
Mds: Not affected
Meltdown: Not affected
Mmio stale data: Not affected
Reg file data sampling: Not affected
Retbleed: Not affected
Spec rstack overflow: Vulnerable: Safe RET, no microcode
Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Spectre v2: Mitigation; Enhanced / Automatic IBRS; IBPB conditional; STIBP always-on; RSB filling; PBRSB-eIBRS Not affected; BHI Not affected
Srbds: Not affected
Tsx async abort: Not affected
I tried switching to Rust 1.79 with rustup, which used to work without any problems, and even that version no longer works. It seems I can't do Rust builds anymore, which is really limiting: I can't work on any projects. If it is a hardware issue, is there a workaround?
I'm using a 7950X, but I cannot reproduce this. On NixOS, I often see this kind of issue caused by mismatched libc versions after updates. This might help you.
@DianQK I used Fedora 40 and Fedora 41, and now I'm trying Pop!_OS, but I get the same error on all of them. Any tips on how to check and fix the libc version?
Can you try running memtest86+ to check for bad ram?
> @DianQK I used Fedora 40 and Fedora 41, and now I'm trying Pop!_OS, but I get the same error on all of them. Any tips on how to check and fix the libc version?
I don't know. :(
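On the libc question, one rough way to see which C library a Linux system reports is sketched below. It uses Python's standard `platform.libc_ver()` and `os.confstr()`; the helper name `describe_libc` is hypothetical, and the result describes the libc the Python interpreter itself runs against, which is only assumed to match the one the Rust toolchain uses.

```python
import os
import platform


def describe_libc() -> dict:
    """Collect what the running Python process can tell about the C library."""
    info = {"platform_libc_ver": platform.libc_ver()}
    try:
        # glibc-specific query; unavailable on musl or non-Linux systems.
        info["confstr"] = os.confstr("CS_GNU_LIBC_VERSION")
    except (AttributeError, ValueError, OSError):
        info["confstr"] = None
    return info


print(describe_libc())
```

Comparing this output across the distributions tried would at least show whether the libc version changed between a working and a failing setup.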
@bjorn3 Checked, and all OK apart from a small report. I tried removing all 4 RAM sticks and reinstalling them in different slots, and magically everything seems to be working now... I'll keep you updated, but everything seems to be working. It's a kind of magic.
I went back to Fedora and the issue is there again... I really don't know what is happening here.
sudo memtester 1024 5
memtester version 4.3.0 (64-bit)
Copyright (C) 2001-2012 Charles Cazabon.
Licensed under the GNU General Public License version 2 (only).
pagesize is 4096
pagesizemask is 0xfffffffffffff000
want 1024MB (1073741824 bytes)
got 1024MB (1073741824 bytes), trying mlock ...locked.
Loop 1/5:
Stuck Address : ok
Random Value : ok
Compare XOR : ok
Compare SUB : ok
Compare MUL : ok
Compare DIV : ok
Compare OR : ok
Compare AND : ok
Sequential Increment: ok
Solid Bits : ok
Block Sequential : testing 2FAILURE: 0x202020202020202 != 0x212020202020202 at offset 0x1d6f9b08.
Checkerboard : ok
Bit Spread : ok
Bit Flip : testing 188FAILURE: 0x00800000 != 0x10000000800000 at offset 0x1b207f88.
Walking Ones : testing 8FAILURE: 0xffeffffffffffeff != 0xfffffffffffffeff at offset 0x17536bd0.
Walking Zeroes : testing 0FAILURE: 0x00000001 != 0x10000000000001 at offset 0x0a0fbd88.
8-bit Writes : ok
16-bit Writes : ok
Loop 2/5:
Stuck Address : ok
Random Value : ok
Compare XOR : ok
Compare SUB : ok
Compare MUL : ok
FAILURE: 0x00000001 != 0x10000000000001 at offset 0x1c54c1c8.
Compare DIV : FAILURE: 0x78a54b3673ff813d != 0x78b54b3673ff813d at offset 0x1c54c1c8.
Compare OR : Compare AND : ok
Sequential Increment: ok
Solid Bits : ok
Block Sequential : testing 2FAILURE: 0x212020202020202 != 0x202020202020202 at offset 0x0fe8b710.
Checkerboard : ok
Bit Spread : ok
Bit Flip : ok
Walking Ones : testing 2FAILURE: 0xfffffffffffffffb != 0xffeffffffffffffb at offset 0x1f642108.
Walking Zeroes : testing 5FAILURE: 0x10000000000020 != 0x00000020 at offset 0x19e08110.
8-bit Writes : ok
16-bit Writes : -FAILURE: 0x65fb07a07f5f5272 != 0x65eb07a07f5f5272 at offset 0x1c220748.
Loop 3/5:
Stuck Address : ok
Random Value : ok
Compare XOR : ok
Compare SUB : ok
Compare MUL : ok
Compare DIV : ok
Compare OR : ok
Compare AND : ok
Sequential Increment: ok
Solid Bits : ok
Block Sequential : testing 11FAILURE: 0xb1b0b0b0b0b0b0b != 0xb0b0b0b0b0b0b0b at offset 0x0c4dc590.
Checkerboard : ok
Bit Spread : ok
Bit Flip : ok
Walking Ones : testing 8FAILURE: 0xffeffffffffffeff != 0xfffffffffffffeff at offset 0x1ae083d0.
Walking Zeroes : testing 5FAILURE: 0x10000000000020 != 0x00000020 at offset 0x02a1e3d0.
8-bit Writes : -FAILURE: 0xfe4dd7dbcbf37038 != 0xfe5dd7dbcbf37038 at offset 0x17511550.
16-bit Writes : ok
The problem is almost certainly caused by defective ram given the memtest result you showed. As for why it seemed like it was fixed, it is possible that the ram only misbehaves when it is for example above a certain temperature. When reseating your ram it may have had a chance to cool down just enough to work for a bit.
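One way to read the memtester output is to XOR the two values on each FAILURE line and see which bits differ. The sketch below does that for a sample of pairs copied from the log, assuming the two values on each line are the two 64-bit words memtester compared. The names `FAILURES` and `flipped_bits` are illustrative.

```python
from typing import List

# Value pairs copied from FAILURE lines in the memtester log above.
FAILURES = [
    (0x202020202020202, 0x212020202020202),    # Block Sequential, loop 1
    (0x00800000, 0x10000000800000),            # Bit Flip, loop 1
    (0xffeffffffffffeff, 0xfffffffffffffeff),  # Walking Ones, loop 1
    (0x00000001, 0x10000000000001),            # Walking Zeroes, loop 1
    (0x78a54b3673ff813d, 0x78b54b3673ff813d),  # Compare DIV, loop 2
    (0x65fb07a07f5f5272, 0x65eb07a07f5f5272),  # 16-bit Writes, loop 2
]


def flipped_bits(a: int, b: int) -> List[int]:
    """Return the positions (0 = least significant) where a and b differ."""
    diff = a ^ b
    return [bit for bit in range(64) if (diff >> bit) & 1]


for left, right in FAILURES:
    print(f"{left:#018x} vs {right:#018x}: bits {flipped_bits(left, right)}")
# Every pair differs only in bit 52.
```

The same single bit flipping across different tests, offsets, and loops is the kind of pattern a faulty memory cell or data line would produce, which fits the defective-RAM diagnosis rather than a software bug.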
I can confirm another instance of this behaviour. It was also unpredictable: segfaults and sometimes SIGILL on random dependencies during the build. I tried switching to Windows and it worked fine; all 5-10 clean rebuilds finished successfully. Then I ran memtest a few times, which did not show any issues with the RAM.
Finally, I switched back to Linux, and the project no longer failed to build. This was about 2 weeks ago, on a stable Rust version.
Some observations: 1) non-clean rebuilds almost never failed; 2) clean rebuilds failed with a probability of about 90%; 3) unexpected crashes happened frequently on this hardware, mostly in games while the GPU was loaded.
CPU: Intel i5-12400F
I'm closing this because I believe the problem reported here has been diagnosed as defective hardware, based on the memtest failure above.
Code
https://github.com/francesco-gaglione/money_manager/tree/b2121d47702466c5a6b1b6a57dd6fb29351d2b82