Closed nappa85 closed 3 years ago
I've been able to create a core dump, the backtrace is:
@nappa85: How many runs of `runtest.sh` did it take to reproduce the crash? The project is compiling very slowly for me, and taking large amounts of memory (~4-5 GB)
As I already said, it's completely random. The core dump I attached happened on the first try, for example. I'm now using another machine to rule out a hardware problem; at the beginning of the sixth iteration of `runtest.sh` I had this failure:
process didn't exit successfully: `rustc --crate-name fix50sp2 --edition=2018 fix50sp2/src/lib.rs --error-format=json --json=diagnostic-rendered-ansi --crate-type lib --emit=dep-info,metadata -C embed-bitcode=no -C debuginfo=2 -C metadata=a28a0a3657aedba2 -C extra-filename=-a28a0a3657aedba2 --out-dir /home/nappa85/temp/target/debug/deps -C incremental=/home/nappa85/temp/target/debug/incremental -L dependency=/home/nappa85/temp/target/debug/deps --extern fix_common=/home/nappa85/temp/target/debug/deps/libfix_common-b432d750f411466d.rmeta --extern fixt11=/home/nappa85/temp/target/debug/deps/libfixt11-045522708ff761af.rmeta --extern serde=/home/nappa85/temp/target/debug/deps/libserde-3d4d15af8ef49c31.rmeta --extern serde_fix=/home/nappa85/temp/target/debug/deps/libserde_fix-798dff9103619a78.rmeta` (signal: 9, SIGKILL: kill)
It looks like the process was killed because of a timeout or memory consumption, but I don't understand why the 5 previous iterations were fine...
If it was killed by the Linux OOM killer, there should be a message in the kernel log, which you can see by running `dmesg`. This would also explain the randomness, because whether you hit it would depend on how much memory all other applications on the system are using.
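As a minimal sketch of that check, the helper below keeps only kernel log lines that usually accompany an OOM kill. The function name `filter_oom_lines` is hypothetical, and the patterns are an assumption: the exact wording of these messages differs between kernel versions, so an empty result is not proof that no OOM kill happened.

```shell
# Keep only kernel log lines that typically accompany an OOM kill.
# Assumption: messages contain one of these phrases; wording varies by kernel.
filter_oom_lines() {
    grep -i -E 'out of memory|oom-kill|killed process'
}

# Usage (reading the kernel log may require root on some systems):
#   dmesg | filter_oom_lines
#   journalctl -k | filter_oom_lines
```

If a line naming `rustc` shows up with a timestamp close to the failed build, the SIGKILL most likely came from the OOM killer.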
Just to make sure, the system this was originally reproduced with is running at stock, no memory/CPU overclock? Asking because it is extremely popular to run Ryzen with overclocked memory via XMP or a similar technology and sometimes such overclocks do result in issues like these being reported against this repository.
No overclock at all, it's a brand new notebook
An update on the topic: I've run my script for more than 12 hours on another machine without a problem. Only now has it been killed, but I think that is a load problem. It must be said that this machine is slower than mine: a full project build takes 30 minutes here, while on my machine it takes only 6 minutes.
I'm stress-testing my machine in every way to rule out a hardware problem, but so far without luck.
Memtest86 says the memory is OK. I've tried stress (http://manpages.ubuntu.com/manpages/hirsute/man1/stress.1.html) with `stress --cpu 8 --io 8 --vm 20 --vm-bytes 1024M --vm-keep` without a single problem...
If you know how to exclude hardware problems, just tell me
I've run `RUSTFLAGS='-Z time-passes' cargo +nightly check` on the workspace, this is the result:
time_passes.txt
As you can see, there are several points where more than 3GB of RAM is used, and one point takes almost 5GB... I don't know whether that can be considered normal for a crate made only of serde structs...
Trying to use fix_message as a dependency of another crate (that is the purpose of the project):
```
memory allocation of 3357031879003488 bytes failed
error: could not compile `fix50sp2`
Caused by:
process didn't exit successfully: `rustc --crate-name fix50sp2 --edition=2018 /home/marco/.cargo/git/checkouts/serde_fix-8dbbdecc2774ced0/77dd8ea/fix50sp2/src/lib.rs --error-format=json --json=diagnostic-rendered-ansi,artifacts --crate-type lib --emit=dep-info,metadata,link -C embed-bitcode=no -C debuginfo=2 -C metadata=ed09c736436959db -C extra-filename=-ed09c736436959db --out-dir /home/marco/Progetti/therockfixer/target/debug/deps -L dependency=/home/marco/Progetti/therockfixer/target/debug/deps --extern fix_common=/home/marco/Progetti/therockfixer/target/debug/deps/libfix_common-8c1604d5a22c4f95.rmeta --extern fixt11=/home/marco/Progetti/therockfixer/target/debug/deps/libfixt11-5f2f118e94a7ecba.rmeta --extern serde=/home/marco/Progetti/therockfixer/target/debug/deps/libserde-1bc36b13d54a445b.rmeta --extern serde_fix=/home/marco/Progetti/therockfixer/target/debug/deps/libserde_fix-5ac183259d5eaa3a.rmeta --cap-lints allow` (signal: 6, SIGABRT: process abort signal)
```
This looks like an OOM condition to me. The build takes almost 12 GiB of memory here. FWIW, attached is the output of `RUSTFLAGS='-Z time-passes' cargo +nightly build 2>&1 | tee time` on my machine.
I don't think it's an OOM; the error is random, like UB. I've just replicated the error on Windows, on the same hardware, at the fifth build (the first four were fine). On Windows it's a bit slower and it doesn't even reach 100% RAM usage.
The problem has also been replicated by the rust-analyzer maintainer, who has a better machine than mine.
Can you see if this is a regression and if it is, narrow down the regression range?
The rust-analyzer panic is unrelated (and just a normal panic).
Is it possible to bisect on something that happens randomly?
Tomorrow I'll have access to another Ryzen 7 machine, to see if I can replicate the problem there too.
Since the issue presents itself more reliably over multiple runs of the compiler, you can make a script that runs a given task N times before declaring a commit to be "good".
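A minimal sketch of such a wrapper, assuming the bisection tool treats exit status 0 as "good" and anything else as "bad". The function name `run_n_times` and the count of 20 are hypothetical; the right N depends on how often the crash shows up on the affected machine.

```shell
# Run a command N times; succeed only if every single run succeeds.
# Usage: run_n_times N command [args...]
# The body is a subshell "( ... )" so the loop variables stay local.
run_n_times() (
    runs=$1
    shift
    i=1
    while [ "$i" -le "$runs" ]; do
        if ! "$@"; then
            echo "run $i of $runs failed" >&2
            return 1
        fi
        i=$((i + 1))
    done
    return 0
)

# Example (hypothetical count): call a toolchain "good" only after
# 20 consecutive successful runs of the reproduction script.
#   run_n_times 20 ./runtest.sh
```

A random failure can still slip through N clean runs, so a "good" verdict from this wrapper is only probabilistic; a larger N trades bisection time for confidence.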
Sorry for the long silence, it took me more than 10 days to prove to Lenovo support that there were hardware problems even though the tests pass. Now, with brand new hardware, everything is stable. Closing the issue.
I bumped into the same issue today.
My machine's CPU is: AMD Ryzen 7 5800H with Radeon Graphics
My machine's host OS is: Ubuntu 23.04
My machine's guest OS is: alpine 3.17
I built cargo-c via `docker`, `podman`, `virtualbox`, and `kvm+qemu` respectively; all failed due to this. Finally, I built it using GitHub Actions: https://github.com/leleliu008/test/actions/runs/4769013342/jobs/8478957739 , and it failed too.
same here:
error: could not compile syn
Caused by:
process didn't exit successfully: rustc --crate-name build_script_build --edition=2018 /root/.cargo/registry/src/github.com-1ecc6299db9ec823/syn-1.0.109/build.rs --error-format=json --json=diagnostic-rendered-ansi,artifacts,future-incompat --diagnostic-width=237 --crate-type bin --emit=dep-info,link -C embed-bitcode=no --cfg 'feature="clone-impls"' --cfg 'feature="default"' --cfg 'feature="derive"' --cfg 'feature="extra-traits"' --cfg 'feature="full"' --cfg 'feature="parsing"' --cfg 'feature="printing"' --cfg 'feature="proc-macro"' --cfg 'feature="quote"' --cfg 'feature="visit"' --cfg 'feature="visit-mut"' -C metadata=b768e7845a651bf8 -C extra-filename=-b768e7845a651bf8 --out-dir /root/checkmk/packages/cmk-agent-ctl/target/debug/build/syn-b768e7845a651bf8 -L dependency=/root/checkmk/packages/cmk-agent-ctl/target/debug/deps --cap-lints allow
(signal: 11, SIGSEGV: invalid memory reference)
warning: build failed, waiting for other jobs to finish...
root@dc03:~/checkmk/packages/cmk-agent-ctl# cargo build
Compiling proc-macro2 v1.0.51
Compiling quote v1.0.23
Compiling syn v1.0.109
Compiling libc v0.2.139
Compiling cfg-if v1.0.0
Compiling version_check v0.9.4
error: could not compile libc
Caused by:
process didn't exit successfully: rustc --crate-name build_script_build /root/.cargo/registry/src/github.com-1ecc6299db9ec823/libc-0.2.139/build.rs --error-format=json --json=diagnostic-rendered-ansi,artifacts,future-incompat --diagnostic-width=237 --crate-type bin --emit=dep-info,link -C embed-bitcode=no --cfg 'feature="default"' --cfg 'feature="extra_traits"' --cfg 'feature="std"' -C metadata=0a647af3405cec8e -C extra-filename=-0a647af3405cec8e --out-dir /root/checkmk/packages/cmk-agent-ctl/target/debug/build/libc-0a647af3405cec8e -L dependency=/root/checkmk/packages/cmk-agent-ctl/target/debug/deps --cap-lints allow
(signal: 11, SIGSEGV: invalid memory reference)
System: Raspberry Pi 3B, Linux dc03 6.1.21-v8+ #1642 SMP PREEMPT Mon Apr 3 17:24:16 BST 2023 aarch64 GNU/Linux
@rustbot label C-defective-hardware
Adding the label because replacing the machine did actually fix the issue. The reports 2 years later are likely unrelated.
I know this bug report doesn't follow the template. I'll try to be as precise as possible, but I can't reproduce the problem systematically... I've created a really big project, made almost entirely of serde proc-macros; you can find it here: https://github.com/nappa85/serde_fix/ As soon as the project started getting bigger, the builds started failing. I've split it into a workspace, but that hasn't completely solved the problem.
The failures are random, and seem to happen more often in a "dirty" environment, where by "dirty" I mean without calling `cargo clean` beforehand.
I noticed the failure rate is roughly inversely proportional to the number (or the kind) of errors in the code. For example, before I implemented Default for every enum in the code, and therefore had an error about a missing Default for every non-Option-wrapped enum, it was really hard to get a non-failing build.
Now that the code is cleaner, to trigger the failure I created a script that adds a comment to a shared crate and then restarts `cargo check` for all toolchains: https://github.com/nappa85/serde_fix/blob/master/runtest.sh
I run it like `I=0; while ./runtest.sh; do :; done`
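For anyone trying to reproduce this, a variant of that loop that also reports how many runs succeeded before the first failure could look like the sketch below. The function name `runs_until_failure` and the cap of 100 are hypothetical, and the cap is only there so the loop cannot run forever on a healthy machine.

```shell
# Repeat a command until it fails (or a cap is reached), then print how
# many runs succeeded. The command's own output is sent to stderr so that
# stdout carries only the count.
# Usage: runs_until_failure MAX_RUNS command [args...]
runs_until_failure() (
    max_runs=$1
    shift
    ok=0
    while [ "$ok" -lt "$max_runs" ] && "$@" >&2; do
        ok=$((ok + 1))
    done
    echo "$ok"
)

# Example (hypothetical cap):
#   runs_until_failure 100 ./runtest.sh
```

If the printed count equals the cap, no failure was observed in that session.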
Normally rustc fails with something like:
Sometimes, after a failure, I'm not able to restart a build until I run `cargo clean`; the build fails with:
Only once it failed with a backtrace, that is:
Backtrace
```
thread 'rustc' panicked at 'index out of bounds: the len is 30 but the index is 1127271296', compiler/rustc_metadata/src/creader.rs:134:21
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
error: internal compiler error: unexpected panic
note: the compiler unexpectedly panicked. this is a bug.
note: we would appreciate a bug report: https://github.com/rust-lang/rust/issues/new?labels=C-bug%2C+I-ICE%2C+T-compiler&template=ice.md
note: rustc 1.52.0-beta.3 (215738137 2021-04-06) running on x86_64-unknown-linux-gnu
note: compiler flags: -C embed-bitcode=no -C debuginfo=2 -C incremental --crate-type lib
note: some of the compiler flags provided by cargo are hidden
query stack during panic:
thread 'rustc' panicked at 'index out of bounds: the len is 30 but the index is 1127271296', compiler/rustc_metadata/src/creader.rs:134:21
stack backtrace:
0: 0x7fc8e99fbb00 - std::backtrace_rs::backtrace::libunwind::trace::hc65bb72b4a549d12
at /rustc/215738137bcbef2c3637a5bd290ef612cffe6ba5/library/std/src/../../backtrace/src/backtrace/libunwind.rs:90:5
1: 0x7fc8e99fbb00 - std::backtrace_rs::backtrace::trace_unsynchronized::h6e6089972b3c123e
at /rustc/215738137bcbef2c3637a5bd290ef612cffe6ba5/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
2: 0x7fc8e99fbb00 - std::sys_common::backtrace::_print_fmt::h6a259ed64281b14e
at /rustc/215738137bcbef2c3637a5bd290ef612cffe6ba5/library/std/src/sys_common/backtrace.rs:67:5
3: 0x7fc8e99fbb00 -
```
My machine is an AMD Ryzen 7 4700U with 16GB of RAM running Kubuntu 21.04 (same problems with 20.04 and 20.10). If you need any additional info, just ask me, my goal is to help improve the compiler.