rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org

SIGSEGV from rustc while building crate `legion` #77869

Open alex5nader opened 4 years ago

alex5nader commented 4 years ago

Code

I am not sure what part of legion is causing this. I have not encountered this issue for any other crates.

Meta

rustc --version --verbose:

rustc 1.47.0 (18bf6b4f0 2020-10-07)
binary: rustc
commit-hash: 18bf6b4f01a6feaf7259ba7cdae58031af1b7b39
commit-date: 2020-10-07
host: x86_64-unknown-linux-gnu
release: 1.47.0
LLVM version: 11.0

Error output

   Compiling legion v0.3.1 (/data/Projects/legion)
error: could not compile `legion`.

Caused by:
  process didn't exit successfully: `rustc --crate-name legion --edition=2018 src/lib.rs --error-format=json --json=diagnostic-rendered-ansi --crate-type lib --emit=dep-info,metadata,link -C embed-bitcode=no -C debuginfo=2 --cfg 'feature="codegen"' --cfg 'feature="crossbeam-channel"' --cfg 'feature="crossbeam-events"' --cfg 'feature="default"' --cfg 'feature="erased-serde"' --cfg 'feature="legion_codegen"' --cfg 'feature="parallel"' --cfg 'feature="rayon"' --cfg 'feature="serde"' --cfg 'feature="serialize"' -C metadata=14f1150a42ae3e4b -C extra-filename=-14f1150a42ae3e4b --out-dir /data/Projects/legion/target/debug/deps -C incremental=/data/Projects/legion/target/debug/incremental -L dependency=/data/Projects/legion/target/debug/deps --extern bit_set=/data/Projects/legion/target/debug/deps/libbit_set-0f027bbe9088639b.rmeta --extern crossbeam_channel=/data/Projects/legion/target/debug/deps/libcrossbeam_channel-e02935c1a92635b3.rmeta --extern derivative=/data/Projects/legion/target/debug/deps/libderivative-027c3cec12a884ca.so --extern downcast_rs=/data/Projects/legion/target/debug/deps/libdowncast_rs-818a53b23fc7be82.rmeta --extern erased_serde=/data/Projects/legion/target/debug/deps/liberased_serde-c8566e1a0c06d2b3.rmeta --extern itertools=/data/Projects/legion/target/debug/deps/libitertools-4b46418de185c381.rmeta --extern legion_codegen=/data/Projects/legion/target/debug/deps/liblegion_codegen-7fefbee3b51a1a22.so --extern parking_lot=/data/Projects/legion/target/debug/deps/libparking_lot-1282ab6a8685ce14.rmeta --extern paste=/data/Projects/legion/target/debug/deps/libpaste-69df8912f33518e2.so --extern rayon=/data/Projects/legion/target/debug/deps/librayon-1e861157ad884d7a.rmeta --extern serde=/data/Projects/legion/target/debug/deps/libserde-2a6ef3a1ac05b029.rmeta --extern smallvec=/data/Projects/legion/target/debug/deps/libsmallvec-7e54452c7c62a719.rmeta --extern thiserror=/data/Projects/legion/target/debug/deps/libthiserror-e30ede5540027b3b.rmeta --extern 
uuid=/data/Projects/legion/target/debug/deps/libuuid-1b1ed382cc39f9ea.rmeta` (signal: 11, SIGSEGV: invalid memory reference)
Backtrace

```
#0  free (ptr=0x48c2df416aec43d6) at ../jemalloc/src/jemalloc.c:2393
#1  0x00007ffff3511bcc in <smallvec::SmallVec<A> as core::ops::drop::Drop>::drop () from /home/noobstar/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-ff4ec557f69b94a7.so
#2  0x00007ffff35bf011 in <rustc_resolve::late::LateResolutionVisitor as rustc_ast::visit::Visitor>::visit_local () from /home/noobstar/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-ff4ec557f69b94a7.so
#3  0x00007ffff35be822 in <rustc_resolve::late::LateResolutionVisitor as rustc_ast::visit::Visitor>::visit_block () from /home/noobstar/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-ff4ec557f69b94a7.so
#4  0x00007ffff35c07d0 in <rustc_resolve::late::LateResolutionVisitor as rustc_ast::visit::Visitor>::visit_fn () from /home/noobstar/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-ff4ec557f69b94a7.so
#5  0x00007ffff3549f9c in rustc_ast::visit::walk_assoc_item () from /home/noobstar/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-ff4ec557f69b94a7.so
#6  0x00007ffff35cc54e in rustc_resolve::late::LateResolutionVisitor::with_generic_param_rib () from /home/noobstar/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-ff4ec557f69b94a7.so
#7  0x00007ffff35c2f39 in rustc_resolve::late::LateResolutionVisitor::resolve_item () from /home/noobstar/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-ff4ec557f69b94a7.so
#8  0x00007ffff355696e in rustc_ast::visit::walk_item () from /home/noobstar/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-ff4ec557f69b94a7.so
#9  0x00007ffff35c2473 in rustc_resolve::late::LateResolutionVisitor::resolve_item () from /home/noobstar/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-ff4ec557f69b94a7.so
#10 0x00007ffff355696e in rustc_ast::visit::walk_item () from /home/noobstar/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-ff4ec557f69b94a7.so
#11 0x00007ffff35c2473 in rustc_resolve::late::LateResolutionVisitor::resolve_item () from /home/noobstar/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-ff4ec557f69b94a7.so
#12 0x00007ffff3547d42 in rustc_ast::visit::walk_crate () from /home/noobstar/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-ff4ec557f69b94a7.so
#13 0x00007ffff3588ac7 in rustc_resolve::Resolver::resolve_crate () from /home/noobstar/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-ff4ec557f69b94a7.so
#14 0x00007ffff08f3c97 in rustc_interface::passes::configure_and_expand_inner () from /home/noobstar/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-ff4ec557f69b94a7.so
#15 0x00007ffff08d06c9 in rustc_interface::passes::configure_and_expand::{{closure}} () from /home/noobstar/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-ff4ec557f69b94a7.so
#16 0x00007ffff08aaecf in rustc_data_structures::box_region::PinnedGenerator<I,A,R>::new () from /home/noobstar/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-ff4ec557f69b94a7.so
#17 0x00007ffff08f2965 in rustc_interface::passes::configure_and_expand () from /home/noobstar/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-ff4ec557f69b94a7.so
#18 0x00007ffff0913f73 in rustc_interface::queries::Queries::expansion () from /home/noobstar/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-ff4ec557f69b94a7.so
#19 0x00007ffff05bd887 in rustc_interface::queries::<impl rustc_interface::interface::Compiler>::enter () from /home/noobstar/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-ff4ec557f69b94a7.so
#20 0x00007ffff0551f27 in rustc_span::with_source_map () from /home/noobstar/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-ff4ec557f69b94a7.so
#21 0x00007ffff05bf513 in rustc_interface::interface::create_compiler_and_run () from /home/noobstar/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-ff4ec557f69b94a7.so
#22 0x00007ffff059d9fa in scoped_tls::ScopedKey<T>::set () from /home/noobstar/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-ff4ec557f69b94a7.so
#23 0x00007ffff05b2957 in std::sys_common::backtrace::__rust_begin_short_backtrace () from /home/noobstar/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-ff4ec557f69b94a7.so
#24 0x00007ffff053bdae in core::ops::function::FnOnce::call_once{{vtable-shim}} () from /home/noobstar/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-ff4ec557f69b94a7.so
#25 0x00007fffef949f5a in <alloc::boxed::Box<F> as core::ops::function::FnOnce<A>>::call_once () at /rustc/18bf6b4f01a6feaf7259ba7cdae58031af1b7b39/library/alloc/src/boxed.rs:1042
#26 <alloc::boxed::Box<F> as core::ops::function::FnOnce<A>>::call_once () at /rustc/18bf6b4f01a6feaf7259ba7cdae58031af1b7b39/library/alloc/src/boxed.rs:1042
#27 std::sys::unix::thread::Thread::new::thread_start () at library/std/src/sys/unix/thread.rs:87
#28 0x00007fffef84f606 in ?? () from /usr/lib/libpthread.so.0
#29 0x00007fffef775753 in clone () from /usr/lib/haswell/libc.so.6
```

alex5nader commented 4 years ago

Building legion on Rust 1.46.0 does work.

rustc --version --verbose:

rustc 1.46.0 (04488afe3 2020-08-24)
binary: rustc
commit-hash: 04488afe34512aa4c33566eb16d8c912a3ae04f9
commit-date: 2020-08-24
host: x86_64-unknown-linux-gnu
release: 1.46.0
LLVM version: 10.0

jyn514 commented 4 years ago

Possible duplicate of #77849

ehuss commented 4 years ago

I'm able to reproduce this, although it is finicky. I'm able to reproduce on stable, and as far back as 1.43. I've been having a hard time bisecting to a specific change, since it is a little inconsistent (it can take a few hundred incremental builds before it fails). The failures seem to start around 126ad2b813010447807b0593a80bc6c04962e7ea (#68708), although it might be earlier.

I can repro only on my main Linux system; I can't seem to repro on a VM.

It seems to always fail with a call to free on an invalid pointer inside LateResolutionVisitor. It doesn't matter if it is built with jemalloc or not.

I might keep poking at it for a bit, but I think I'm unlikely to make any breakthroughs.

apiraino commented 4 years ago

Just out of curiosity, are there conditions that could accelerate the "reproducibility"? For example, if it's memory exhaustion and allocations fail, could that theoretically happen sooner on a system (hand-wavily speaking) with resources artificially kept busy?

Aaron1011 commented 4 years ago

@ehuss: What commit of legion did you build?

ehuss commented 4 years ago

@apiraino I don't think it has anything to do with resource exhaustion. So far I have 0 clues. I tried running on valgrind overnight, but it wouldn't fail.

@Aaron1011 I'm on 0733aa39b253b3404544afc3485d332429009799 (v0.3.1).

@alex5nader Can you include which model of CPU you are using?

alex5nader commented 4 years ago

@ehuss I'm using a Ryzen 5 1600.

OvermindDL1 commented 4 years ago

I've been getting exactly this same bug on many Rust versions, both stable and nightly (currently on 1.47), over the past ~6 months or so that I've been trying legion, from legion 2.4 to 3.0 to its git version, using a Ryzen7. Even a freshly created cargo new ... project with just legion added as a dependency and nothing else changed causes this every single time. I've been compiling a multitude of other projects with excessive dependencies without issues; it's only legion.

Here's a GDB backtrace of the SIGSEGV (which happens on thread 2):

#0  free (ptr=0x48c2df416aec23d6) at ../jemalloc/src/jemalloc.c:2393
#1  0x00007ffff3513bcc in <smallvec::SmallVec<A> as core::ops::drop::Drop>::drop () from /home/overminddl1/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-ff4ec557f69b94a7.so
#2  0x00007ffff35c1011 in <rustc_resolve::late::LateResolutionVisitor as rustc_ast::visit::Visitor>::visit_local () from /home/overminddl1/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-ff4ec557f69b94a7.so
#3  0x00007ffff35c0822 in <rustc_resolve::late::LateResolutionVisitor as rustc_ast::visit::Visitor>::visit_block () from /home/overminddl1/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-ff4ec557f69b94a7.so
#4  0x00007ffff35c27d0 in <rustc_resolve::late::LateResolutionVisitor as rustc_ast::visit::Visitor>::visit_fn () from /home/overminddl1/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-ff4ec557f69b94a7.so
#5  0x00007ffff354bf9c in rustc_ast::visit::walk_assoc_item () from /home/overminddl1/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-ff4ec557f69b94a7.so
#6  0x00007ffff35ce54e in rustc_resolve::late::LateResolutionVisitor::with_generic_param_rib () from /home/overminddl1/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-ff4ec557f69b94a7.so
#7  0x00007ffff35c4f39 in rustc_resolve::late::LateResolutionVisitor::resolve_item () from /home/overminddl1/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-ff4ec557f69b94a7.so
#8  0x00007ffff355896e in rustc_ast::visit::walk_item () from /home/overminddl1/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-ff4ec557f69b94a7.so
#9  0x00007ffff35c4473 in rustc_resolve::late::LateResolutionVisitor::resolve_item () from /home/overminddl1/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-ff4ec557f69b94a7.so
#10 0x00007ffff355896e in rustc_ast::visit::walk_item () from /home/overminddl1/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-ff4ec557f69b94a7.so
#11 0x00007ffff35c4473 in rustc_resolve::late::LateResolutionVisitor::resolve_item () from /home/overminddl1/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-ff4ec557f69b94a7.so
#12 0x00007ffff3549d42 in rustc_ast::visit::walk_crate () from /home/overminddl1/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-ff4ec557f69b94a7.so
#13 0x00007ffff358aac7 in rustc_resolve::Resolver::resolve_crate () from /home/overminddl1/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-ff4ec557f69b94a7.so
#14 0x00007ffff08f5c97 in rustc_interface::passes::configure_and_expand_inner () from /home/overminddl1/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-ff4ec557f69b94a7.so
#15 0x00007ffff08d26c9 in rustc_interface::passes::configure_and_expand::{{closure}} () from /home/overminddl1/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-ff4ec557f69b94a7.so
#16 0x00007ffff08acecf in rustc_data_structures::box_region::PinnedGenerator<I,A,R>::new () from /home/overminddl1/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-ff4ec557f69b94a7.so
#17 0x00007ffff08f4965 in rustc_interface::passes::configure_and_expand () from /home/overminddl1/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-ff4ec557f69b94a7.so
#18 0x00007ffff0915f73 in rustc_interface::queries::Queries::expansion () from /home/overminddl1/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-ff4ec557f69b94a7.so
#19 0x00007ffff05bf887 in rustc_interface::queries::<impl rustc_interface::interface::Compiler>::enter () from /home/overminddl1/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-ff4ec557f69b94a7.so
#20 0x00007ffff0553f27 in rustc_span::with_source_map () from /home/overminddl1/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-ff4ec557f69b94a7.so
#21 0x00007ffff05c1513 in rustc_interface::interface::create_compiler_and_run () from /home/overminddl1/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-ff4ec557f69b94a7.so
#22 0x00007ffff059f9fa in scoped_tls::ScopedKey<T>::set () from /home/overminddl1/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-ff4ec557f69b94a7.so
#23 0x00007ffff05b4957 in std::sys_common::backtrace::__rust_begin_short_backtrace () from /home/overminddl1/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-ff4ec557f69b94a7.so
#24 0x00007ffff053ddae in core::ops::function::FnOnce::call_once{{vtable-shim}} () from /home/overminddl1/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-ff4ec557f69b94a7.so
#25 0x00007fffef94bf5a in <alloc::boxed::Box<F> as core::ops::function::FnOnce<A>>::call_once () at /rustc/18bf6b4f01a6feaf7259ba7cdae58031af1b7b39/library/alloc/src/boxed.rs:1042
#26 <alloc::boxed::Box<F> as core::ops::function::FnOnce<A>>::call_once () at /rustc/18bf6b4f01a6feaf7259ba7cdae58031af1b7b39/library/alloc/src/boxed.rs:1042
#27 std::sys::unix::thread::Thread::new::thread_start () at library/std/src/sys/unix/thread.rs:87
#28 0x00007fffef83e669 in start_thread (arg=<optimized out>) at pthread_create.c:479
#29 0x00007fffef7642b3 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

I can very reliably reproduce this. No other known issues with the system, everything else compiles without issue, everything runs without issue, memtest and other stress tests run without issue.
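
One observation about the `ptr` value passed to `free` in frame #0 of the backtrace above: on x86_64 with 48-bit virtual addresses (an assumption about these machines), bits 63 through 47 of a valid pointer must all be equal. The sketch below checks that property. The helper name `is_canonical_x86_64` is made up for illustration and is not part of rustc or jemalloc.

```rust
/// Returns true if `addr` is a canonical x86_64 virtual address,
/// assuming 48-bit virtual addressing (4-level paging).
/// Hypothetical helper, for illustration only.
fn is_canonical_x86_64(addr: u64) -> bool {
    // Sign-extend from bit 47 and compare with the original value.
    let sign_extended = ((addr as i64) << 16) >> 16;
    sign_extended as u64 == addr
}

fn main() {
    // Pointer passed to free() in frame #0 of the backtrace above.
    let freed: u64 = 0x48c2df416aec23d6;
    // Return address from frame #1, for comparison.
    let frame1: u64 = 0x00007ffff3513bcc;

    println!("freed pointer canonical: {}", is_canonical_x86_64(freed));
    println!("frame #1 address canonical: {}", is_canonical_x86_64(frame1));
}
```

Under that assumption the freed value is not a canonical address, so it cannot be something the allocator ever handed out; it looks like garbage rather than a stale heap pointer.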

OvermindDL1 commented 4 years ago

I've uploaded my no-op project that constantly reproduces on my system to: https://github.com/OvermindDL1/legion_testing

Just run cargo build; when it gets to legion, after all its other dependencies, it crashes. I'm guessing you might need a Ryzen CPU (perhaps on Linux; I'm using Ubuntu 20.04 here), based on all the other reports of this I've been seeing so far?

OvermindDL1 commented 4 years ago

Memory allocation appears to be fairly minimal at the point of crash, 365megs of VIRT and 348megs of RES, with 99004 of SHM, does not appear to be resource exhaustion of anything that I can see.

OvermindDL1 commented 4 years ago

I cloned https://github.com/TomGillen/legion.git and building it via cargo build also produces the same error. So you can just clone the source project itself and build it to test.

OvermindDL1 commented 4 years ago

After testing a few things, I found that if I remove the legion_codegen library from inside the Cargo.toml, it then compiles.

In the small test project, leaving out the default features (which should leave out the legion_codegen crate) does not allow it to compile.

Note, it's legion failing to compile, legion_codegen compiles fine, I'm trying to see what legion_codegen does now...

OvermindDL1 commented 4 years ago

So legion itself doesn't do anything with legion_codegen other than just re-export it, that's it. Seems it's the proc macro to generate the system attribute. Why would just re-exporting it cause compiling legion to crash though...

EDIT1: Commenting out the entirety of legion_codegen's source code still causes a compilation failure.

EDIT2: Commenting out all of its dependencies still causes a compilation failure...

EDIT3: Also commenting out proc-macro = true still fails to compile.

EDIT4: Commenting out legion_codegen from legion's Cargo.toml is failing to compile, even after a cargo clean, when it compiled properly before... There seems to be some indeterminacy here...

EDIT5: Removed all optional dependencies and it's still failing to compile, even after a clean.

EDIT6: Slowly commenting out large swaths of legion and replacing them with no-ops and got it down to something in the internals module so far...

EDIT7: So far I've got it down to src/internals/cons.rs!

EDIT8: And got it down to this macro call impl_flatten!(A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T, U, V, W, X, Y, Z);, I'm now peeling it apart...

EDIT9: Got it down to this line in the macro: let cons!($($items),*) = self; Peeling apart the cons macro now...

EDIT10: Okay, so the macros seem fine; however, the argument count to impl_flatten is causing it. If it is reduced to impl_flatten!(A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R); then it works, but increasing it by 1 to impl_flatten!(A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S); makes it SIGSEGV...

EDIT11: Interestingly, if I try to remove some of the entirely empty modules that I completely commented out then it compiles again...

EDIT12: Got it down to just an empty src/internals/entity.rs and the src/internals/cons.rs (mostly commented out except that macro and the trait it implements) and it still SIGSEGV's, trying to reduce further...

OvermindDL1 commented 4 years ago

So far the only code left uncommented is in src/internals/cons.rs and it is:

macro_rules! cons {
    () => (
        ()
    );
    ($head:tt) => (
        ($head, ())
    );
    ($head:tt, $($tail:tt),*) => (
        ($head, cons!($($tail),*))
    );
}

fn blah() {
    let cons!(A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T, U, V, W, X, Y, Z) = todo!();
}

And in src/internals/mod.rs:

pub mod cons;

Apparently the more code I remove, the more random the crash becomes; it still happens about 50% of the time though. And in src/lib.rs:

mod internals;

Going to try pulling this into its own project now to see if I can replicate it more standalone...
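
For readers unfamiliar with the cons! macro above, here is a small sketch of what it expands to in expression and pattern position, using three items instead of 26. The helper `flatten3` is a made-up name for this example and does not exist in legion.

```rust
// Same macro as in src/internals/cons.rs above.
macro_rules! cons {
    () => (
        ()
    );
    ($head:tt) => (
        ($head, ())
    );
    ($head:tt, $($tail:tt),*) => (
        ($head, cons!($($tail),*))
    );
}

/// Hypothetical helper: destructures a three-element cons list.
fn flatten3(list: (i32, (i32, (i32, ())))) -> (i32, i32, i32) {
    // Pattern position: expands to `let (a, (b, (c, ()))) = list;`
    let cons!(a, b, c) = list;
    (a, b, c)
}

fn main() {
    // Expression position: expands to `(1, (2, (3, ())))`.
    let list = cons!(1, 2, 3);
    println!("{:?}", flatten3(list));
}
```

Each extra argument adds one more level of tuple nesting, so the 26-argument call in blah() produces a pattern nested 26 levels deep.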

OvermindDL1 commented 4 years ago

I have reduced the code significantly, error is now:

$ cargo build
   Compiling legion_testing v0.1.0 (/home/overminddl1/rust/legion_testing)
error: could not compile `legion_testing`.

Caused by:
  process didn't exit successfully: `rustc --crate-name legion_testing --edition=2018 src/lib.rs --error-format=json --json=diagnostic-rendered-ansi --crate-type lib --emit=dep-info,metadata,link -C embed-bitcode=no -C debuginfo=2 -C metadata=668da26770ceeea9 -C extra-filename=-668da26770ceeea9 --out-dir /home/overminddl1/rust/legion_testing/target/debug/deps -C incremental=/home/overminddl1/rust/legion_testing/target/debug/incremental -L dependency=/home/overminddl1/rust/legion_testing/target/debug/deps` (signal: 11, SIGSEGV: invalid memory reference)

I have updated the https://github.com/OvermindDL1/legion_testing project to remove legion and just have the code that tests it. I'm trying to reduce it further but I may be hitting the limit. If I manage to reduce it further then I'll update that repo and post here.

OvermindDL1 commented 4 years ago

I've reduced it a little more. I've noticed that the more arguments I remove from the cons! call, the significantly lower the chance of it happening; leaving it with the full alphabet makes it crash about 75% of the time. Again, this is on a Ryzen7 with Ubuntu 18.10 with these versions:

$ rustc --version --verbose
rustc 1.47.0 (18bf6b4f0 2020-10-07)
binary: rustc
commit-hash: 18bf6b4f01a6feaf7259ba7cdae58031af1b7b39
commit-date: 2020-10-07
host: x86_64-unknown-linux-gnu
release: 1.47.0
LLVM version: 11.0
$ cargo --version --verbose
cargo 1.47.0 (f3c7e066a 2020-08-28)
release: 1.47.0
commit-hash: f3c7e066ad66e05439cf8eab165a2de580b41aaf
commit-date: 2020-08-28

Can anyone above who was having an issue compiling legion try out this minimal repo and run cargo clean; cargo build a few times to confirm? Perhaps try to reduce it further?

The current reproducing code is (in src/lib.rs):

macro_rules! cons {
    ($head:tt) => (
        ($head, ())
    );
    ($head:tt, $($tail:tt),*) => (
        ($head, cons!($($tail),*))
    );
}

fn blah() {
    let cons!(a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s, t, u, v, w, x, z) = todo!();
}

OvermindDL1 commented 4 years ago

Oh, and as a note, it still happens if you replace tt in the macro with ident as well.

jyn514 commented 4 years ago

@OvermindDL1 if you expand out the macro does it still crash? Or does it require using the macro?

Aaron1011 commented 4 years ago

@OvermindDL1: I can't reproduce the crash at all with your repository.

Does this happen if you run rustc directly on the file? Could you record a trace with rr?

OvermindDL1 commented 4 years ago

The reason I stopped is that I ran out of time. I'm currently driving for a while, and I'm unsure if I'll be able to get to it. If I get time to look at it tonight I'll try to; otherwise I may be delayed until Monday. If anyone else who is able to replicate it can get to it, that would probably be better.

OvermindDL1 commented 4 years ago

@Aaron1011 I'm curious of your CPU and OS

Aaron1011 commented 4 years ago

@OvermindDL1: I'm running Arch Linux with an Intel Core i9-8950HK

ehuss commented 4 years ago

@Aaron1011 I'm able to reproduce with the reduced macro rules example. It can take a fair number of runs for it to fail (for me, anywhere from 1 to 500 runs). I can't seem to get rr to work very well (I have an AMD cpu). If I try to reverse-next from the failure, it says Expected syscall_bp_vm to be clear but it's 2518439's address space with a breakpoint at 0x7f11b6c3e353 while we're at 0x70000008. I haven't used it before, so I'm not too familiar with it.

Just using gdb with the core dump, it's pretty much the same error as before. Inside resolve_pattern_top it is calling drop_in_place, as best I can see it is freeing a pointer into the middle of some object code (rustc_ast::ast::Pat::walk+348).

OvermindDL1 commented 4 years ago

@Aaron1011 So not a Ryzen, so far it seems everyone this happens to has a Ryzen, interesting...

For note, remotely from my phone over ssh I'm trying to do what I can, even rustc --edition=2018 src/lib.rs --crate-type=lib segfaults as well, so does rustc --edition=2018 src/lib.rs --crate-type=lib -Zunstable-options --pretty=expanded > src/lib_expanded.rs, but I got the macro expanded version after a few tries:

#![feature(prelude_import)]
#[prelude_import]
use std::prelude::v1::*;
#[macro_use]
extern crate std;
macro_rules! cons {
    ($ head : ident) => (($ head, ())) ;
    ($ head : ident, $ ($ tail : ident), *) =>
    (($ head, cons ! ($ ($ tail), *))) ;
}

fn blah() {
    let (a,
         (b,
          (c,
           (d,
            (e,
             (f,
              (g,
               (h,
                (i,
                 (j,
                  (k,
                   (l,
                    (m,
                     (n,
                      (o,
                       (p,
                        (q,
                         (r,
                          (s,
                           (t,
                            (u, (v, (w, (x, (z, ()))))))))))))))))))))))))) =
        { ::std::rt::begin_panic("not yet implemented") };
}

And compiling it via rustc --edition=2018 src/lib_expanded.rs --crate-type=lib also segfaults, so it's not a macro issue, still about a 50% crash rate (the other 50% is just reporting the file error as normal). Reduced further to:

fn blah() {
    let (a,
         (b,
          (c,
           (d,
            (e,
             (f,
              (g,
               (h,
                (i,
                 (j,
                  (k,
                   (l,
                    (m,
                     (n,
                      (o,
                       (p,
                        (q,
                         (r,
                          (s,
                           (t,
                            (u, (v, (w, (x, (z, ()))))))))))))))))))))))))) =
        { ::std::rt::begin_panic("not yet implemented") };
}

Again, reducing the depth of the tuples lowers the chance that it happens significantly: it crashes very rarely if I remove 2 levels, and more commonly if I add more. I can replace { ::std::rt::begin_panic("not yet implemented") } with just () as well, so it becomes:

fn blah() {
    let (a,
         (b,
          (c,
           (d,
            (e,
             (f,
              (g,
               (h,
                (i,
                 (j,
                  (k,
                   (l,
                    (m,
                     (n,
                      (o,
                       (p,
                        (q,
                         (r,
                          (s,
                           (t,
                            (u, (v, (w, (x, (z, (aa, ())))))))))))))))))))))))))) = ();
}

And still happens about 50% of the time for me.

Hard to do much from my phone, but will try more later as I can.
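
Since the crash rate seems to depend on the nesting depth, a small generator for test files of arbitrary depth might save some hand editing. This is only a sketch: `nested_let_source` and the binding names `v0`, `v1`, ... are invented here, and whether renaming the bindings affects the crash has not been checked.

```rust
/// Builds the source of a function whose `let` destructures a right-nested
/// tuple pattern with `depth` bindings, shaped like the reduced example above.
/// Hypothetical helper; the binding names `v0`, `v1`, ... are arbitrary.
fn nested_let_source(depth: usize) -> String {
    // Start from the innermost `()` and wrap outward.
    let mut pattern = String::from("()");
    for i in (0..depth).rev() {
        pattern = format!("(v{}, {})", i, pattern);
    }
    format!("fn blah() {{\n    let {} = ();\n}}\n", pattern)
}

fn main() {
    // 26 bindings matches the depth of the last example above (a..x, z, aa).
    print!("{}", nested_let_source(26));
}
```

Redirecting the output to a file (for example src/lib.rs) gives a test input at the chosen depth. Like the hand-written version, the generated code is not expected to compile cleanly; the interesting question is whether rustc reports an ordinary error or segfaults.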

OvermindDL1 commented 4 years ago

This crash feels very similar to https://gitweb.dragonflybsd.org/dragonfly.git/commitdiff/b48dd28447fc8ef62fbc963accd301557fd9ac20 but I'm very unsure.

Unrelated, but is there a way to get rustc with a newer jemalloc or built without jemalloc just as a test?

ehuss commented 4 years ago

@OvermindDL1 I've been testing without jemalloc, and get the same results, so I don't think it is an issue. If you build rustc from source (x.py build library/std), the default is without jemalloc.

OvermindDL1 commented 4 years ago

@ehuss Very cool, thanks for checking without jemalloc. What's your CPU and OS? You said AMD, but is it a ryzen? I have multiple machines here to test with, most of them AMD, only one is a ryzen and that one is the only one that has an issue, unfortunately it's also my fastest cpu by a significant margin so it's the system I usually use as a build host.

ehuss commented 4 years ago

I have a Ryzen Threadripper 2950X, on Ubuntu 20.04. I'm in the same boat, this is the only machine where it reproduces, but it is also by far the fastest one, so I'm still not sure if it is AMD-specific.

OvermindDL1 commented 4 years ago

It always happens on a different thread than the main thread, so I'm actually quite curious if it's some kind of race condition with many core CPUs. Is there a way to specify the number of threads that rustc is allowed to use? I would love to test with a single thread, two threads, on up until I can reproduce it.

I guess I can just load it with a forced cpu core affinity, I'll try to do that the next opportunity I get but it might not be for a little while, so if someone else is able to do before me that would probably be better.

ehuss commented 4 years ago

For the most part, rustc is single threaded, it just runs everything on a dedicated thread for various reasons. It only uses multiple threads for code generation (in llvm), and this crash is happening far earlier than that.

ishitatsuyuki commented 4 years ago

rustc always spawns a thread for the purpose of controlling stack size. If that's hindering your debugging, then you can patch rustc like in https://github.com/rust-lang/rust/pull/48575.
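
To illustrate the general technique (this is not rustc's actual code), the standard library lets a program run its work on a spawned thread with an explicit stack size, which is why a crash can show up on "thread 2" even in an essentially single-threaded program. The helper name `run_with_stack` and the 8 MiB figure are assumptions made for this example.

```rust
use std::thread;

/// Runs `work` on a freshly spawned thread with the given stack size and
/// returns its result. Hypothetical helper, not rustc's actual code.
fn run_with_stack<T, F>(stack_bytes: usize, work: F) -> T
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    thread::Builder::new()
        .stack_size(stack_bytes)
        .spawn(work)
        .expect("failed to spawn thread")
        .join()
        .expect("worker thread panicked")
}

fn main() {
    // 8 MiB is an arbitrary value chosen for this example.
    let answer = run_with_stack(8 * 1024 * 1024, || 6 * 7);
    println!("{}", answer);
}
```

The main thread only waits in `join()`, so all the real work, and any crash, happens on the spawned thread.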

mati865 commented 4 years ago

@ehuss for rr on Zen you have to use one of workarounds from here: https://github.com/mozilla/rr/wiki/Zen

ehuss commented 4 years ago

Yea, I implemented the workaround, and the script printed Zen workaround in place. It seems to print that error whenever it steps over certain syscalls like IO. I'm also a bit confused: I ran rustc in a loop until it crashed. It very clearly dumped a core file, and that core file has the stack that I expect, but when I run rr replay and manually step through the problem area (resolve_pattern_top), it steps through as if everything is OK. It's like it is replaying one of the previous successful runs. It's quite confusing.

apiraino commented 4 years ago

@ehuss @OvermindDL1 impressive work done here to try to reproduce. Can we now set some facts about it? I'm trying to square the issue for the compiler team.

Is the latest snippet in this comment a good reproducible example at least in some range of conditions? Second fact, can we rule out a CPU vendor specific issue? What else can we say about this to help reproducing it reliably?

thanks!

ehuss commented 4 years ago

The simplification listed above fails on some versions, but not all. It seems to be really sensitive and will pass where the original legion still fails. For example, nightly-2020-10-03 fails where nightly-2020-10-04 passes. However, legion still fails for me on nightly-2020-10-04.

I did a fair bit of investigation, but did not find anything terribly useful. It is very sensitive to the exact code layout and optimization settings of rustc. For example, compiling rustc_resolve with -O2 causes the problem to go away. Adding #[inline(never)] to resolve_pattern_top also makes the problem go away.

I cannot rule out that it is AMD-specific because I don't have easy access to a fast Intel system. I was unable to repro in a virtual machine on an Intel machine. I was also unable to repro on macOS (Intel) or Windows (AMD).

If someone can reproduce on an Intel Linux system, that would help rule out anything CPU-specific. If they can get it to fail, then running rr could be really helpful, since I can't seem to get it to work correctly on my AMD system.

The script I use to run is:

#!/bin/bash

# Run with RUSTUP_TOOLCHAIN=<toolchain name> to test different toolchains.

ulimit -c unlimited

set -e

rustc -V

for i in {1..1000}
do
    echo $i
    rustc --crate-type rlib foo.rs --emit=metadata
    # Change to this if testing a cargo project:
    # touch src/lib.rs
    # cargo check
done
tput bel
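
The script above stops at the first non-zero exit because of `set -e`, which by itself does not distinguish a segfault from an ordinary compile error. As a rough alternative, the loop could be written in Rust so that it stops only when rustc is killed by a signal. This is a Unix-only sketch; `Outcome`, `classify`, and `classify_status` are names invented for this example, and the rustc arguments are copied from the script.

```rust
use std::os::unix::process::ExitStatusExt;
use std::process::{Command, ExitStatus};

#[derive(Debug, PartialEq)]
enum Outcome {
    Success,
    /// Ordinary failure, e.g. a normal compile error.
    ExitCode(i32),
    /// Killed by a signal; 11 is SIGSEGV on Linux.
    Signal(i32),
    /// Neither an exit code nor a signal was reported.
    Unknown,
}

/// Hypothetical helper: classify a run from its exit code and signal.
fn classify(code: Option<i32>, signal: Option<i32>) -> Outcome {
    match (signal, code) {
        (Some(sig), _) => Outcome::Signal(sig),
        (None, Some(0)) => Outcome::Success,
        (None, Some(c)) => Outcome::ExitCode(c),
        (None, None) => Outcome::Unknown,
    }
}

fn classify_status(status: ExitStatus) -> Outcome {
    classify(status.code(), status.signal())
}

fn main() -> std::io::Result<()> {
    for i in 1..=1000 {
        let status = Command::new("rustc")
            .args(&["--crate-type", "rlib", "foo.rs", "--emit=metadata"])
            .status()?;
        match classify_status(status) {
            Outcome::Signal(sig) => {
                println!("run {}: killed by signal {}", i, sig);
                break;
            }
            other => println!("run {}: {:?}", i, other),
        }
    }
    Ok(())
}
```

This keeps looping through normal compile errors, which matters for reduced examples that are not expected to compile cleanly.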

jyn514 commented 4 years ago

Assigning P-medium as discussed as part of the Prioritization Working Group procedure and removing I-prioritize. Also assigning I-nominate so we can try to get eyes on the root cause of the issue.

spastorino commented 4 years ago

This was discussed during today's compiler weekly meeting