rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org
Other
98.57k stars 12.74k forks source link

Stack overflow for big types in mpsc channels on separate thread #102246

Closed janie177 closed 1 week ago

janie177 commented 2 years ago

I tried this code:

use std::sync::mpsc::*;

struct BigStruct {
    _data: [u32; 32000],
}

fn main() {

let thread = std::thread::spawn(move || {
    let (_sender, receiver) = channel::<BigStruct>();

    for _data in receiver.try_iter() {

    }

});

    thread.join().unwrap();
    println!("Done :)");
}

https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=7ebdbceac5477ebb48a65f5ecacb5c62

I expected to see this happen: The program shouldn't do much beside printing a friendly message.

Instead, this happened: When running in Debug mode, the program crashes due to a stack overflow. In release mode it behaves as expected. Looping over the data in the receiver in combination with being on a separate thread triggers it. I tried running on nightly too, which produces the same results. I've now switched to crossbeam as a workaround, which does not suffer from the same issue fortunately.

rustc --version --verbose:

rustc 1.64.0 (a55dd71d5 2022-09-19)
binary: rustc
commit-hash: a55dd71d5fb0ec5a6a3a9e8c27b2127ba491ce52
commit-date: 2022-09-19
host: x86_64-pc-windows-msvc
release: 1.64.0
LLVM version: 14.0.6
Backtrace

``` thread '' has overflowed its stack ```

lapointexavier commented 1 year ago

I've run into this when upgrading from 1.70.0 to 1.71.0. Compiling in release mode prevented the segfault when using 1.71.0, though the issue was not happening in 1.70.0.

ChrisDenton commented 1 year ago

I've not been able to reproduce this using the program in the OP. I've tried on both 1.71.1 and 1.73.0-nightly. In both cases it prints "Done :)".

rustc 1.73.0-nightly (439d066bc 2023-08-10)
binary: rustc
commit-hash: 439d066bcf9496b1b8c8dde8bef3bce607a621bb
commit-date: 2023-08-10
host: x86_64-pc-windows-msvc
release: 1.73.0-nightly
LLVM version: 17.0.0
lapointexavier commented 1 year ago

@ChrisDenton OP's didn't trigger it for me either, but after playing with it I was able to reproduce:

Repro

use std::sync::mpsc::*;

struct BigStruct {
    _data: [u8; 32768],
}

fn main() {
    let (sender, receiver) = channel::<BigStruct>();

    let thread1 = std::thread::spawn(move || {
        let data = BigStruct { _data: [0u8; 32768] };
        sender.send(data).unwrap();
    });

    thread1.join().unwrap();
    for _data in receiver.try_iter() {

    }
    println!("Done :)");
}

Output

~/Dev/hello_world$ cargo run --release
   Compiling hello_world v0.1.0 (/.../hello_world)
    Finished release [optimized] target(s) in 0.56s
     Running `target/release/hello_world`
Done :)

~/Dev/hello_world$ cargo run
   Compiling hello_world v0.1.0 (/.../hello_world)
    Finished dev [unoptimized + debuginfo] target(s) in 0.31s
     Running `target/debug/hello_world`

thread '<unknown>' has overflowed its stack
fatal runtime error: stack overflow
Aborted (core dumped)
rustc --version --verbose
rustc 1.71.1 (eb26296b5 2023-08-03)
binary: rustc
commit-hash: eb26296b556cef10fb713a38f3d16b9886080f26
commit-date: 2023-08-03
host: x86_64-unknown-linux-gnu
release: 1.71.1
LLVM version: 16.0.5
lapointexavier commented 1 year ago

Also this may be system dependent, but I was playing with slice sizes and the trigger seems to be 8120 + 1 for the overflow. I don't know why release mode doesn't overflow, maybe something is being optimized.

ChrisDenton commented 1 year ago

Ok, I've not yet investigated but my guess would be something about how it's not being optimized in start_send is multiplying the needed stack space by a fair amount.

@ibraheemdev might have some ideas. Note though this only affects opt-level=0 builds.

Enselic commented 1 week ago

The reproduer can be simplifed to:

// We must keep receiver alive
#[allow(unused_variables)]

fn main() {
    let (sender, receiver) = std::sync::mpsc::channel::<[u8; 32768]>();

    let thread = std::thread::spawn(move || {
        sender.send([0u8; 32768]).unwrap();
    });

    thread.join().unwrap();

    println!("Done :)");
}

Regressed versions

The code runs fine in debug builds with 1.66 but not with 1.67. So this is a regression-from-stable-to-stable, but arguably not very severe since this seemingly has gone mostly unnoticed for a considerable amount of time.

With cargo bisect-rustc we find that this runs fine with nightly-2022-11-13 but not nightly-2022-11-14 which leaves us with

git log --oneline --no-merges \
    ^6284998a2677d7e3e8420db783f3aa4fd80d7423 \
    e631891f7ad40eac3ef58ec3c2b57ecd81e40615 

of which https://github.com/rust-lang/rust/pull/93563 seems most likely.

To demonstrate:

for toolchain in 1.66 \
                 1.67 \
                 nightly-2022-11-13 \
                 nightly-2022-11-14 ; do
    rm -rf target
    echo "\nUsing $toolchain"
    cargo +$toolchain run --quiet
done
Using 1.66
Done :)

Using 1.67
thread '<unknown>' has overflowed its stack
fatal runtime error: stack overflow

Using nightly-2022-11-13
Done :)

Using nightly-2022-11-14
thread '<unknown>' has overflowed its stack
fatal runtime error: stack overflow

Disassembly

The disassembly of std::sync::mpmc::list::Channel<[u8; 32768]>::start_send<[u8; 32768]> where we SIGSEGV in gdb looks like this:

||   0x55555555df70 <_ZN3std4sync4mpmc4list16Channel$LT$T$GT$10start_send17h8dac1358fea503eaE>       mov    %rsp,%r11        
││   0x55555555df73 <_ZN3std4sync4mpmc4list16Channel$LT$T$GT$10start_send17h8dac1358fea503eaE+3>     sub    $0x1f0000,%r11   
││   0x55555555df7a <_ZN3std4sync4mpmc4list16Channel$LT$T$GT$10start_send17h8dac1358fea503eaE+10>    sub    $0x1000,%rsp     
││  >0x55555555df81 <_ZN3std4sync4mpmc4list16Channel$LT$T$GT$10start_send17h8dac1358fea503eaE+17>    movq   $0x0,(%rsp)      

So we SIGSEGV when 0 is being written to the top of the stack (probably due to stack probing) after growing it ~0x1f0000 bytes which is ~2MB.

In the LLVM IR we can find:

  %_34 = alloca [1016064 x i8], align 8
  %_31 = alloca [1016064 x i8], align 8

I looked at the MIR too but couldn't find anything suspicious. Maybe someone used to debuggging monomorphization/codegen can easily find out why the stack grows by 2MB.

My system

$ cargo +1.67 rustc -- -vV
rustc 1.67.1 (d5a82bbd2 2023-02-07)
binary: rustc
commit-hash: d5a82bbd26e1ad8b7401f6a718a9c57c96905483
commit-date: 2023-02-07
host: x86_64-unknown-linux-gnu
release: 1.67.1
LLVM version: 15.0.6
$ uname -a
Linux martins-dator 5.15.153.1-microsoft-standard-WSL2 #1 SMP Fri Mar 29 23:14:13 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Amanieu commented 1 week ago

The cause seems to be due to the definition of Block which contains 31 copies of T. start_send calls Box::new(Block::<T>::new()) which is where the 1MB allocation comes from. This could be solved if Block was initialized in-place rather than using Box::new.

cuviper commented 1 week ago

This could be solved if Block was initialized in-place rather than using Box::new.

See #132738

When running in Debug mode, the program crashes due to a stack overflow. In release mode it behaves as expected.

I'll bet in release mode the optimizer was figuring out that direct initialization, since it knows about heap allocations.

There will remain some unstated limitation on item size due to copies in the call stack, especially in debug mode when it's not inlined, but at least we won't have that multiplier from the Block slots.