tokio-rs / tokio

A runtime for writing reliable asynchronous applications with Rust. Provides I/O, networking, scheduling, timers, ...
https://tokio.rs
MIT License
26.37k stars 2.43k forks source link

Stack overflow in release builds with large futures #6057

Open mathieulj opened 11 months ago

mathieulj commented 11 months ago

Version

❯ cargo tree | rg tokio
repro-tokio v0.1.0 (/Users/mathieuletendre-jauniaux/work/repro-tokio)
└── tokio v1.32.0
    └── tokio-macros v2.1.0 (proc-macro)

Platform

I've observed this on both macos and windows but the problem likely also affects linux.

❯ uname -a
Darwin Mathieus-MacBook-Pro-2.local 23.0.0 Darwin Kernel Version 23.0.0: Fri Sep 15 14:41:43 PDT 2023; root:xnu-10002.1.13~1/RELEASE_ARM64_T6000 arm64

Description

In debug builds, spawn will put futures on the heap but the release behaviour is to just pass it on via the stack. This leads to code that passes tests and runs fine in debug but crashes in production if the futures are sufficiently large.

Reproduction steps:

cargo init --bin repro-tokio
cd repro-tokio
cargo add tokio --features rt-multi-thread,macros,time
# edit main.rs

# doesn't crash
cargo run 

#crashes
cargo run --release
use std::time::Duration;
use tokio::{task, time};

#[tokio::main]
async fn main() {
    println!("This works both in debug and in release");
    let _ = task::spawn(Box::pin(future())).await;

    println!("This compiles but will crash at runtime **only in release builds**");
    let _ = task::spawn(future()).await;
}

async fn future() {
    let data = [0; 200_000];
    time::sleep(Duration::from_millis(100)).await;
    println!("{}", data[22]);
}

I expected to see this happen: Ideally tokio wouldn't overflow the stack ever but it should at minimum have the same behaviour in debug builds as it does in release builds.

Instead, this happened: tokio overflowed the stack only in release builds, test and debug builds work as expected.

Darksonn commented 11 months ago

Having the same code for debug and release does not guarantee identical behavior. The code in question was added because of frequent stack overflows that only happened in debug mode.

Ideally, I would like to avoid the box on release mode.

That said, there are ways to avoid the stack overflow without introducing extra indirection. Those are probably worth exploring.

Nerdy5k commented 10 months ago

Close please

mhils commented 4 days ago

I've opened #6826 in an attempt to fix this footgun.

This problem can be relatively tedious to track down for users. To illustrate this with our example:

  1. We find our Windows CI failing with "Windows fatal exception: stack overflow" on a dependency update. No more details, Linux and macOS work fine.
  2. First attempts to reproduce on Windows fail because I used debug builds. Eventually realize I need release builds.
  3. Bisect our code to a Dependabot commit (https://github.com/mitmproxy/mitmproxy_rs/commit/ea7123d751ea8f285ca07724c30074356a4dfac9).
  4. Bump dependencies individually. Figure out it's the tokio upgrade.
  5. Bisect tokio to https://github.com/tokio-rs/tokio/commit/ab53bf0c4727ae63c6a3a3d772b7bd837d2f49c3 - apparently a slight increase in future size has pushed us over the edge - "niche-optimization" 😅.
  6. Searching the tokio issue tracker I find #6057. Verify that always pinning fixes it.
  7. Patch tokio to panic for std::mem::size_of::<T>() > BOX_FUTURE_THRESHOLD to find all offending futures in our code base. Find & pin them individually. (Not sure if there's a better way, but this worked?)

This was a fun riddle, so no complaints at all. :) But I hope this shows that cause and effect can be relatively far apart.