tokio-rs / tracing

Application level tracing for Rust.
https://tracing.rs
MIT License
5.12k stars 672 forks source link

bug(tracing-subscriber): Stack overflow dropping deeply nested spans #1147

Open kjvalencik opened 3 years ago

kjvalencik commented 3 years ago

Bug Report

tracing-subscriber with the registry feature enabled causes a stack overflow when recursively dropping deeply nested spans.

Version

├── tracing v0.1.22
│   ├── tracing-attributes v0.1.11 (proc-macro)
│   └── tracing-core v0.1.17
└── tracing-subscriber v0.2.15
    ├── tracing v0.1.22 (*)
    ├── tracing-core v0.1.17 (*)
    ├── tracing-log v0.1.1
    │   └── tracing-core v0.1.17 (*)
    └── tracing-serde v0.1.2
        └── tracing-core v0.1.17 (*)

Platform

macOS 11.1

Crates

tracing-subscriber with registry feature

Description

[package]
name = "tracing-test"
version = "0.1.0"
edition = "2018"

[dependencies]
env_logger= "*"
log = "*"
tracing = "*"
tracing-subscriber = "*"
use tracing::Span;

#[tracing::instrument()]
fn instrumented() {
    log::info!("Operation");
}

fn wrapper(span: Span) {
    let _ = span.enter();
    instrumented();
}

fn main() {
    tracing_subscriber::fmt::init();

    let depth = std::env::args()
        .skip(1)
        .next()
        .map(|arg| arg.parse::<usize>())
        .unwrap_or(Ok(10))
        .unwrap();

    eprintln!("Depth: {}", depth);

    let mut span = tracing::info_span!("root");

    for _ in 0..depth {
        span = tracing::info_span!(parent: &span, "wrapper");
        wrapper(span.clone());
    }

    eprintln!("Dropping spans");
}
RUST_LOG=info cargo run -- 1000 > /dev/null
    Finished dev [unoptimized + debuginfo] target(s) in 0.03s
     Running `target/debug/tracing-test 1000`

Depth: 1000
Dropping spans
RUST_LOG=info cargo run -- 10000 > /dev/null
    Finished dev [unoptimized + debuginfo] target(s) in 0.03s
     Running `target/debug/tracing-test 10000`
Depth: 10000
Dropping spans

thread 'main' has overflowed its stack
fatal runtime error: stack overflow
Abort trap: 6

Release requires more depth, presumably because of optimized stack usage, but still overflows.

RUST_LOG=info cargo run --release -- 100000 > /dev/null
    Finished release [optimized] target(s) in 0.03s
     Running `target/release/tracing-test 100000`
Depth: 100000
Dropping spans

thread 'main' has overflowed its stack
fatal runtime error: stack overflow
Abort trap: 6

I had this issue in recursive async code. Because of async and heap allocation, stack overflow was not an issue for the untraced code, but was with tracing enabled.

davidbarsky commented 3 years ago

Thanks for reporting this. Can you include a backtrace for this? I have a suspicion as to what's happening here, but it'd be good to get confirmation via a backtrace.

kjvalencik commented 3 years ago

top-backtrace.txt

It's a very large backtrace. I included the top 1000 lines. Let me know if the entire backtrace would be useful.

The EXC_BAD_ACCESS is because it's overflowing the stack guard. It will fail at a fairly arbitrary section of the code depending on the size of the spans and other stack usage. I can also try reproducing with a smaller stack size.

Edit: Here's a full backtrace with --release and my stack size set to 128.

backtrace.txt

matklad commented 2 years ago

Still very much a problem I believe! I've submitted a reproducible test here: https://github.com/tokio-rs/tracing/pull/2199.

I can potentially work on a fix, but, a) no promises :), and, b) I would appreciate a pointer to where exactly I should add a "drop a tree of parents in a loop" code.

matklad commented 2 years ago

Although, taking a step back here:

The reason why we were hitting stack overflows is because we had buggy code which created infinitedely deep span trees. The bug was similar to https://onesignal.com/blog/solving-memory-leaks-in-rust/.

So perhaps a fix for that problem would be for tracing too emit some kind of error event if it detects an unreasonably deep span tree?