quinn-rs / quinn

Async-friendly QUIC implementation in Rust
Apache License 2.0
3.49k stars 353 forks source link

Underflow computing growth in stream data flow control credit #1818

Open fl0rek opened 1 month ago

fl0rek commented 1 month ago

We've encountered a panic in quinn-proto-0.10.4. We're using quinn-0.10.2 with libp2p-0.53.1.

thread 'main' panicked at HOME/.cargo/registry/src/index.crates.io-6f17d22bba15001f/quinn-proto-0.10.4/src/connection/streams/recv.rs:108:20:
attempt to subtract with overflow
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
thread 'main' panicked at /Users/mikolaj.florkiewicz/.cargo/registry/src/index.crates.io-6f17d22bba15001f/quinn-0.10.2/src/mutex.rs:138:42:
called `Result::unwrap()` on an `Err` value: PoisonError { .. }
Full Backtrace ``` Finished dev [unoptimized + debuginfo] target(s) in 1.89s Running `target/debug/celestia node` thread 'main' panicked at /Users/mikolaj.florkiewicz/.cargo/registry/src/index.crates.io-6f17d22bba15001f/quinn-proto-0.10.4/src/connection/streams/recv.rs:108:20: attempt to subtract with overflow note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace thread 'main' panicked at /Users/mikolaj.florkiewicz/.cargo/registry/src/index.crates.io-6f17d22bba15001f/quinn-0.10.2/src/mutex.rs:138:42: called `Result::unwrap()` on an `Err` value: PoisonError { .. } stack backtrace: 0: 0x102468cb4 - std::backtrace_rs::backtrace::libunwind::trace::hdc7d2d82dacc83e6 at /rustc/7cf61ebde7b22796c69757901dd346d0fe70bd97/library/std/src/../../backtrace/src/backtrace/libunwind.rs:104:5 1: 0x102468cb4 - std::backtrace_rs::backtrace::trace_unsynchronized::h11fcbf27db856d40 at /rustc/7cf61ebde7b22796c69757901dd346d0fe70bd97/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5 2: 0x102468cb4 - std::sys_common::backtrace::_print_fmt::h117e5fb45bd92414 at /rustc/7cf61ebde7b22796c69757901dd346d0fe70bd97/library/std/src/sys_common/backtrace.rs:68:5 3: 0x102468cb4 - ::fmt::hd4e6df6dd376d525 at /rustc/7cf61ebde7b22796c69757901dd346d0fe70bd97/library/std/src/sys_common/backtrace.rs:44:22 4: 0x102487b68 - core::fmt::rt::Argument::fmt::h4f724c1ed2e34985 at /rustc/7cf61ebde7b22796c69757901dd346d0fe70bd97/library/core/src/fmt/rt.rs:142:9 5: 0x102487b68 - core::fmt::write::h5a442bf928ef18fb at /rustc/7cf61ebde7b22796c69757901dd346d0fe70bd97/library/core/src/fmt/mod.rs:1120:17 6: 0x102465e18 - std::io::Write::write_fmt::h01fe6f9cc882fac7 at /rustc/7cf61ebde7b22796c69757901dd346d0fe70bd97/library/std/src/io/mod.rs:1846:15 7: 0x102468ae8 - std::sys_common::backtrace::_print::h15d065176d59d5e1 at /rustc/7cf61ebde7b22796c69757901dd346d0fe70bd97/library/std/src/sys_common/backtrace.rs:47:5 8: 0x102468ae8 - std::sys_common::backtrace::print::h249c7fd853efd630 at /rustc/7cf61ebde7b22796c69757901dd346d0fe70bd97/library/std/src/sys_common/backtrace.rs:34:9 9: 0x10246a074 - std::panicking::default_hook::{{closure}}::h539bfee0454d0951 10: 0x102469dbc - std::panicking::default_hook::he5a7d244fe90fc13 at /rustc/7cf61ebde7b22796c69757901dd346d0fe70bd97/library/std/src/panicking.rs:292:9 11: 0x10246a4b0 - std::panicking::rust_panic_with_hook::h9b82bdc13ccb171b at /rustc/7cf61ebde7b22796c69757901dd346d0fe70bd97/library/std/src/panicking.rs:781:13 12: 0x10246a3a4 - std::panicking::begin_panic_handler::{{closure}}::hf609a05c98725eb7 at /rustc/7cf61ebde7b22796c69757901dd346d0fe70bd97/library/std/src/panicking.rs:659:13 13: 0x102469138 - std::sys_common::backtrace::__rust_end_short_backtrace::hea0399d584786a2b at /rustc/7cf61ebde7b22796c69757901dd346d0fe70bd97/library/std/src/sys_common/backtrace.rs:171:18 14: 0x10246a12c - rust_begin_unwind at /rustc/7cf61ebde7b22796c69757901dd346d0fe70bd97/library/std/src/panicking.rs:647:5 15: 0x1024e7478 - core::panicking::panic_fmt::hb4d79f470bcac176 at /rustc/7cf61ebde7b22796c69757901dd346d0fe70bd97/library/core/src/panicking.rs:72:14 16: 0x1024e78b8 - core::result::unwrap_failed::h66d43c215ff45232 at /rustc/7cf61ebde7b22796c69757901dd346d0fe70bd97/library/core/src/result.rs:1649:5 17: 0x1019e25f0 - core::result::Result::unwrap::h7c99f05345093b69 at /rustc/7cf61ebde7b22796c69757901dd346d0fe70bd97/library/core/src/result.rs:1073:23 18: 0x1019e25f0 - quinn::mutex::non_tracking::Mutex::lock::h561694b9d8685ade at /Users/mikolaj.florkiewicz/.cargo/registry/src/index.crates.io-6f17d22bba15001f/quinn-0.10.2/src/mutex.rs:138:24 19: 0x1019c5090 - ::drop::hc4a7e038cd0e4a1c at /Users/mikolaj.florkiewicz/.cargo/registry/src/index.crates.io-6f17d22bba15001f/quinn-0.10.2/src/connection.rs:792:26 20: 0x1019ebe3c - core::ptr::drop_in_place::h078d0937bddd5271 at /rustc/7cf61ebde7b22796c69757901dd346d0fe70bd97/library/core/src/ptr/mod.rs:507:1 21: 0x1019ebfcc - core::ptr::drop_in_place::h8c199c7e65a2ffdb at /rustc/7cf61ebde7b22796c69757901dd346d0fe70bd97/library/core/src/ptr/mod.rs:507:1 22: 0x1019e9e20 - core::ptr::drop_in_place>::h981a232cae97d7a4 at /rustc/7cf61ebde7b22796c69757901dd346d0fe70bd97/library/core/src/ptr/mod.rs:507:1 23: 0x1019ea118 - core::ptr::drop_in_place>>::h4ed35e31bf719d81 at /rustc/7cf61ebde7b22796c69757901dd346d0fe70bd97/library/core/src/ptr/mod.rs:507:1 24: 0x1019ea61c - core::ptr::drop_in_place>>>::h7c87f6174096defa at /rustc/7cf61ebde7b22796c69757901dd346d0fe70bd97/library/core/src/ptr/mod.rs:507:1 25: 0x1019f2630 - tokio::runtime::task::core::Core::set_stage::{{closure}}::hf1f1c753047e8705 at /Users/mikolaj.florkiewicz/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.33.0/src/runtime/task/core.rs:382:41 26: 0x1019f223c - tokio::loom::std::unsafe_cell::UnsafeCell::with_mut::had358646553d09b9 at /Users/mikolaj.florkiewicz/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.33.0/src/loom/std/unsafe_cell.rs:16:9 27: 0x1019f223c - tokio::runtime::task::core::Core::set_stage::hbc51f07db78f1672 at /Users/mikolaj.florkiewicz/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.33.0/src/runtime/task/core.rs:382:9 28: 0x1019f1588 - tokio::runtime::task::core::Core::drop_future_or_output::h68367009df92d723 at /Users/mikolaj.florkiewicz/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.33.0/src/runtime/task/core.rs:347:13 29: 0x1019c7310 - as core::ops::drop::Drop>::drop::hca90dad35f2db1bb at /Users/mikolaj.florkiewicz/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.33.0/src/runtime/task/harness.rs:481:17 30: 0x1019eab80 - core::ptr::drop_in_place>,alloc::sync::Arc>>::h17eebe9864db8332 at /rustc/7cf61ebde7b22796c69757901dd346d0fe70bd97/library/core/src/ptr/mod.rs:507:1 31: 0x1019c8d54 - tokio::runtime::task::harness::poll_future::{{closure}}::hdad0a83362d2bb01 at /Users/mikolaj.florkiewicz/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.33.0/src/runtime/task/harness.rs:488:5 32: 0x1019e4308 - as core::ops::function::FnOnce<()>>::call_once::he8f0cedda7243058 at /rustc/7cf61ebde7b22796c69757901dd346d0fe70bd97/library/core/src/panic/unwind_safe.rs:272:9 33: 0x1019cc714 - std::panicking::try::do_call::h4c002eecde49991f at /rustc/7cf61ebde7b22796c69757901dd346d0fe70bd97/library/std/src/panicking.rs:554:40 34: 0x1019ce0a4 - ___rust_try 35: 0x1019cb8d4 - std::panicking::try::h476c6815dabdc207 at /rustc/7cf61ebde7b22796c69757901dd346d0fe70bd97/library/std/src/panicking.rs:518:19 36: 0x1019cb67c - std::panic::catch_unwind::h8ee26191249a4854 at /rustc/7cf61ebde7b22796c69757901dd346d0fe70bd97/library/std/src/panic.rs:142:14 37: 0x1019c82b0 - tokio::runtime::task::harness::poll_future::h3269162229e905a7 at /Users/mikolaj.florkiewicz/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.33.0/src/runtime/task/harness.rs:473:18 38: 0x1019c8f84 - tokio::runtime::task::harness::Harness::poll_inner::h49862e2a0312ee22 at /Users/mikolaj.florkiewicz/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.33.0/src/runtime/task/harness.rs:208:27 39: 0x1019c9dc8 - tokio::runtime::task::harness::Harness::poll::h6b2a4e76de7a7a30 at /Users/mikolaj.florkiewicz/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.33.0/src/runtime/task/harness.rs:153:15 40: 0x1019d5cfc - tokio::runtime::task::raw::poll::h767bbff8caccf35a at /Users/mikolaj.florkiewicz/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.33.0/src/runtime/task/raw.rs:276:5 41: 0x1023c5ca0 - tokio::runtime::task::raw::RawTask::poll::ha0d737babbea93ea at /Users/mikolaj.florkiewicz/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.33.0/src/runtime/task/raw.rs:200:18 42: 0x1020e3fd8 - tokio::runtime::task::LocalNotified::run::he64183064a74982b at /Users/mikolaj.florkiewicz/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.33.0/src/runtime/task/mod.rs:408:9 43: 0x100d0006c - tokio::runtime::scheduler::current_thread::CoreGuard::block_on::{{closure}}::{{closure}}::h4f54fd7046d30818 at /Users/mikolaj.florkiewicz/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.33.0/src/runtime/scheduler/current_thread/mod.rs:706:25 44: 0x100cff5c4 - tokio::runtime::coop::with_budget::hd1df283faa20597e at /Users/mikolaj.florkiewicz/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.33.0/src/runtime/coop.rs:107:5 45: 0x100cff5c4 - tokio::runtime::coop::budget::hdbc87cf7abae73a3 at /Users/mikolaj.florkiewicz/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.33.0/src/runtime/coop.rs:73:5 46: 0x100cff5c4 - tokio::runtime::scheduler::current_thread::Context::run_task::{{closure}}::h4b04702e2166dc0f at /Users/mikolaj.florkiewicz/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.33.0/src/runtime/scheduler/current_thread/mod.rs:343:43 47: 0x100cff2c8 - tokio::runtime::scheduler::current_thread::Context::enter::h5b01fcd5f4339216 at /Users/mikolaj.florkiewicz/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.33.0/src/runtime/scheduler/current_thread/mod.rs:410:19 48: 0x100cff468 - tokio::runtime::scheduler::current_thread::Context::run_task::h5b8e74259132849e at /Users/mikolaj.florkiewicz/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.33.0/src/runtime/scheduler/current_thread/mod.rs:343:23 49: 0x100cfffe0 - tokio::runtime::scheduler::current_thread::CoreGuard::block_on::{{closure}}::ha0c20fcef49684df at /Users/mikolaj.florkiewicz/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.33.0/src/runtime/scheduler/current_thread/mod.rs:705:34 50: 0x100cff970 - tokio::runtime::scheduler::current_thread::CoreGuard::enter::{{closure}}::h9701170602fd9494 at /Users/mikolaj.florkiewicz/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.33.0/src/runtime/scheduler/current_thread/mod.rs:743:68 51: 0x100c418ac - tokio::runtime::context::scoped::Scoped::set::h66e055fe3d61cb8f at /Users/mikolaj.florkiewicz/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.33.0/src/runtime/context/scoped.rs:40:9 52: 0x10106dd48 - tokio::runtime::context::set_scheduler::{{closure}}::h363ffc220f660c3b at /Users/mikolaj.florkiewicz/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.33.0/src/runtime/context.rs:176:26 53: 0x100b73b78 - std::thread::local::LocalKey::try_with::hb1ef866639d0dc22 at /rustc/7cf61ebde7b22796c69757901dd346d0fe70bd97/library/std/src/thread/local.rs:286:16 54: 0x100b6eab8 - std::thread::local::LocalKey::with::ha8b2f4275ead3c72 at /rustc/7cf61ebde7b22796c69757901dd346d0fe70bd97/library/std/src/thread/local.rs:262:9 55: 0x10106dd08 - tokio::runtime::context::set_scheduler::h92feef536709b8dd at /Users/mikolaj.florkiewicz/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.33.0/src/runtime/context.rs:176:9 56: 0x100cff798 - tokio::runtime::scheduler::current_thread::CoreGuard::enter::h6d792852750080b8 at /Users/mikolaj.florkiewicz/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.33.0/src/runtime/scheduler/current_thread/mod.rs:743:27 57: 0x100cff9a0 - tokio::runtime::scheduler::current_thread::CoreGuard::block_on::hc5b2a62620625faa at /Users/mikolaj.florkiewicz/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.33.0/src/runtime/scheduler/current_thread/mod.rs:652:19 58: 0x100cf67d4 - tokio::runtime::scheduler::current_thread::CurrentThread::block_on::{{closure}}::h7b3eb93862647b6c at /Users/mikolaj.florkiewicz/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.33.0/src/runtime/scheduler/current_thread/mod.rs:175:28 59: 0x100c0e464 - tokio::runtime::context::runtime::enter_runtime::hb5570fc4746770ed at /Users/mikolaj.florkiewicz/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.33.0/src/runtime/context/runtime.rs:65:16 60: 0x100cf66f4 - tokio::runtime::scheduler::current_thread::CurrentThread::block_on::h3910abd9a4b6dd10 at /Users/mikolaj.florkiewicz/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.33.0/src/runtime/scheduler/current_thread/mod.rs:167:9 61: 0x100c98d94 - tokio::runtime::runtime::Runtime::block_on::h476c80be26a5a58d at /Users/mikolaj.florkiewicz/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.33.0/src/runtime/runtime.rs:348:47 62: 0x100f3a2a4 - celestia::main::h5794a3ede700ae30 at /Users/mikolaj.florkiewicz/dev/lumina-header_sync_repro/celestia/src/main.rs:4:5 63: 0x1010e50b4 - core::ops::function::FnOnce::call_once::hc4b6dc09ac572737 at /rustc/7cf61ebde7b22796c69757901dd346d0fe70bd97/library/core/src/ops/function.rs:250:5 64: 0x1010720a4 - std::sys_common::backtrace::__rust_begin_short_backtrace::h93a345640907338a at /rustc/7cf61ebde7b22796c69757901dd346d0fe70bd97/library/std/src/sys_common/backtrace.rs:155:18 65: 0x100d7509c - std::rt::lang_start::{{closure}}::h6c6905c3eeb8cbbc at /rustc/7cf61ebde7b22796c69757901dd346d0fe70bd97/library/std/src/rt.rs:166:18 66: 0x102460254 - core::ops::function::impls:: for &F>::call_once::hc7fa490c25ff5c7e at /rustc/7cf61ebde7b22796c69757901dd346d0fe70bd97/library/core/src/ops/function.rs:284:13 67: 0x102460254 - std::panicking::try::do_call::h357310772e5ed201 at /rustc/7cf61ebde7b22796c69757901dd346d0fe70bd97/library/std/src/panicking.rs:554:40 68: 0x102460254 - std::panicking::try::h7be07bf7667211c0 at /rustc/7cf61ebde7b22796c69757901dd346d0fe70bd97/library/std/src/panicking.rs:518:19 69: 0x102460254 - std::panic::catch_unwind::h3ab45e592d6c9793 at /rustc/7cf61ebde7b22796c69757901dd346d0fe70bd97/library/std/src/panic.rs:142:14 70: 0x102460254 - std::rt::lang_start_internal::{{closure}}::h522f1a8ded7e5ad3 at /rustc/7cf61ebde7b22796c69757901dd346d0fe70bd97/library/std/src/rt.rs:148:48 71: 0x102460254 - std::panicking::try::do_call::h403386826d59f38d at /rustc/7cf61ebde7b22796c69757901dd346d0fe70bd97/library/std/src/panicking.rs:554:40 72: 0x102460254 - std::panicking::try::h5e255b7a06373cce at /rustc/7cf61ebde7b22796c69757901dd346d0fe70bd97/library/std/src/panicking.rs:518:19 73: 0x102460254 - std::panic::catch_unwind::h462489ab339aac24 at /rustc/7cf61ebde7b22796c69757901dd346d0fe70bd97/library/std/src/panic.rs:142:14 74: 0x102460254 - std::rt::lang_start_internal::h7f00bef3cf1b3caa at /rustc/7cf61ebde7b22796c69757901dd346d0fe70bd97/library/std/src/rt.rs:148:20 75: 0x100d75068 - std::rt::lang_start::h7323fd224ebf8c76 at /rustc/7cf61ebde7b22796c69757901dd346d0fe70bd97/library/std/src/rt.rs:165:17 76: 0x100f3a340 - _main thread 'main' panicked at library/core/src/panicking.rs:163:5: panic in a destructor during cleanup thread caused non-unwinding panic. aborting. [1] 96704 abort cargo run node ```

I don't think I can provide much details on reproduction. Code was running on the network for ~5h while I was trying to reproduce another bug when I saw this panic.

let diff = max_stream_data - self.sent_max_stream_data; seems to be the offending substraction, but I don't know quinn well enought to tell whether that's a cause (to be fixed with e.g. saturating substraction) or a symptom of a deeper issue.

Ralith commented 1 month ago

Thanks for the report!

Full Backtrace

This is the backtrace for the poisoned mutex panic, not the original underflow panic that caused it. You'll want to look just above in the logs.

let diff = max_stream_data - self.sent_max_stream_data; seems to be the offending substraction

That's surprising. self.sent_max_stream_data is only ever populated with a max_stream_data returned from an immediately preceding call to fn max_stream_data. That value is a sum of self.assembler.bytes_read(), which grows monotonically, and stream_receive_window, which never changes within the lifetime of a connection. An accounting error in bytes_read isn't impossible, but I'm not seeing one.

fl0rek commented 1 month ago

Hey @Ralith, thanks for the response!

At the time I wasn't capturing quinn logs to save space and all I have there is occasional WARN quinn_proto::connection: discarding possible duplicate packet. I am trying to reproduce the panic again with more logging enabled, but so far it seems like a very rare occurance and I did not manage to yet.

Ralith commented 1 month ago

Panic backtraces aren't Quinn logs; they're produced by the panic handler, which by default looks for RUST_BACKTRACE=1 in your environment per your paste. Be sure to set that.

Might be worth investigating hooking a smart fuzzer up to the stream bookkeeping directly, if you've got no idea at all what triggered this.

Rexagon commented 2 weeks ago

I caught an 'attempt to substract with overflow' panic here https://github.com/quinn-rs/quinn/blob/db2df614fcab3e9c17b3e5f325eb197920489779/quinn-proto/src/connection/streams/recv.rs#L109 while doing some ridiculous tests with 30 local endpoints communicating with each other in debug build.

Unfortunately there are too many logs to attach, but apparently a lot of packets are lost and something goes beyond the window.

Ralith commented 2 weeks ago

Did you get a backtrace?