Open whitelilis opened 2 years ago
Now can see oneshot channel, but can't see mpsc channel
Yea, this is definitely desired. It'd mostly require instrumentation in the Tokio crate, and then the console would just "see it". Would you be interested in helping add this?
If @whitelilis doesn't call dibs I'd be willing to help with this.
I believe @zaharidichev previously added instrumentation to the Tokio MPSC channels, but it had to be removed because the console-subscriber
crate uses a MPSC internally, so it caused a deadlock. In order to put the MPSC instrumentation back, we'll need to find a workaround for this
(the problem is specifically the register_callsite
calls for the MPSC instrumentation causing the use of a MPSC channel. we could probably work around this by special casing those callsites to be sent separately, outside of the register_callsite
call...)
Hi @hawkw,
I've been playing with this locally and have a working prototype that records various stats about an (un)bounded mpsc channel:
I've also run into the deadlock you describe when switching console-subscriber
to use the tokio
patched to emit the spans too. Two attempts are made to acquire the registry lock in tracing-core, before the code has a chance to call the register_callsite()
in console-subscriber
:(
I'm happy to try and push this forwards but I'm not sure how best work around this - any pointers? Below is the metadata for the mpsc
captured in the tracing::MacroCallsite::interest()
call that eventually deadlocks:
[/Users/dom/Documents/rust/tracing/tracing/src/lib.rs:1044] self = MacroCallsite {
interest: 57005,
meta: Metadata {
name: "event /Users/dom/Documents/rust/tokio/tokio/src/sync/mpsc/chan.rs:212",
target: "runtime::resource::state_update",
level: Level(
Trace,
),
module_path: "tokio::sync::mpsc::chan",
location: /Users/dom/Documents/rust/tokio/tokio/src/sync/mpsc/chan.rs:212,
fields: {tx, tx.op},
callsite: Identifier(0x1006eaa08),
kind: Kind(EVENT),
},
registration: Once { .. },
}
@domodowyer
I've also run into the deadlock you describe when switching
console-subscriber
to use thetokio
patched to emit the spans too. Two attempts are made to acquire the registry lock in tracing-core, before the code has a chance to call theregister_callsite()
inconsole-subscriber
:(
I'm working on an upstream change to fix this: https://github.com/tokio-rs/tracing/pull/2020
If you have the time, it would be very helpful to try using a cargo patch to test the changes in that PR against your console branch, and make sure everything works.
Hi @hawkw,
I had a play around with the eliza/fix-register-deadlock
branch, but unfortunately there is still a deadlock during init.
I have attached a full backtrace below, but it appears that while the Dispatch
is being initialised (from ConsoleLayer::init()
) it rebuilds the callsite interests while holding the mutex, and that eventually causes a register()
call to deadlock on the same mutex:
ConsoleLayer::init()
tracing_subscriber::init()
calls Dispatch::new()
Dispatch::new()
calls callsite::register_dispatch()
which obtains the registry lockrebuild_interest()
try_register
console_subscriber::ConsoleLayer::register_callsite()
to send on the mpsctrace!
and causes a call to event!
, which in turnMacroCallsite::interest()
-> register()
-> try_register()
try_register()
and therefore IS_REGISTERING == false
Callsite::register()
and it deadlocks waiting for the lock held in bullet 3Thanks for your work on this - I'm happy to help however I can 👍
I've pushed my tokio instrumentation changes and pre-emptively opened a PR, using that branch gives us a simple reproducer:
use std::time::Duration;
#[tokio::main(flavor = "current_thread")]
async fn main() {
console_subscriber::init();
tokio::spawn(async {
tokio::time::sleep(Duration::from_secs(100000)).await;
});
}
and a patch in the Cargo.toml
file to use the mpsc-instrumented tokio:
[patch.crates-io]
tokio = { git = "https://github.com/domodwyer/tokio", branch = "dom/trace-mpsc" }
The deadlock should be reproducible with a simple cargo run
resulting in a similar backtrace as above, and attaching the console fails.
I hope this helps - I took a look at the code but I'm not familiar enough with the tracing internals to come up with a nice solution, but I'm happy to help however I can :+1:
The resource now only show semaphore and timer, it is very helpful to show channel metrics, such as how many message in/out/drop/still_in_channel.