Visualizing multiple concurrent async tasks

sergiimk commented 1 year ago

I have a tokio-based CLI program that creates a root span upon start and then processes multiple files concurrently using the code like:

#[tracing::intrument]
async fn do_stuff(...) { ... }

let futures: Vec<_> = files
    .into_iter()
    .map(|f| do_stuff(f))
    .collect();

let results = futures::future::join_all(futures).await;

I configure the layer with TraceStyle::Async, but even then the resulting graph in Perfetto looks as if all the do_stuff spans are mushed together into one bar instead of being treated as independent concurrent tasks.

Looking at the trace file, all do_stuff events seem to have "id": 1, so Perfetto likely unable to tell them apart.

Is this expected / current limitation / or perhaps I'm doing something wrong?

thoren-d commented 1 year ago

I think I am able to reproduce what you're seeing, though to me I see a big stack of do_stuff as if it was a weird recursive call.

Getting the id is working as intended, as I chose to fetch the root id (as seen here). The question is, is there a better/more intuitive behavior?

If I set "id" to the root id, you can see the parent-child relationship between async spans with them stacked on each other. In this case, each independent task needs to be a root span to display correctly.

The only other thing I can think to supply for "id" is the unique id for the span. This would show every span as an independent task, and lose the parent-child stacking relationships.

I'm open to more ideas but at the moment I don't see a clearly better behavior. I'll give it some thought when I can, though.

sergiimk commented 1 year ago

Thanks for your reply @thoren-d.

Firstly I made this tiny app to reproduce the issue:

Cargo.toml

```rust [package] name = "test-profile" version = "0.1.0" edition = "2021" [dependencies] tracing = "0.1" tracing-chrome = "0.7" tracing-subscriber = { version = "0.3", default-features = false, features = ["env-filter"] } tokio = { version = "1", features = ["full"] } ```

main.rs

```rust use std::time::Duration; use tracing::Instrument; #[tokio::main] async fn main() { let _g = configure_tracing(); root().await; } ///////////////////////////////////////////////////////////////////////////////////////// #[tracing::instrument] async fn root() { tracing::info!("Started"); serial().await; tokio::time::sleep(Duration::from_millis(100)).await; concurrent().await; tokio::time::sleep(Duration::from_millis(100)).await; parallel().await; tokio::time::sleep(Duration::from_millis(100)).await; parallel_linked().await; tracing::info!("Finished"); } ///////////////////////////////////////////////////////////////////////////////////////// #[tracing::instrument] async fn serial() { std::thread::sleep(Duration::from_millis(10)); tokio::time::sleep(Duration::from_millis(50)).await; serial_1().await; serial_2().await; } #[tracing::instrument] async fn serial_1() { std::thread::sleep(Duration::from_millis(10)); tokio::time::sleep(Duration::from_millis(100)).await; } #[tracing::instrument] async fn serial_2() { std::thread::sleep(Duration::from_millis(10)); tokio::time::sleep(Duration::from_millis(100)).await; serial_2_1().await; serial_2_2().await; } #[tracing::instrument] async fn serial_2_1() { std::thread::sleep(Duration::from_millis(10)); tokio::time::sleep(Duration::from_millis(100)).await; } #[tracing::instrument] async fn serial_2_2() { std::thread::sleep(Duration::from_millis(10)); tokio::time::sleep(Duration::from_millis(200)).await; } ///////////////////////////////////////////////////////////////////////////////////////// #[tracing::instrument] async fn concurrent() { std::thread::sleep(Duration::from_millis(10)); tokio::time::sleep(Duration::from_millis(100)).await; let f1 = concurrent_1(); let f2 = concurrent_2(); tokio::join!(f1, f2); } #[tracing::instrument] async fn concurrent_1() { std::thread::sleep(Duration::from_millis(10)); tokio::time::sleep(Duration::from_millis(50)).await; concurrent_1_1().await; concurrent_1_2().await; } #[tracing::instrument] async fn concurrent_1_1() { std::thread::sleep(Duration::from_millis(10)); tokio::time::sleep(Duration::from_millis(50)).await; } #[tracing::instrument] async fn concurrent_1_2() { std::thread::sleep(Duration::from_millis(10)); tokio::time::sleep(Duration::from_millis(50)).await; } #[tracing::instrument] async fn concurrent_2() { std::thread::sleep(Duration::from_millis(10)); tokio::time::sleep(Duration::from_millis(100)).await; std::thread::sleep(Duration::from_millis(10)); tokio::time::sleep(Duration::from_millis(100)).await; std::thread::sleep(Duration::from_millis(10)); tokio::time::sleep(Duration::from_millis(200)).await; } ///////////////////////////////////////////////////////////////////////////////////////// #[tracing::instrument] async fn parallel() { std::thread::sleep(Duration::from_millis(10)); tokio::time::sleep(Duration::from_millis(100)).await; let f1 = tokio::task::spawn(parallel_1()); let f2 = tokio::task::spawn(parallel_2()); let (r1, r2) = tokio::join!(f1, f2); r1.unwrap(); r2.unwrap(); } #[tracing::instrument] async fn parallel_1() { std::thread::sleep(Duration::from_millis(10)); tokio::time::sleep(Duration::from_millis(100)).await; parallel_1_1().await; parallel_1_2().await; } #[tracing::instrument] async fn parallel_1_1() { tokio::time::sleep(Duration::from_millis(100)).await; } #[tracing::instrument] async fn parallel_1_2() { tokio::time::sleep(Duration::from_millis(200)).await; } #[tracing::instrument] async fn parallel_2() { std::thread::sleep(Duration::from_millis(10)); tokio::time::sleep(Duration::from_millis(200)).await; parallel_2_1().await; parallel_2_2().await; } #[tracing::instrument] async fn parallel_2_1() { tokio::time::sleep(Duration::from_millis(100)).await; } #[tracing::instrument] async fn parallel_2_2() { tokio::time::sleep(Duration::from_millis(200)).await; } ///////////////////////////////////////////////////////////////////////////////////////// #[tracing::instrument] async fn parallel_linked() { std::thread::sleep(Duration::from_millis(10)); tokio::time::sleep(Duration::from_millis(100)).await; let f1 = tokio::task::spawn(parallel_1().instrument(tracing::info_span!("task1").or_current())); let f2 = tokio::task::spawn(parallel_2().instrument(tracing::info_span!("task2").or_current())); let (r1, r2) = tokio::join!(f1, f2); r1.unwrap(); r2.unwrap(); } ///////////////////////////////////////////////////////////////////////////////////////// fn configure_tracing() -> tracing_chrome::FlushGuard { use tracing_subscriber::fmt::format::FmtSpan; use tracing_subscriber::layer::SubscriberExt; use tracing_subscriber::EnvFilter; // Use configuration from RUST_LOG env var if provided let env_filter = EnvFilter::from_default_env(); // Configure Perfetto tracing let (perfetto_layer, perfetto_guard) = tracing_chrome::ChromeLayerBuilder::new() .file("trace.json") .trace_style(tracing_chrome::TraceStyle::Async) .include_locations(true) .include_args(true) .build(); let fmt_layer = tracing_subscriber::fmt::layer() .with_span_events(FmtSpan::NEW | FmtSpan::CLOSE) .with_writer(std::io::stderr) .pretty(); let subscriber = tracing_subscriber::registry() .with(env_filter) .with(perfetto_layer) .with(fmt_layer); tracing::subscriber::set_global_default(subscriber).expect("Failed to set subscriber"); perfetto_guard } ```

Which generates trace like this:

trace.json

```json [ {"args":{"name":"main"},"name":"thread_name","ph":"M","pid":1,"tid":0}, {".file":"src/main.rs",".line":13,"cat":"profile","id":1,"name":"root","ph":"b","pid":1,"tid":0,"ts":229.007}, {".file":"src/main.rs",".line":15,"args":{"message":"Started"},"cat":"profile","name":"event src/main.rs:15","ph":"i","pid":1,"s":"t","tid":0,"ts":326.853}, {".file":"src/main.rs",".line":36,"cat":"profile","id":1,"name":"serial","ph":"b","pid":1,"tid":0,"ts":353.463}, {".file":"src/main.rs",".line":44,"cat":"profile","id":1,"name":"serial_1","ph":"b","pid":1,"tid":0,"ts":62012.943}, {".file":"src/main.rs",".line":44,"cat":"profile","id":1,"name":"serial_1","ph":"e","pid":1,"tid":0,"ts":172994.311}, {".file":"src/main.rs",".line":50,"cat":"profile","id":1,"name":"serial_2","ph":"b","pid":1,"tid":0,"ts":173246.365}, {".file":"src/main.rs",".line":58,"cat":"profile","id":1,"name":"serial_2_1","ph":"b","pid":1,"tid":0,"ts":285098.573}, {".file":"src/main.rs",".line":58,"cat":"profile","id":1,"name":"serial_2_1","ph":"e","pid":1,"tid":0,"ts":397605.535}, {".file":"src/main.rs",".line":64,"cat":"profile","id":1,"name":"serial_2_2","ph":"b","pid":1,"tid":0,"ts":397842.993}, {".file":"src/main.rs",".line":64,"cat":"profile","id":1,"name":"serial_2_2","ph":"e","pid":1,"tid":0,"ts":609086.196}, {".file":"src/main.rs",".line":50,"cat":"profile","id":1,"name":"serial_2","ph":"e","pid":1,"tid":0,"ts":609302.422}, {".file":"src/main.rs",".line":36,"cat":"profile","id":1,"name":"serial","ph":"e","pid":1,"tid":0,"ts":609403.132}, {".file":"src/main.rs",".line":72,"cat":"profile","id":1,"name":"concurrent","ph":"b","pid":1,"tid":0,"ts":711072.882}, {".file":"src/main.rs",".line":81,"cat":"profile","id":1,"name":"concurrent_1","ph":"b","pid":1,"tid":0,"ts":822447.591}, {".file":"src/main.rs",".line":101,"cat":"profile","id":1,"name":"concurrent_2","ph":"b","pid":1,"tid":0,"ts":832822.879}, {".file":"src/main.rs",".line":89,"cat":"profile","id":1,"name":"concurrent_1_1","ph":"b","pid":1,"tid":0,"ts":884057.765}, {".file":"src/main.rs",".line":89,"cat":"profile","id":1,"name":"concurrent_1_1","ph":"e","pid":1,"tid":0,"ts":954710.601}, {".file":"src/main.rs",".line":95,"cat":"profile","id":1,"name":"concurrent_1_2","ph":"b","pid":1,"tid":0,"ts":954955.322}, {".file":"src/main.rs",".line":95,"cat":"profile","id":1,"name":"concurrent_1_2","ph":"e","pid":1,"tid":0,"ts":1016905.619}, {".file":"src/main.rs",".line":81,"cat":"profile","id":1,"name":"concurrent_1","ph":"e","pid":1,"tid":0,"ts":1017070.931}, {".file":"src/main.rs",".line":101,"cat":"profile","id":1,"name":"concurrent_2","ph":"e","pid":1,"tid":0,"ts":1167238.38}, {".file":"src/main.rs",".line":72,"cat":"profile","id":1,"name":"concurrent","ph":"e","pid":1,"tid":0,"ts":1167446.993}, {".file":"src/main.rs",".line":113,"cat":"profile","id":1,"name":"parallel","ph":"b","pid":1,"tid":0,"ts":1269458.124}, {"args":{"name":"tokio-runtime-worker"},"name":"thread_name","ph":"M","pid":1,"tid":1}, {".file":"src/main.rs",".line":124,"cat":"profile","id":274877906945,"name":"parallel_1","ph":"b","pid":1,"tid":1,"ts":1381036.907}, {"args":{"name":"tokio-runtime-worker"},"name":"thread_name","ph":"M","pid":1,"tid":2}, {".file":"src/main.rs",".line":142,"cat":"profile","id":549755813889,"name":"parallel_2","ph":"b","pid":1,"tid":2,"ts":1381078.532}, {"args":{"name":"tokio-runtime-worker"},"name":"thread_name","ph":"M","pid":1,"tid":3}, {".file":"src/main.rs",".line":132,"cat":"profile","id":274877906945,"name":"parallel_1_1","ph":"b","pid":1,"tid":3,"ts":1492030.009}, {".file":"src/main.rs",".line":150,"cat":"profile","id":549755813889,"name":"parallel_2_1","ph":"b","pid":1,"tid":3,"ts":1591861.279}, {".file":"src/main.rs",".line":132,"cat":"profile","id":274877906945,"name":"parallel_1_1","ph":"e","pid":1,"tid":3,"ts":1593231.339}, {".file":"src/main.rs",".line":137,"cat":"profile","id":274877906945,"name":"parallel_1_2","ph":"b","pid":1,"tid":3,"ts":1593388.969}, {".file":"src/main.rs",".line":150,"cat":"profile","id":549755813889,"name":"parallel_2_1","ph":"e","pid":1,"tid":3,"ts":1693170.722}, {".file":"src/main.rs",".line":155,"cat":"profile","id":549755813889,"name":"parallel_2_2","ph":"b","pid":1,"tid":3,"ts":1693406.434}, {".file":"src/main.rs",".line":137,"cat":"profile","id":274877906945,"name":"parallel_1_2","ph":"e","pid":1,"tid":3,"ts":1795415.189}, {".file":"src/main.rs",".line":124,"cat":"profile","id":274877906945,"name":"parallel_1","ph":"e","pid":1,"tid":3,"ts":1795613.676}, {".file":"src/main.rs",".line":155,"cat":"profile","id":549755813889,"name":"parallel_2_2","ph":"e","pid":1,"tid":3,"ts":1895145.47}, {".file":"src/main.rs",".line":142,"cat":"profile","id":549755813889,"name":"parallel_2","ph":"e","pid":1,"tid":3,"ts":1895296.256}, {".file":"src/main.rs",".line":113,"cat":"profile","id":1,"name":"parallel","ph":"e","pid":1,"tid":0,"ts":1895428.184}, {".file":"src/main.rs",".line":162,"cat":"profile","id":1,"name":"parallel_linked","ph":"b","pid":1,"tid":0,"ts":1997681.801}, {".file":"src/main.rs",".line":166,"cat":"profile","id":1,"name":"task1","ph":"b","pid":1,"tid":0,"ts":2108798.729}, {".file":"src/main.rs",".line":167,"cat":"profile","id":1,"name":"task2","ph":"b","pid":1,"tid":0,"ts":2109001.476}, {".file":"src/main.rs",".line":124,"cat":"profile","id":1,"name":"parallel_1","ph":"b","pid":1,"tid":3,"ts":2109097.367}, {".file":"src/main.rs",".line":142,"cat":"profile","id":1,"name":"parallel_2","ph":"b","pid":1,"tid":1,"ts":2109285.308}, {"args":{"name":"tokio-runtime-worker"},"name":"thread_name","ph":"M","pid":1,"tid":4}, {".file":"src/main.rs",".line":132,"cat":"profile","id":1,"name":"parallel_1_1","ph":"b","pid":1,"tid":4,"ts":2221601.188}, {".file":"src/main.rs",".line":150,"cat":"profile","id":1,"name":"parallel_2_1","ph":"b","pid":1,"tid":4,"ts":2321896.199}, {".file":"src/main.rs",".line":132,"cat":"profile","id":1,"name":"parallel_1_1","ph":"e","pid":1,"tid":4,"ts":2323349.998}, {".file":"src/main.rs",".line":137,"cat":"profile","id":1,"name":"parallel_1_2","ph":"b","pid":1,"tid":4,"ts":2323579.145}, {".file":"src/main.rs",".line":150,"cat":"profile","id":1,"name":"parallel_2_1","ph":"e","pid":1,"tid":4,"ts":2423620.915}, {".file":"src/main.rs",".line":155,"cat":"profile","id":1,"name":"parallel_2_2","ph":"b","pid":1,"tid":4,"ts":2423868.359}, {".file":"src/main.rs",".line":137,"cat":"profile","id":1,"name":"parallel_1_2","ph":"e","pid":1,"tid":4,"ts":2525725.143}, {".file":"src/main.rs",".line":124,"cat":"profile","id":1,"name":"parallel_1","ph":"e","pid":1,"tid":4,"ts":2525950.029}, {".file":"src/main.rs",".line":166,"cat":"profile","id":1,"name":"task1","ph":"e","pid":1,"tid":4,"ts":2526064.568}, {".file":"src/main.rs",".line":155,"cat":"profile","id":1,"name":"parallel_2_2","ph":"e","pid":1,"tid":4,"ts":2625160.558}, {".file":"src/main.rs",".line":142,"cat":"profile","id":1,"name":"parallel_2","ph":"e","pid":1,"tid":4,"ts":2625394.314}, {".file":"src/main.rs",".line":167,"cat":"profile","id":1,"name":"task2","ph":"e","pid":1,"tid":4,"ts":2625504.383}, {".file":"src/main.rs",".line":162,"cat":"profile","id":1,"name":"parallel_linked","ph":"e","pid":1,"tid":0,"ts":2625789.192}, {".file":"src/main.rs",".line":31,"args":{"message":"Finished"},"cat":"profile","name":"event src/main.rs:31","ph":"i","pid":1,"s":"t","tid":0,"ts":2625980.974}, {".file":"src/main.rs",".line":13,"cat":"profile","id":1,"name":"root","ph":"e","pid":1,"tid":0,"ts":2626051.164} ] ```

Now regarding how to fix this:

I totally agree with you that flame-graph-like stacked view is really nice, and it would be unfortunate to lose it by splitting all coroutines into their own tid ... although this is basically how tools like Jaeger visualize things:

What helps in Jaeger's case a bit is:

timelines are sorted in chronological order, unlike Perfetto that sorts threads by name
there's still hierarchy to those timelines (you can collapse/expand them by parent span)

Also want to point out that although the parallel case in my example (tokio::spawn with root spans per task) looks good - when you do a fork-join processing you most likely will want to tie the spawned tasks to a common root span (which produces invalid graph like in parallel_linked case).

In our app we also frequently do fork-join without tokio::spawn at all (concurrent case) as we don't need/want extra threads.

I'll need to think a bit more about this, but so far I see only two options:

Jaeger-style thread per span (preferably sorted in Perfetto somehow)
Detect concurrent execution in the ChromeLayer
- e.g. if a span starts that is on the same or lower depth level as the current one - we assign it its own tid

sergiimk commented 1 year ago

Perhaps Perfetto is just not the right tool for this job of displaying flows of execution flows, and only suited for profiling what CPUs are actually doing ... but I really wish to be able to do both in one tool.

They do seem to have a higher level "Flows" feature:

sergiimk commented 1 year ago

After some experimentation I ended up writing a custom layer that:

on every span enter/exit writes a synchronous begin/end events (ph: B/E)
on every span new/close writes async begin/end events (ph: b/e) with unique ID

The first one gives me a (stacked) flame graph of all coroutines that were actively running on a CPU per thread.

The second one gives a view of async jobs, with one track per timeline (similarly to Jaeger).

It's just like running with TraceStyle::Threaded and TraceStyle::Async at the same time, with latter un-stacked.

Here's how result looks like with a test app I shared above:

It's not perfect, as amount of timelines can get quite large, but it suits my app.

Perfetto devs are also considering adding nested hierarchical grouping to timelines, which would allow fully replicating Jaeger's layout.

Also there is this issue that points out that old Chrome tracing actually supported non-stackable concurrent events, so that might be another option for displaying concurrent tasks within one timeline.

thoren-d / tracing-chrome

Visualizing multiple concurrent async tasks #24