open-telemetry / opentelemetry-rust

The Rust OpenTelemetry implementation
https://opentelemetry.io
Apache License 2.0
1.81k stars 417 forks source link

[Bug]: Using `futures_executor::block_on` blocks forever in WASM #2047

Open mendess opened 3 weeks ago

mendess commented 3 weeks ago

What happened?

Using this library in WASM causes the program to block forever and use 100% CPU, due to the usage of futures_executor::block_on

API Version

0.24.0

SDK Version

0.24.1

What Exporter(s) are you seeing the problem on?

N/A

Using futures_executor::block_on is harmful to performance and correctness.

In single-threaded contexts, blocking in place leads to blocking the entire runtime forever, and block_on causes that thread to busy loop until the future resolves, using 100% of CPU.

In a multithreaded environment it is also harmful as it blocks an entire thread, and if that thread happens to have other tasks scheduled on it, this will cause those tasks to be starved of resources, worse still if the future the thread is blocking on is waiting for a task that's scheduled on that same thread via a mechanism such as the tokio LIFO slot, leading to a deadlock.

See this draft PR as a first attempt to address this problem.

In order to make this work for WASM I introduced a spawn_future function, which does the right thing in WASM but continues using the bad futures_executor::block_on function in other environments. With this patch I stopped having problems in the WASM project I'm working on, thus proving that this was the root of the issue.

Next I made some more methods async. The trait methods I made async are:

This avoids having to call spawn_future inside these functions. I couldn't however avoid calling this inside the Drop implementations that needed to call SpanProcessor::shutdown, as well as, inside periodic_reader.rs because I was unsure as to how best to proceed. So there are still some open questions.

cijothomas commented 3 weeks ago

Thanks for opening the issue with detailed analysis and the PR! I left a comment on the PR https://github.com/open-telemetry/opentelemetry-rust/pull/2048/files#r1729165305 .

We need to see what is the right way to execute all the backgrounds tasks (batch exporting, metric reader). @ThomsonTan was also looking into this. Tagging @TommyCpp and @lalitb also to share ideas/concerns.