Open cijothomas opened 1 week ago
Is this open to be worked on, @cijothomas?
@RichardChukwu thanks for volunteering to fix the issue. Do you have an approach in mind that we can discuss here? The tonic crate used for gRPC seems to require the runtime in all scenarios, while SimpleExporter is supposed to work with the futures crate.
p.s. - @cijothomas is traveling so there could be a delay in response, assigning it to you for now.
The problem is that libraries in the exporter's network stack enter tracing spans of their own (normally at the debug or trace level). These go to the global subscriber, and with the simple exporter this leads to a worker thread calling the OpenTelemetry tracer while the main thread holds a lock in the exporter.
Try to prevent the reentrancy in tracing by filtering the OpenTelemetry layer, e.g.:

```rust
use tracing::Level;
use tracing_subscriber::filter;
use tracing_subscriber::prelude::*;

let layer = OpenTelemetryTracingBridge::new(&logger_provider)
    .with_filter(filter::Targets::new().with_target("my-target", Level::INFO));
```
Panics with:

`there is no reactor running, must be called from the context of a Tokio 1.x runtime`
Yes, opentelemetry-otlp tonic exporter requires a Tokio runtime context, regardless of the choice of simple vs. batch. And you likely want the batch processor with the networked exporter anyway.
@mzabaluev would you say the Tokio runtime needs to be initialized properly to handle async tasks then?
> would you say the Tokio runtime needs to be initialized properly to handle async tasks then?
Yes, putting the initialization into the tokio runtime context will solve the panic.
@cijothomas I see you mentioned that the exporter hangs with tokio::main. Was it because of an infinite loop scenario similar to #1745? Otherwise, the below code works fine for me:
```rust
fn init_logs() -> Result<opentelemetry_sdk::logs::LoggerProvider, LogError> {
    let tonic_exporter = opentelemetry_otlp::new_exporter()
        .tonic()
        .with_endpoint("http://localhost:4317");
    let res = opentelemetry_otlp::new_pipeline()
        .logging()
        .with_resource(RESOURCE.clone())
        .with_exporter(tonic_exporter)
        .install_simple();
    // .install_batch(opentelemetry_sdk::runtime::Tokio);
    if let Err(ref err) = res {
        println!("Error initializing logs: {:?}", err);
    }
    res
}
```
```rust
#[tokio::main]
async fn main() -> Result<(), Box<dyn Error + Send + Sync + 'static>> {
    let logger_provider = init_logs().unwrap();

    // Create a new OpenTelemetryTracingBridge using the above LoggerProvider.
    let layer = OpenTelemetryTracingBridge::new(&logger_provider);

    let filter = EnvFilter::new("info")
        .add_directive("hyper=error".parse().unwrap())
        .add_directive("tonic=error".parse().unwrap())
        .add_directive("reqwest=error".parse().unwrap());

    tracing_subscriber::registry()
        .with(filter)
        .with(layer)
        .init();

    info!(name: "my-event", target: "my-target", "hello from {}. My price is {}", "apple", 1.99);

    logger_provider.shutdown()?;
    Ok(())
}
```
Need to modify exporters in such a way that they don't require async runtimes, and can do normal blocking calls (assuming there are APIs to do that).
`tonic` has a hard dependency on the `tokio` runtime and does not appear to support blocking gRPC calls. As @TommyCpp mentioned in the last community meeting, we previously supported grpc-io, which does not rely on any runtime (it uses futures), but it was removed as part of #1534 due to the library's poor maintenance; the last release was over a year ago.
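Until the exporters can do blocking calls natively, one common workaround pattern is to confine the (async) transport to a dedicated worker thread and expose a blocking request/response interface over channels. The sketch below uses only std and hypothetical names (`ExportRequest`, `spawn_export_worker`, `export_blocking` are illustrative, not the crate's API); in a real exporter the worker thread would also own a Tokio runtime and `block_on` the tonic client, which is simulated here:

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical payload type standing in for an encoded OTLP batch.
struct ExportRequest {
    payload: Vec<u8>,
    // Channel on which the worker reports the outcome of this export.
    done: mpsc::Sender<Result<(), String>>,
}

// Spawn a dedicated worker thread that owns the transport. In a real
// exporter this thread would own a Tokio runtime and drive the tonic
// client; here the "export" is simulated.
fn spawn_export_worker() -> mpsc::Sender<ExportRequest> {
    let (tx, rx) = mpsc::channel::<ExportRequest>();
    thread::spawn(move || {
        for req in rx {
            // Simulated network export: succeed for non-empty payloads.
            let result = if req.payload.is_empty() {
                Err("empty batch".to_string())
            } else {
                Ok(())
            };
            let _ = req.done.send(result);
        }
    });
    tx
}

// Blocking export call usable from non-async code such as a simple
// (synchronous) processor, with no runtime required on the caller side.
fn export_blocking(tx: &mpsc::Sender<ExportRequest>, payload: Vec<u8>) -> Result<(), String> {
    let (done_tx, done_rx) = mpsc::channel();
    tx.send(ExportRequest { payload, done: done_tx })
        .map_err(|e| e.to_string())?;
    done_rx.recv().map_err(|e| e.to_string())?
}

fn main() {
    let tx = spawn_export_worker();
    assert!(export_blocking(&tx, vec![1, 2, 3]).is_ok());
    assert!(export_blocking(&tx, Vec::new()).is_err());
    println!("blocking export bridge works");
}
```

The caller never touches an async runtime, so this shape would sidestep the "no reactor running" panic; the trade-off is an extra thread and a copy of each batch across the channel.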
> The problem is that libraries in the exporter network stack enter tracing spans of their own (normally on debug or tracing level), these go to the global subscriber, and with the simple exporter this leads to a worker thread calling OpenTelemetry tracer while a lock is held in the exporter by the main thread. Try to prevent the reentrancy in tracing by filtering the OpenTelemetry layer with e.g. `filter::Targets::new().with_target("my-target")`.
You are right! Added https://github.com/open-telemetry/opentelemetry-rust/pull/2199 recently to prove this with tests, so the fix can be easily validated. Filtering is a temporary workaround - this has to be solved using a SuppressionFlag stored in Context.
> @cijothomas I see you mentioned that exporter hangs with tokio::main. Was it because of the infinite loop scenario similar to https://github.com/open-telemetry/opentelemetry-rust/issues/1745

Yes, exactly. It's a deadlock caused by reentrancy in SimpleLogProcessor.
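The deadlock mechanism can be illustrated with plain std types (the names below are illustrative stand-ins, not the SDK's actual types): the simple processor holds a mutex around the exporter for the whole export, so if the export path emits another log record on the same thread, the second `emit` tries to take the same lock and blocks forever. Using `try_lock` makes the self-deadlock observable without actually hanging:

```rust
use std::sync::Mutex;

// Illustrative stand-in for SimpleLogProcessor: a lock held for the
// whole duration of an export call.
struct SimpleProcessor {
    exporter_lock: Mutex<()>,
}

impl SimpleProcessor {
    fn emit(&self, reentrant: bool) -> Result<(), &'static str> {
        // The processor takes the exporter lock before exporting.
        let _guard = self.exporter_lock.lock().unwrap();
        if reentrant {
            // The exporter's network stack emits a log of its own on the
            // same thread. A real `lock()` here would block forever;
            // `try_lock` shows the lock is already held.
            if self.exporter_lock.try_lock().is_err() {
                return Err("reentrant emit would deadlock");
            }
        }
        Ok(())
    }
}

fn main() {
    let p = SimpleProcessor { exporter_lock: Mutex::new(()) };
    assert!(p.emit(false).is_ok());
    assert_eq!(p.emit(true), Err("reentrant emit would deadlock"));
    println!("reentrancy detected");
}
```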
> You are right! Added #2199 recently to prove this with tests, so the fix can be easily validated. Filtering is a temp/hack solution - this has to be solved using a SuppressionFlag stored in Context.

I thought about that, but I doubt the tracing spans created in the exporter stack will always get the right Context. The thread where this happens may be a worker thread that doesn't have an inherited Context, and the OpenTelemetry layer does not have visibility on the causal relationships between export and any tracing caused by it.
Spot on! That is the reason why solving the "Context" problem is important for OTel! It is non-trivial, and @lalitb has done a lot of research into this and came to the same conclusion you shared!
This is something we'll need to solve, most likely, after initial stable release of logs and metrics.
SimpleExporter is not usable with OTLP/gRPC (Tonic client used for gRPC). Minimal repro:

Panics with:

`there is no reactor running, must be called from the context of a Tokio 1.x runtime`

If making the main a tokio::main, then export hangs forever. Need to modify exporters in such a way that they don't require async runtimes, and can do normal blocking calls (assuming there are APIs to do that).