Open mladedav opened 6 months ago
As an aside, I could not find any mention of these fields anywhere in OpenTelemetry spec. So I assume this is something that this library itself provides but is not standard, correct?
Yes. In OpenTelemetry, this most likely belongs to Profiling.
Bug Report
Version
v0.23.0
This also affects all previous versions.
Description
When timing information is computed (
idle_ns
andbusy_ns
) the decision of whether the last interval was idle or busy is determined solely on the operation that is happening right now. So if we are entering a span, we assume the span has been idle until now, if we're exiting we assume it has been busy.This holds only if we alternate between entering and exiting. If we enter twice, the time between the two enters is considered idle as far as the timings are concrented.
Example test illustrating the issue
```rust #[test] fn timings_test() { let tracer = TestTracer(Arc::new(Mutex::new(None))); let subscriber = tracing_subscriber::registry().with(layer().with_tracer(tracer.clone())); tracing::subscriber::with_default(subscriber, || { let span = tracing::debug_span!("request", otel.status_code = ?otel::Status::Ok); let _guard_1 = span.enter(); thread::sleep(std::time::Duration::from_millis(100)); let _guard_2 = span.enter(); }); let attributes = tracer.with_data(|data| data.builder.attributes.as_ref().unwrap().clone()); let Value::I64(idle) = attributes.iter().find_map(|key_value| (key_value.key.as_str() == "idle_ns").then_some(key_value.value.clone())).unwrap() else { panic!() }; let Value::I64(busy) = attributes.iter().find_map(|key_value| (key_value.key.as_str() == "busy_ns").then_some(key_value.value.clone())).unwrap() else { panic!() }; let idle_ms = idle / 1_000_000; let busy_ms = busy / 1_000_000; println!("Idle: {} ms", idle_ms); println!("Busy: {} ms", busy_ms); // These panic but should not! assert!(idle_ms < 50); // Should be just about 0 assert!(busy_ms > 50); // Should be just about 100 } ```This can be problematic especially in parallel environments where the spans can be entered concurrently from multiple threads.
As an aside, I could not find any mention of these fields anywhere in OpenTelemetry spec. So I assume this is something that this library itself provides but is not standard, correct?
The question really is whether there is any kind of decision what the numbers should represent, currently I assume it's wall-time (so if an operation is running on 8 threads for 100 ms,
busy
time should still be just 100 ms not 800 ms).I assume it would be too much of a breaking change to remove it now, but is there a reason it is here instead of
tracing
itself or some other library?