Open Matthias247 opened 3 months ago
The only thing that I know of that could cause this is if something has a reference to the span. This includes also children spans.
I believe something like this can theoretically happen if one of the async tasks spawns a long running task which would keep the parent alive. I think this can happen for example with long-running connections if they're instrumented as child spans.
Could you try running it again but collecting these timings for all spans and ideally also printing parent spans? That could tell you if this is the case because the long running task's span would end right before your own if this really is the case.
Thanks for the response! Unfortunately I can't see another span directly closing before that in my logs. I guess my Layer
might not log it since it is suppressed (there are filters like "hyper=error"
) - even though I am not sure if these filters prevent child span creation or just suppress the events in these child spans.
Is there a way how I could check in other lifecycle methods (like on_exit
) what span references exist?
I'm sure theres none from the application code given the Span there really just exists on "the stack", but it seems like some library code might add a reference.
It depends on where you're creating the filters. If they are global, the spans should not be created at all, if the filters are set only on the given layer, then the spans will still exist and the layer is just ignoring them.
I don't think you can check references to a given span. You could theoretically iterate all spans to get its children (I think we store only the parent information, not the other way around) and there is no way to tell what holds a reference to a given span.
Do you have the code publicly available?
// global filter
tracing_subscriber::registry()
.with(fmt::layer())
.with(EnvFilter::from_default_env())
.init();
// per-layer filter
tracing_subscriber::registry()
.with(
fmt::layer()
.with_filter(EnvFilter::from_default_env())
)
.init();
Bug Report
Version
Platform
Ubuntu 22.04.3 LTS
Description
I'm running an application which creates spans in an async loop. In a simplified fashion it looks like:
For handling Spans, I there is a custom
Layer
installed. ThatLayer
- among other things - prints the start time of the span as well as the elapsed time when the Span finishes (close
s). The intention is to log how long the execution of certain Spans takes.With this setup, I got the following types of logs (
us
are reformatted fors
here)The 2nd log line here is not as expected:
app_timing_start_time
andapp_timing_end_time
) the span starts at 20:05:13 and ends at 20:05:31 (18s execution time).Layer
however reports atiming_end_time
of 20:59:15 (3241s execution time).All in all it looks like the
Span
son_close
method is called with 45min delay, while theon_close
method for other Spans within this async loop is called immediately just as expected.Is there any async/deferred processing in
tracing_subscriber
that could lead to this?