tokio-rs / tracing

Application level tracing for Rust.
https://tracing.rs
MIT License
5.5k stars 722 forks source link

follows_from has a hard dependency on the parent #1121

Open maximebedard opened 3 years ago

maximebedard commented 3 years ago

Bug Report

Hi friends. I'm running into an issue when trying to model a casual relationship. I've updated the open telemetry example based off of 302d4a92dfee951736bd1a3663224694e2cd73cd to try and visualize it:

diff --git a/examples/examples/opentelemetry.rs b/examples/examples/opentelemetry.rs
index b7e522ef..199d1f7f 100644
--- a/examples/examples/opentelemetry.rs
+++ b/examples/examples/opentelemetry.rs
@@ -1,5 +1,5 @@
 use std::{error::Error, thread, time::Duration};
-use tracing::{span, trace, warn};
+use tracing::{span, trace, warn, Span};
 use tracing_attributes::instrument;
 use tracing_subscriber::prelude::*;

@@ -7,7 +7,18 @@ use tracing_subscriber::prelude::*;
 #[inline]
 fn expensive_work() -> &'static str {
     span!(tracing::Level::INFO, "expensive_step_1")
-        .in_scope(|| thread::sleep(Duration::from_millis(25)));
+        .in_scope(|| {
+            let follower = span!(tracing::Level::INFO, "expensive_step_3");
+            follower.follows_from(&Span::current());
+
+            thread::spawn(move || {
+                let _follower = follower.enter();
+                thread::sleep(Duration::from_millis(40));
+            });
+
+            thread::sleep(Duration::from_millis(25))
+        });
+
     span!(tracing::Level::INFO, "expensive_step_2")
         .in_scope(|| thread::sleep(Duration::from_millis(25)));

This result in the following trace:

image

By looking at the trace, I am expecting the expensive_step_1 to terminate before the expensive_step_3, but still see expensive_step_3 as a child span of expensive_step_1.

I can reproduce this expectation using the ruby otel client:

image
Ruby code to reproduce ```ruby # chruby 2.7.1 # gem install opentelemetry-api # gem install opentelemetry-sdk # gem install opentelemetry-exporter-jaeger require "logger" require "opentelemetry/sdk" require "opentelemetry/exporter/jaeger" require "pry-byebug" OpenTelemetry::SDK.configure do |c| c.add_span_processor( OpenTelemetry::SDK::Trace::Export::SimpleSpanProcessor.new( OpenTelemetry::Exporter::Jaeger::AgentExporter.new, ) ) c.service_name = "ruby-example" c.service_version = "0.6.0" c.logger = Logger.new(STDOUT) end # To start a trace you need to get a Tracer from the TracerProvider tracer = OpenTelemetry.tracer_provider.tracer("my_app_or_gem", "0.1.0") # create a span tracer.in_span("app_start") do |_| tracer.in_span("expensive_work") do |_| tracer.in_span("expensive_work_1") do |span, ctx| Thread.new(ctx) do |c| tracer.in_span("expensive_work_3", with_parent: c) do sleep(0.40) end end sleep(0.25) end tracer.in_span("expensive_work_2") do |_| sleep(0.25) end end end ```

Version

Based off of 302d4a92dfee951736bd1a3663224694e2cd73cd.

Platform

Tested on both OSX and a ubuntu VM.

Darwin MacBook-Pro-2.local 19.6.0 Darwin Kernel Version 19.6.0: Thu Jun 18 20:49:00 PDT 2020; root:xnu-6153.141.1~1/RELEASE_X86_64 x86_64

Linux ubuntu-focal 5.4.0-26-generic #30-Ubuntu SMP Mon Apr 20 16:58:30 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

Description

My understanding is that it could be caused by the fact that Spans are reference counted in the registry. Since the following span still has a reference to the parent, the parent cannot close before the child span. I'm not 100% sure of how it all works in details, but I'd be willing to learn and contribute to a fix if someone has an idea where to start :)

Thanks!

LehMaxence commented 3 years ago

I also think that the follows_from support provided through the OpenTelemetryLayer is flawed and do not satisfy the Opentelemetry specifications for linked spans.

The on_follows_from function of the layer re-builds the followed otel span from its Id, just to get its context and create a Link that is added to the span that follows from. Re-building the otel span requires that it exists and is open in the current layer context. That creates the hard dependency between followed span and span that follows from.

My opinion is that the OpenTelemetryLayer should not (and cannot) implement the on_follows_from function, and instead tracing::Span should be extended, through the OpenTelemetrySpanExt trait, with a fn set_follows_from(&self, cx: Context) function.

@jtescher may want to comment on this issue.

jtescher commented 3 years ago

@LehMaxence so it can under some circumstances, but you are correct that it should not assume that the span is always available as it currently does. The extension is a more dependable API in that you don't have to ensure the span you are linking is still in the registry.

There are advantages to having the otel layer do a best effort link for people who are using the tracing api through a level of indirection or otherwise do not have direct control, but it should really be updated to warn in dev rather than panic if the tracing span with the linked context is no longer available.