open-telemetry / opentelemetry-dotnet

The OpenTelemetry .NET Client
https://opentelemetry.io
Apache License 2.0
3.2k stars 758 forks source link

Sampling using TraceIdRatioBasedSampler and context propagation - best practices? #2887

Open unwork-ag opened 2 years ago

unwork-ag commented 2 years ago

Question

Describe your environment.

We are building a distributed system based on Asp.NET Core services and messaging using RabbitMQ. I have configured OpenTelemetry instrumentation for AspNetCore, HttpClient and two additional sources (one for activities in the service and one for message send/receive activities). I configured the built-in TraceIdRatioBasedSampler with 0.0 as the ratio (for testing). In addition I created a custom ActivityListener which is essentially listening to the same sources as OpenTelemetry, uses the same sampling and returns ActivitySamplingResult.None for TraceIds that are filtered out.

What are you trying to achieve?

As far as I understand it is required that the TraceId is routed through all involved systems to ensure that sampling can always make the same decision and no loose spans are reported. This requires context propagation which to some extent relies on the Activity.Current property being set (e.g. when I call into a library that creates activities). However, I observed that ActivitySource.StartActivity returns null in an unpredictable way. Let me explain:

                       var parentContext = ea.BasicProperties.ExtractParentContext(); // default context in this case
                       var activityName = $"{ea.Exchange}:{ea.RoutingKey} receive";
                       using (var activity = ActivitySource.StartActivity(activityName, ActivityKind.Consumer,
                                 parentContext.ActivityContext)) 
                        {
                            // activity is not null in this case
                            if (activity != null)
                            {
                                message.InjectParentContext(activity);
                            }

...
                        }
                       var parentContext = message.ExtractParentContext();

                        using (var activity = ActivitySource.StartActivity(
                                   "myActivity", ActivityKind.Internal,
                                   parentContext.ActivityContext)) 
                        {
                             // activity is null here - propagation of context is broken
                        }

What is the proper way to do this?

Additional Context

I know that this could also be related to (mis-?)behavior in the System.Diagnostics library (or any of the built-in instrumentations) but I considered this a more appropriate place.

xontab commented 2 years ago

I am encountering a similar issue that when the sampler is dropping the trace and the ActivitySource.StartActivity returns null. Thus the trace id and other context properties are lost. Is there a better way to still enable the context propagation but limit the traces that are exported?

mahaisong commented 2 years ago

I have a similar problem

github-actions[bot] commented 2 weeks ago

This issue was marked stale due to lack of activity and will be closed in 7 days. Commenting will instruct the bot to automatically remove the label. This bot runs once per day.

cfbao commented 2 weeks ago

I was having the same problem, and noticing some popular packages in https://github.com/open-telemetry/opentelemetry-dotnet-contrib (e.g. AWS's) not handling this aspect correctly and ending up with disconnected spans.

However, I saw this note on context propagation today: https://github.com/open-telemetry/opentelemetry-dotnet/blob/main/src/OpenTelemetry.Api/README.md#context-propagation where it asks people to follow the messaging example for context propagation.

In this "messaging example", there's this section of code with comment: https://github.com/open-telemetry/opentelemetry-dotnet/blob/c8a8913152eb4a983df485f583a22ed706c0cbf4/examples/MicroserviceExample/Utils/Messaging/MessageSender.cs#L46-L59

Depending on Sampling (and whether a listener is registered or not), the activity above may not be created. If it is created, then propagate its context. If it is not created, the propagate the Current context, if any.

From what I can tell, this should always work. ActivitySource.StartActivity always creates a non-null activity if it's sampled or if there's no existing Current activity.


This is the closest to an official recommendation I can find. I do think this should be highlighted in a more prominent place in the documentation though.