Open unwork-ag opened 2 years ago
I am encountering a similar issue that when the sampler is dropping the trace and the ActivitySource.StartActivity
returns null
. Thus the trace id and other context properties are lost. Is there a better way to still enable the context propagation but limit the traces that are exported?
I have a similar problem
This issue was marked stale due to lack of activity and will be closed in 7 days. Commenting will instruct the bot to automatically remove the label. This bot runs once per day.
I was having the same problem, and noticing some popular packages in https://github.com/open-telemetry/opentelemetry-dotnet-contrib (e.g. AWS's) not handling this aspect correctly and ending up with disconnected spans.
However, I saw this note on context propagation today: https://github.com/open-telemetry/opentelemetry-dotnet/blob/main/src/OpenTelemetry.Api/README.md#context-propagation where it asks people to follow the messaging example for context propagation.
In this "messaging example", there's this section of code with comment: https://github.com/open-telemetry/opentelemetry-dotnet/blob/c8a8913152eb4a983df485f583a22ed706c0cbf4/examples/MicroserviceExample/Utils/Messaging/MessageSender.cs#L46-L59
Depending on Sampling (and whether a listener is registered or not), the activity above may not be created. If it is created, then propagate its context. If it is not created, the propagate the Current context, if any.
From what I can tell, this should always work.
ActivitySource.StartActivity
always creates a non-null activity if it's sampled or if there's no existing Current
activity.
This is the closest to an official recommendation I can find. I do think this should be highlighted in a more prominent place in the documentation though.
Question
Describe your environment.
We are building a distributed system based on Asp.NET Core services and messaging using RabbitMQ. I have configured OpenTelemetry instrumentation for AspNetCore, HttpClient and two additional sources (one for activities in the service and one for message send/receive activities). I configured the built-in TraceIdRatioBasedSampler with 0.0 as the ratio (for testing). In addition I created a custom ActivityListener which is essentially listening to the same sources as OpenTelemetry, uses the same sampling and returns
ActivitySamplingResult.None
for TraceIds that are filtered out.What are you trying to achieve?
As far as I understand it is required that the TraceId is routed through all involved systems to ensure that sampling can always make the same decision and no loose spans are reported. This requires context propagation which to some extent relies on the Activity.Current property being set (e.g. when I call into a library that creates activities). However, I observed that ActivitySource.StartActivity returns null in an unpredictable way. Let me explain:
ActivitySamplingResult.PropagationData
in case of filtered trace ids. This ensures that activities are always created - but in the end this doesn't look right to me.What is the proper way to do this?
Additional Context
I know that this could also be related to (mis-?)behavior in the System.Diagnostics library (or any of the built-in instrumentations) but I considered this a more appropriate place.