Open perlun opened 1 month ago
I got another idea when writing the issue: enabling the console exporter for tracing only, and seeing what it emits. Here's the outcome:
No output whatsoever (when looking with docker logs -f <container>), apart from some Microsoft.AspNetCore and Startup.cs-based logging on app startup.
In addition to the above logging (and some other more detailed logging in general, enabled via appsettings.Development.json), activity tracing output is properly logged to the console:
Activity.TraceId: 9e1617832d938d791854f973cb6d2d25
Activity.SpanId: 613b6f5b1fd87116
Activity.TraceFlags: Recorded
Activity.ActivitySourceName: Microsoft.AspNetCore
Activity.DisplayName: GET api/path/to/resource-1
Activity.Kind: Server
Activity.StartTime: 2024-10-23T06:44:37.3324209Z
Activity.Duration: 00:00:00.0167533
Activity.Tags:
[...]
Will update the issue title accordingly. The problem seems to be with collection of tracing activity data in general; it's not isolated to the OTLP exporter per se.
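For readers who want to repeat the console-exporter experiment described above, a minimal sketch could look like the following. It is not the author's actual setup code (which is not shown in the thread). It assumes a minimal-hosting Program.cs in a web project with implicit usings, the NuGet packages OpenTelemetry.Extensions.Hosting, OpenTelemetry.Instrumentation.AspNetCore and OpenTelemetry.Exporter.Console, and a placeholder service name.

```csharp
// Program.cs: minimal sketch, not the setup code from the issue.
// Assumes NuGet packages: OpenTelemetry.Extensions.Hosting,
// OpenTelemetry.Instrumentation.AspNetCore, OpenTelemetry.Exporter.Console.
using OpenTelemetry.Resources;
using OpenTelemetry.Trace;

var builder = WebApplication.CreateBuilder(args);

// "my-service" is a placeholder service name (an assumption, not from the issue).
builder.Services.AddOpenTelemetry()
    .ConfigureResource(resource => resource.AddService("my-service"))
    .WithTracing(tracing => tracing
        // Records the server-side Activity that ASP.NET Core creates per request.
        .AddAspNetCoreInstrumentation()
        // Writes each recorded Activity to stdout, so it is visible in `docker logs`.
        .AddConsoleExporter());

var app = builder.Build();
app.MapGet("/", () => "ok");
app.Run();
```

If requests produce no "Activity.TraceId: ..." blocks on stdout with this in place, the activities are most likely not being recorded at all, which points away from the exporter.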
For the record, we haven't found the exact source of this but... modifying our YARP-based reverse proxy which sits in front of the application makes things work. :exploding_head: I.e. adding basically this to our reverse proxy code:
builder.Services.AddOpenTelemetry()
    .ConfigureResource(resource => resource.AddService(serviceName))
    .WithTracing(tracing => tracing
        .AddAspNetCoreInstrumentation()
        .AddOtlpExporter()
    );
...makes OTLP tracing work (with some caveats) both for the reverse proxy and for the service behind the reverse proxy. :thinking: I'm utterly perplexed by this. Could AddAspNetCoreInstrumentation() cause the proxied traffic to be modified somehow? Some HTTP header being added/removed? I really don't know. I guess we could tcpdump the traffic between the proxy and the API backend if we're really eager, but I don't feel so inclined at the moment. :slightly_smiling_face:
Will close this issue soon unless anyone wants to keep it open for further debugging.
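As a lighter-weight alternative to capturing traffic, the backend could log the incoming traceparent header itself. The sketch below is a hypothetical debugging aid, not something from the thread. It assumes a minimal-hosting Program.cs with implicit usings, and that the proxy propagates context via the W3C Trace Context traceparent header.

```csharp
// Program.cs of the backend service: hypothetical debugging aid, not from the issue.
var builder = WebApplication.CreateBuilder(args);
var app = builder.Build();

app.Use(async (context, next) =>
{
    // W3C Trace Context header: version-traceid-spanid-flags.
    // The trailing flags are "01" when the caller sampled the trace, "00" when it did not.
    var traceparent = context.Request.Headers["traceparent"].ToString();

    app.Logger.LogInformation(
        "{Method} {Path} traceparent={TraceParent}",
        context.Request.Method,
        context.Request.Path,
        string.IsNullOrEmpty(traceparent) ? "(none)" : traceparent);

    await next(context);
});

app.MapGet("/", () => "ok");
app.Run();
```

Comparing the logged header for requests that go through the proxy against direct requests should show whether the proxy adds a traceparent header, and with which flags.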
@perlun, I suppose that your proxy was creating a non-recorded span, which was then propagated to the application. Keep in mind that the default Sampler is ParentBased(AlwaysOn).
What is more, AspNetCore is instrumented by default, so if you do not record Activities via .AddAspNetCoreInstrumentation(), it will produce non-recorded spans.
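To illustrate what a "non-recorded" parent looks like on the wire, here is a small sketch using only System.Diagnostics. It assumes the context is propagated as a W3C traceparent header. The trace and span IDs are borrowed from the console output earlier in the thread, and the helper name IsSampled is made up for this example.

```csharp
using System;
using System.Diagnostics;

public static class TraceParentDemo
{
    // Returns true if the W3C traceparent value parses and carries the "sampled" flag.
    public static bool IsSampled(string traceparent) =>
        ActivityContext.TryParse(traceparent, null, out var context)
        && (context.TraceFlags & ActivityTraceFlags.Recorded) != 0;

    public static void Main()
    {
        // Same trace and span IDs; only the trailing flags byte differs.
        const string notSampled = "00-9e1617832d938d791854f973cb6d2d25-613b6f5b1fd87116-00";
        const string sampled = "00-9e1617832d938d791854f973cb6d2d25-613b6f5b1fd87116-01";

        Console.WriteLine(IsSampled(notSampled)); // False
        Console.WriteLine(IsSampled(sampled));    // True
    }
}
```

With a ParentBased(AlwaysOn) sampler, a request arriving with the first header is treated as having an unsampled remote parent, so the receiving service does not record its own span either.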
@Kielek Ah, that makes a bit of sense, thanks for the reply. :+1: I'll readily admit that I'm pretty much of a noob when it comes to OpenTelemetry in general. I guess the ParentBased-stuff is documented somewhere?
Thanks @Kielek and @cijothomas. :+1: I feel like it would be worth mentioning these semantics somewhere. At least to me/us, it was quite a bit of a gotcha. We spent literal days debugging this before we (more or less by coincidence) found that it suddenly started working when we added telemetry to our YARP-based reverse proxy. It would be nice to help others avoid falling into this pit.
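One possible way around this pitfall, when instrumenting the upstream proxy is not an option, is to set the sampler explicitly in the backend so that it does not follow the parent's decision. This is a sketch, not a recommendation from the maintainers. It assumes a minimal-hosting Program.cs with implicit usings, the OpenTelemetry.Extensions.Hosting, OpenTelemetry.Instrumentation.AspNetCore and OpenTelemetry.Exporter.OpenTelemetryProtocol packages, and a placeholder service name.

```csharp
// Program.cs: sketch of overriding the default sampler; names are placeholders.
using OpenTelemetry.Resources;
using OpenTelemetry.Trace;

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddOpenTelemetry()
    .ConfigureResource(resource => resource.AddService("my-backend"))
    .WithTracing(tracing => tracing
        // The default, ParentBased(AlwaysOn), follows the caller's sampling flag,
        // so an unsampled parent from the proxy means no spans are recorded here.
        // AlwaysOnSampler ignores the parent's decision and records every span.
        .SetSampler(new AlwaysOnSampler())
        .AddAspNetCoreInstrumentation()
        .AddOtlpExporter());

var app = builder.Build();
app.MapGet("/", () => "ok");
app.Run();
```

The trade-off is that the service then also ignores deliberate upstream sampling decisions, so trace volume may grow and traces may appear without their parent spans.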
We could add it to https://github.com/open-telemetry/opentelemetry-dotnet/blob/main/src/OpenTelemetry/README.md#troubleshooting, but it's a very specific problem so not sure that it properly belongs there. I guess there isn't any form of a "FAQ" for the project or similar?
Package
OpenTelemetry
Package Version
Runtime Version
net8.0
Description
Hi,
We have been debugging a really weird problem for quite some time now, so I thought I'd reach out and ask for help. The problem is with our ASP.NET Core application not producing any tracing-related data, despite being enabled in our Startup class.
What's even more weird: the exact same application works fine when running locally (both in Docker and on the Linux host). In the deployed environment, it runs inside Docker and this is where it doesn't work.
I am not necessarily saying this is a bug in OpenTelemetry, but something in our application or elsewhere is causing the OpenTelemetry-based instrumentation to malfunction.
Steps to Reproduce
Unfortunately, I don't currently have a minimal reproducible example; in an isolated ASP.NET application based on the "getting started" example (https://opentelemetry.io/docs/languages/net/getting-started/), everything works as intended. The problem is isolated to our production code.
Setup code
We set up OpenTelemetry in a method called from ConfigureServices in our Startup class:
Expected Result
Tracing data from ASP.NET being sent to the configured OTLP exporter.
Actual Result
No tracing data emitted whatsoever.
Additional Context
NuGet package references
In addition to the package versions listed above, we also tested with 1.10.0-beta.1 + 1.9.0 of OpenTelemetry.Instrumentation.AspNetCore, with no difference.
What we have tested
Set up the OpenTelemetry Collector using these instructions, for easy(er) debugging. Presume the name of the container is otel-collector.
Run the application (inside Docker). Provoke some HTTP requests that produce tracing-related data. Check the logs of the OpenTelemetry Collector using this command: docker logs otel-collector 2>&1 | grep data_type. On the environments where this doesn't work, the command outputs data roughly like this. As can be seen, no traces-related data is emitted from the application. (logs was also missing at one point, but I think this was because of a misconfiguration in our app.)
The OTEL_DIAGNOSTICS.json-generated diagnostics
When it works (on my local machine), the log looks roughly like this. The "Activity stopped" events contain the path to the route being traced (GET api/path/to/resource-1 etc).
When it doesn't work, it looks like this. Note how the Activity stopped entries are lacking the URL paths.
More details
I've debugged this to the best of my ability, and I suspected that the if (this.IsEnabled(EventLevel.Verbose, EventKeywords.All)) call here returned false: https://github.com/open-telemetry/opentelemetry-dotnet/blob/5dff99f8a0b26ff75248d5f2e478b9c3c42f5678/src/OpenTelemetry/Internal/OpenTelemetrySdkEventSource.cs#L61-L65
Because of limitations in my IDE, I was unable to place a breakpoint in 3rd party code when attaching to the process running inside the Docker container on the machine where we saw the problem, so I couldn't confirm this. Also, now when writing this, I am thinking: if the event source is somehow disabled, would we even get any Activity started events being logged at all? :thinking:
I am very much at the end of the road here; we don't know how to debug this further. Any ideas/suggestions are greatly appreciated. :pray: