jaegertracing / jaeger-client-csharp

🛑 This library is DEPRECATED!
https://jaegertracing.io/
Apache License 2.0
303 stars 67 forks source link

Obey the X-B3-Sampled accept/deny header without other B3 headers #121

Closed tillig closed 5 years ago

tillig commented 5 years ago

Requirement - what kind of business use case are you trying to solve?

I would like to signal to the Jaeger pipeline as part of an ASP.NET Core application that a particular request should (or should not) be included/reported.

Problem - what in Jaeger blocks you from solving the requirement?

The challenge is a combination of how OpenTracing contrib attempts to ignore a trace and in the way Jaeger handles propagation of trace data via HTTP header, specifically during data extraction.

In OpenTracing contrib, you can exclude certain requests from being traced by specifying lambdas that run on each request. It looks something like this:

services
  .AddJaeger()
  .AddOpenTracing(builder =>
    builder.ConfigureAspNetCore(options =>
    {
      // ignore certain requests to prevent spamming the tracer with irrelevant data
      options.Hosting.IgnorePatterns.Add(request =>
        request.Request.Path.Value?.StartsWith("/monitoring") == true);
    }));

Unfortunately, the way that's wired up doesn't propagate through the whole request In that system, at the beginning of the request the check runs, but later an event may be raised that attempts to create a new trace span (e.g., a database request) and the decision made at the beginning of the request isn't propagated to the event handlers later. This results in random orphaned child spans getting created and reported.

I have both Jaeger and B3 propagation enabled, and if you don't use the exclusion patterns above, then the OpenTracing contrib bits will run the Extract on the current request headers. Again, that's only if you don't use the exclusion patterns above. It took me a while to figure that out.

Jaeger does have good support for the B3 headers like X-B3-Sampled: 0 to disable sampling. However, it appears this only gets used if there are also trace ID, span ID, and parent ID headers. If this is the entry point of the request, like the first request in the door, those headers may not be there, so the sampling opt-out request gets ignored.

Looking at the B3 propagation spec I noticed this:

A deny decision may omit identifiers and look like this:

b3: 0

I also noticed:

While valid to propagate only a reject sampling decision (X-B3-Sampled: 0), if trace identifiers are established, they should be propagated, too.

Point being, it appears the concept of only having the sample decision header is actually valid. Doing that would allow me to simply add the required header to, say, Kubernetes health check requests, and ensure the trace doesn't show up; and it would also bypass the problem in OpenTracing contrib where the exclude pattern decision isn't maintained through the whole pipeline.

Proposal - what do you suggest to solve the problem or improve the existing situation?

Since many codecs may be registered and contribute during extraction, it may be better to use a sort of builder pattern to create the span context during extract rather than assume the entire span context is handled/propagated by any individual codec. This would allow extraction to correctly propagate just the sampling decision from the B3 codec whilst not requiring additional headers be present.

tillig commented 5 years ago

A sort of quick workaround may be to create a sort of sampler decorator that looks at the current request headers and makes a sampling decision based on the header value or the selected sampling strategy. I'll see if I can make something like that work in the meantime.

yurishkuro commented 5 years ago

what kind of semantics for multiple codecs would you propose? E.g., they can work in priority order, if the first one parsed the trace context, the others will either not try to parse it, or won't be allowed to affect the one already stored in the builder. But what about trace context vs. baggage, e.g. if you have trace context only in the Jaeger format, and trace context + baggage in Zipkin format?

tillig commented 5 years ago

That's a good question. I can't say I'm extremely versed in all the ins and outs of distributed tracing, but in other situations I find that registration order matters. So if you have two or more things contributing to the system, it implies...

From what I can tell, that would have implications on things like environment variable configuration - if you registered as jaeger,b3 it would behave differently than if you registered as b3,jaeger due to order.

I think that's a fairly reasonable way to go for reading in the data from an inbound trace request. For outbound propagation... that's another interesting question. Is it valid to propagate trace information in two different ways? Like if B3 and W3C trace headers were both used? Let's say it is - for compatibility across different instrumented service types, let's say you can do that.

In that case, I'd say outbound propagation should be consistent. If inbound values conflict as noted above, any outbound data propagated should match whatever the final/resolved set of data is. Ostensibly there should also be some warning logged if you have the logging enabled so the operators of the system can see that there's something fishy going on; possibly even some sort of additional tag or baggage or whatever added so it's visible in the end trace view.

What do you think of that?

The one thing I don't know how you'd handle is the way a span is determined to be a child of some other span or is the root. There seems to be a lot of reliance on assuming that a trace context is either null or it's a fully populated thing.

For example, in the OpenTracing contrib stuff, whatever comes out of the extraction gets passed to SpanBuilder.AsChildOf. It makes sense that a "root span" isn't a child of anything, but if a span is a child of something, it needs to be a complete context. If you consider that the X-B3-Sampled header doesn't actually represent a whole context, that would mean the new span shouldn't be a child of the extracted span but instead would simply get a flag set to indicate sampling is turned off.

I don't have enough experience at the moment to recommend a solution to that. Something that comes to mind is that maybe AsChildOf needs to check for, say, a null trace ID and only set flags on the current context. Or maybe that's more the reporter's job, to trim off "placeholder parents" before sending the spans in. This is where it gets fuzzy for me - I don't know how to simply propagate a partial context.

yurishkuro commented 5 years ago

I'm on the same page about the order of codecs deciding the priority of which representation wins (except that it should be "first wins" in extract(), to improve performance and stop further evals). It's fairly easy to implement as well.

I don't consider incomplete trace context a valid use case, it would simply be discarded, so parent/child question is a wash to me.

But what I don't have a strong opinion is on how to handle baggage. The simplest solution is to handle it atomically with the trace context - whichever codec wins pulls the baggage as well. The only use case where it gets tricky is in supporting the special header jaeger-baggage that can be used for debugging purposes. In other words, if you configure the codecs as b3,jaeger and then send a debug request with jaeger-baggage header, it own't be respected. Maybe it's ok.

tillig commented 5 years ago

TBH, I don't have a strong opinion on how the baggage is handled because I don't know enough about it. My primary concern was just handling the inbound header and being able to obey it; I totally defer to you on the baggage aspect.

yurishkuro commented 5 years ago

I would like to signal to the Jaeger pipeline as part of an ASP.NET Core application that a particular request should (or should not) be included/reported.

Did you try using sampling.priority tag? It is intended to allow the application to override the decision of the sampler, and it is sticky, i.e. propagates downstream. Primarily it is used with value 1 to force the sampling (also raises the debug flag), but value 0 can be used to disable sampling even if probabilistic sampler turned it on.

tillig commented 5 years ago

I didn't try that, but I'm not sure where it should be set. Is that something that should be set when OpenTracing begins the request span? Is there a sort of plugin, like an additional codec I should add? I apologize if this sounds like a newbie question.

yurishkuro commented 5 years ago

it's a tag that can be set at any point on the span: Tags.SamplingPriority.set(span, 1) forces sampling of that span and all its future children.

https://github.com/jaegertracing/jaeger-client-csharp/blob/ae54ab06b0d5ff71eac2155622e99fa89f287d73/src/Jaeger/Span.cs#L153-L171

tillig commented 5 years ago

Since the application basically executes like this...

... and so on.

Given that, it appears that in order to obey the header, I really do need to catch that very first span that OpenTracing opens - first thing in the request pipeline - and modify that span. Otherwise, I'll still end up with something orphaned and reported.

I'll see if I can get that working. I'm hoping I don't run into any weird race conditions where the ASP.NET Core framework causes a child span to be created before I can get to the parent span to turn off sampling.

In the meantime, my decorator approach does work. The decorator looks like this:

    public class SamplerHeaderPropagationDecorator : ISampler
    {
        public const string Type = "header";

        private readonly IHttpContextAccessor _httpContextAccessor;

        private readonly ISampler _sampler;

        private readonly IReadOnlyDictionary<string, object> _tags;

        public SamplerHeaderPropagationDecorator(ISampler sampler, IHttpContextAccessor httpContextAccessor)
        {
            this._sampler = sampler ?? throw new ArgumentNullException(nameof(sampler));
            this._httpContextAccessor = httpContextAccessor ?? throw new ArgumentNullException(nameof(httpContextAccessor));
            this._tags = new Dictionary<string, object>
            {
                { Constants.SamplerTypeTagKey, Type },
                { Constants.SamplerParamTagKey, B3TextMapCodec.SampledName },
            };
        }

        public void Close()
        {
            this._sampler.Close();
        }

        public SamplingStatus Sample(string operation, TraceId id)
        {
            // If the exclude header is present, short circuit any other checks.
            var headers = this._httpContextAccessor.HttpContext?.Request?.Headers;
            if (headers != null &&
                headers.TryGetValue(B3TextMapCodec.SampledName, out var headerValue))
            {
                var headerStringValue = headerValue.ToString();
                if (headerStringValue.Equals("0", StringComparison.OrdinalIgnoreCase))
                {
                    return new SamplingStatus(false, this._tags);
                }
                else if (headerStringValue.Equals("1", StringComparison.OrdinalIgnoreCase))
                {
                    return new SamplingStatus(true, this._tags);
                }
            }

            // No previous decision was present, so let the in-app sampler decide.
            return this._sampler.Sample(operation, id);
        }
    }

It's a little hairy looking trying to add it, though, due to #124 .

services.AddSingleton<ITracer>(serviceProvider =>
{
    var accessor = serviceProvider.GetRequiredService<IHttpContextAccessor>();
    var logFactory = serviceProvider.GetRequiredService<ILoggerFactory>();
    var config = Jaeger.Configuration.FromEnv(logFactory);
    var tracerBuilder = config.GetTracerBuilder();

    // The hairy bit - need to use reflection to poke the private field since the builder
    // doesn't expose any way to get the configured sampler.
    var field = typeof(Jaeger.Tracer.Builder).GetField("_sampler", BindingFlags.Instance | BindingFlags.NonPublic);
    var originalSampler = (ISampler)field.GetValue(tracerBuilder);
    var decoratedSampler = new SamplerHeaderPropagationDecorator(originalSampler, accessor);
    field.SetValue(tracerBuilder, decoratedSampler);
    var tracer = tracerBuilder.Build();
    GlobalTracer.Register(tracer);
    return tracer;
})

I'll see if I can come up with a middleware solution that uses the tag you mentioned.

tillig commented 5 years ago

Verified middleware does work as long as you ensure it's the very first thing in the pipeline.

    public class TraceSamplerHeaderMiddleware
    {
        private readonly RequestDelegate _next;

        public TraceSamplerHeaderMiddleware(RequestDelegate next)
        {
            this._next = next ?? throw new ArgumentNullException(nameof(next));
        }

        public async Task Invoke(HttpContext context, ITracer tracer)
        {
            var span = tracer.ActiveSpan;
            var headers = context?.Request?.Headers;
            if (span != null &&
                headers != null &&
                headers.TryGetValue(B3TextMapCodec.SampledName, out var headerValue))
            {
                var headerStringValue = headerValue.ToString();
                if (headerStringValue.Equals("0", StringComparison.OrdinalIgnoreCase))
                {
                    span.SetTag(Tags.SamplingPriority.Key, 0);
                }
                else if (headerStringValue.Equals("1", StringComparison.OrdinalIgnoreCase))
                {
                    span.SetTag(Tags.SamplingPriority.Key, 1);
                }
            }

            await this._next(context);
        }
    }

Where would something like that belong? In the OpenTracing project? Here? I'd be happy to PR it to the right spot if that's the prescribed solution.

Falco20019 commented 5 years ago

https://github.com/opentracing-contrib/csharp-netcore Should be the correct place since it‘s ASP.NET related.