serilog / serilog-aspnetcore

Serilog integration for ASP.NET Core
Apache License 2.0
1.31k stars 207 forks source link

Am I implementing custom logging properties correctly? #59

Closed markashton closed 5 years ago

markashton commented 6 years ago

We had a solution that worked just fine in aspnet core 2.0, but have recently upgraded to v2.1 and are now finding that our method for logging out a request correlation id is not behaving the way it used to. In a nutshell, we have: (1) A log template that looks like this: "[{Timestamp:HH:mm:ss.fff} {Level}] {Message:lj} [{SourceContext}]{CorrelationId}{NewLine}{Exception}" (2) A middleware that pushes into the log context a value for 'CorrelationId' that comes from a request header, like this:

var requestIdHeaderValue = context.Request.Headers.FirstOrDefault(x => x.Key == "Request-Id").Value;
if (!requestIdHeaderValue.Any()) await _next.Invoke(context);
using (LogContext.PushProperty("CorrelationId", requestIdHeaderValue))
{
    await _next.Invoke(context);
}

(3) And then another piece of middleware using a diagnostic listener to create that Request-Id header if one isn't present.

This all worked great across a multi-service/tier application and our log entries had a really useful correlation id. Now though I am seeing it is just logging the word 'null' where before it had a string.

Debugging everything looks fine - I can see correct values being pushed to the context immediately before a logging line is acted on - and in one of the controllers I also did a LogContext.Clone() immediately after a logging line just to inspect the LogContext. The enricher produced by cloning shows that there is a PropertyEnricher, with name of 'CorrelationId' and the correct value and yet the logging line immediately before that did not write that property out.

e.g. _logger.LogDebug("Searching: {searchTerm} Page:{page} PerPage:{perPage}", searchTerm, page, perPage); var t = LogContext.Clone();

produced the log entry of: [11:38:30.733 Debug] Searching: null Page:1 PerPage:10 [Controllers.Account.SearchController]null

... the last 'null' should be the correlation id.

This is just a sanity check, am I doing something wrong here?

tsimbalar commented 6 years ago

isn't there a else missing here after the if ?

if (!requestIdHeaderValue.Any()) await _next.Invoke(context);
using (LogContext.PushProperty("CorrelationId", requestIdHeaderValue))

(also shouldn't it return the result of _next.Invoke())

markashton commented 6 years ago

(That's what I get for tweaking things to inspect them!) Sorry I broke the value of requestIdHeaderValue out into its own variable purely for inspection and catching that it might be null - but it never was null so putting it back to:

        public async Task Invoke(HttpContext context)
        {
            using (LogContext.PushProperty("CorrelationId", context.Request.Headers.FirstOrDefault(x=>x.Key=="Request-Id").Value))
            {
                await _next.Invoke(context);
            }
        }

has no material change. And no, you don't need a return for the last await _next.Invoke(context)

skomis-mm commented 6 years ago

@markashton in 2.1 CorrelationId is already included in the logging scope, so (2) middleware is not needed. But at the same time if Request-Id is not present in the headers, the (3) middleware won't affect scope because CorrelationId was taken earlier in the pipeline regardless of the overwritten Request-Id header value. Things probably will change in 2.2 , see discussions.

So its probably better for now just to use RequestId for correlating requests because it is implemented as hierarchical Id or use some other name.

markashton commented 6 years ago

@skomis-mm Thanks for the information, and links - very useful! This totally explains why my existing use of the Request-Id header and CorrelationId property are now broken in 2.1, and also why in my tweaks at one point I was seeing a null just for the first request in the service stack, but an id for further calls (when using the 2.1 NuGet packages but still targeting 2.0 framework). This is a problem I had solved myself, but it seems the aspnet core team didn't solve this, but have left their partial solution in a state that doesn't actually achieve what it should. It also explains why whatever I do with the middleware is not fixing it, as they are making changes to things in the pipeline that I have no control over. I'm also noticing in 2.1 that a Request-Id header is not automatically propagated through all requests, whereas previously it was (for example, when an API makes a back-channel token introspection request to an IdentityServer). I think my only option now is to just abandon my work and wait for 2.2, where it looks like they're going to reverse out of this and make it work like it did in 2.0.

What do you mean about using RequestId? I've disabled my middleware and just changed CorrelationId to RequestId in my logging template, but what I now see is an id such as 0HLFP4T8E9BAB:00000001 ... but this isn't correlated across service boundaries (i.e each service is showing its own RequestId) and also isn't hierarchical. TBH I don't really care about the hierarchical nature, it was nice but the most important thing is just having one id to link logs across service boundaries. Thanks again, you've shed light on something I've spent way too long trying to get to the bottom of.

skomis-mm commented 6 years ago

@markashton nevermind, sorry. I re-checked and found it doesn't work as I expected with RequestId as you pointed.

Neutrino-Sunset commented 6 years ago

I have the same problem as @markashton but I don't understand the discussion as well as he does, or where to go from here.

I don't understand the AspNetCore.Hosting code linked by @skomis-mm. The use of DiagnosticListener suggests this is something to do with the DiagnosticSource logging infrastructure, but my Serilog logging is ILogging based. Without any additional knowledge or context I don't know what I'm supposed to learn from this code.

As @markashton points out using Request-Id on its own is not useful as it isn't hierarchical and is not propogated across service boundaries.

The linked discussion for asp/Hosting issue 1384 states they aren't planning to address this until version 3.0 at the earliest.

So what is the solution in the meantime? I've linked in a full fat logging support (Serilog and CorrelationId) but still have critical required functionality missing. I don't mean to suggest that this is Serilog's fault, but it is a critical logging issue and some guidance would be extremely welcome. For the system we are working on having nothing in place to correlate logs until this is fixed just isn't an option.

skomis-mm commented 6 years ago

@Neutrino-Sunset , 1) the code linked shows that since 2.1 Request-Id http header is mapped to CorrelationId logger scope property (not to be confused with RequestId scope property (without -), which is local generated id). So, without any additional middleware if you pass Request-Id header than you get CorrelationId automatically in the logging scope.

2) about not working distributed tracing with Request-Id I sorted this out a bit. The diagnostics is off by default so you need to enable it. This is some sort of a basic setup (you also need to install System.Reactive package to use extension methods with delegates):

// somewhere in the startup code
var subscription = DiagnosticListener.AllListeners.Subscribe(listener =>
{
    var observer = new AnonymousObserver<KeyValuePair<string, object>>(_ => { });

    if (listener.Name == "Microsoft.AspNetCore")
    {
        // enable diagnostics on incoming requests
        listener.Subscribe(observer, source => source.StartsWith("Microsoft.AspNetCore.Hosting.HttpRequestIn"));
    }

    if (listener.Name == "HttpHandlerDiagnosticListener")
    {
        // enable diagnostics on outgoing HttpClient requests
        listener.Subscribe(observer, source => source.StartsWith("System.Net.Http.HttpRequestOut"));

    }
});

And if you have api like this:

// GET api/values
[HttpGet]
public async Task<ActionResult<IEnumerable<string>>> Get()
{
    // 1) pass a `Request-Id` header ("123", for example) to this method
    // 2) make an outgoing request to external api
    // 3) diagnostic middleware will start new Activity and will
    // set outgoing `Request-Id` header to something like |123.14c3dab3_1. 
    // (using ParentId with incoming `Request-Id` value)
    return new[] { await new HttpClient().GetStringAsync("http://localhost:5000/api/values/5") };
}

// GET api/values/5
[HttpGet("{id}")]
public ActionResult<string> Get(int id)
{
    // now you will see "|123.14c3dab3_1." in the CorrelationId scope property
    _logger.LogInformation("Hello, world");
    return "value_" + id;
}

you'll get (I believe this how it works with AI integration) all without manually handling and passing correlation ids to external systems.

3) now regarding of null values when you want to pass correlation id in the middleware. I didn't dig much but it has something with mixing LogContext and ILogger's scopes (they are implemented as separate ambient contexts AFAIK). If you use ILogger' scopes then it will work as expected:

// in the middleware
var logger = context.RequestServices.GetRequiredService<ILogger<MyMiddleware>>();
using (logger.BeginScope(new Dictionary<string, object> { ["CorrelationId"] = 42 }))
{
    await next.Invoke();
}
Neutrino-Sunset commented 6 years ago

Thanks @skomis-mm that's a lot fo work you've gone to :)

I put a hand rolled implementation in our codebase over the weekend in which I use an enricher to check whether the LogContext CorrelationId is null and if it is replace it with either the value of an X-Correlation-ID header on the request (if it exists) or with the current context TraceIdentifier. Then that same value is used as the value of the X-Correlation-ID header on any outgoing requests.

It's a bit hacky as getting an IHttpContextAccessor into an enricher is not straightforward, especially if you need to initialise the logging system before DI is in place (I'm just setting the ApplicationService provider as a static property on the enricher once DI is up and running, which is at least guaranteeed to be before any incoming requests are handled).

We'll have to go with this approach for this sprint, but once that's out the door (in a few days) I'll work through the solution you've listed above to see whether that will work better for us.