serilog-contrib / serilog-enrichers-clientinfo

Enrich logs with client IP, correlation id and HTTP request headers.
MIT License
93 stars 20 forks source link

Enrichers are not thread-safe #14

Open basvd opened 2 years ago

basvd commented 2 years ago

The enrichers provided by ClientInfo are not thread-safe. This is a problem when concurrent tasks are trying to log inside the scope of a request. For example in the following situation:

var httpClient = _httpClientFactory.CreateClient(HttpClientName);

var task1 = Task.Run(async () =>
{
    var response = await httpClient.GetAsync("/api/call1");
    return await response.Content.ReadAsStringAsync();
});

var task2 = Task.Run(async () =>
{
    var response = await httpClient.GetAsync($"/api/call2");
    return await response.Content.ReadAsStringAsync();
});

await Task.WhenAll(task1, task2);

Because the HttpClients are logging they both use a logger that is trying to enrich the event with some client info (e.g. using ClientIpEnricher). If the property (e.g. Serilog_ClientIp) is not yet available on the HttpContext.Items then both instances will try to write it concurrently, which may cause various exceptions to be thrown depending on when and how the conflicting access occurs.

In theory this problem should only occur if the property isn't already on the HttpContext.Items during the concurrent phase. In other words, if the first logs of the request are written in a concurrent situation then it might occur. This means a workaround fix would be to write a log message before going concurrent (e.g. with _logger.Information("workaround fix") before the tasks in the snippet above). In my own tests this indeed seems to work.

I think there is an (implicit) expectation that Serilog enrichers are thread-safe (as mentioned here: https://github.com/serilog/serilog/issues/1144) so that's why I report this as a bug.

mo-esmp commented 2 years ago

I'm not sure if it going to happen because HttpContext in ASP.NET Core is not a singleton and for each request, a new instance of HttpContext will be populated and the value of httpContext.Items[IpAddressItemKey] should be the same for all requests (after first initialization).

basvd commented 2 years ago

It happens inside the same request context. The code I posted should be understood as being inside of an ASP.NET action (e.g. async Task<IActionResult> Get(). It's the two concurrent GetAsyncs of the HttpClient that cause the logger to be concurrently accessed and, through the enricher, the HttpContext.Items is concurrently accessed as well (because both these calls occur in the same request context). The problem occurs because this items dictionary is not thread safe.

basvd commented 2 years ago

It seems the main purpose of using HttpContext.Items is to cache the LogEventProperty. I think it might be worth looking into MemoryCache for this instead, because it serves the same purpose and also provides thread safety. As a cache key you could use the request ID plus the item key.

mo-esmp commented 2 years ago

It's the two concurrent GetAsyncs of the HttpClient that cause the logger to be concurrently accessed and, through the enricher, the HttpContext.Items is concurrently accessed as well (because both these calls occur in the same request context). The problem occurs because this items dictionary is not thread safe.

I cannot understand what's the relation between HttpClient and HttpContext ? multiple concurrent requests do not lead to having a sharing HttpContext between all requests and each request has its own HttpContext. I think @Kahbazi can explain it better hence this functionality added by his PR.

basvd commented 2 years ago

Maybe the problem is easier to understand if I simplify the example a bit:

public async Task<IActionResult> Get()
{
   // Here _logger is a logger enriched with the ClientIpEnricher/ClientAgentEnricher

    var task1 = Task.Run(() =>
    {
        _logger.Information("Log message one");
    });

    var task2 = Task.Run(() =>
    {
        _logger.Information("Log message two");
    });

    await Task.WhenAll(task1, task2);
    return Ok();
}

The above code creates a situation where logger is being called in a multi-threaded way which in the ClientIpEnricher/ClientAgentEnricher can result in a race condition on the HttpContext, where the same item is added more than once. Both tasks share the same HttpContext because they are being performed as part of the same request to the ASP.NET application (i.e. the enclosing Get method).

HttpClient is just what triggered this behavior in my code, because it also uses the enriched logger. But I realize it makes it less obvious what is going on.

I've noticed at least two kinds of exceptions that can be thrown:

Exception System.InvalidOperationException: Operations that change non-concurrent collections must have exclusive access. A concurrent update was performed on this collection and corrupted its state. The collection's state is no longer correct.
   at System.Collections.Generic.Dictionary`2.FindValue(TKey key)
   at System.Collections.Generic.Dictionary`2.TryGetValue(TKey key, TValue& value)
   at Microsoft.AspNetCore.Http.ItemsDictionary.System.Collections.Generic.IDictionary<System.Object,System.Object>.get_Item(Object key)
   at Serilog.Enrichers.ClientAgentEnricher.Enrich(LogEvent logEvent, ILogEventPropertyFactory propertyFactory)
   at Serilog.Core.Enrichers.SafeAggregateEnricher.Enrich(LogEvent logEvent, ILogEventPropertyFactory propertyFactory) caught while enriching Serilog.Events.LogEvent with Serilog.Enrichers.ClientAgentEnricher.

And:

Exception System.ArgumentException: An item with the same key has already been added. Key: Serilog_ClientIp
   at System.Collections.Generic.Dictionary`2.TryInsert(TKey key, TValue value, InsertionBehavior behavior)
   at System.Collections.Generic.Dictionary`2.Add(TKey key, TValue value)
   at Serilog.Enrichers.ClientIpEnricher.Enrich(LogEvent logEvent, ILogEventPropertyFactory propertyFactory)
   at Serilog.Core.Enrichers.SafeAggregateEnricher.Enrich(LogEvent logEvent, ILogEventPropertyFactory propertyFactory) caught while enriching Serilog.Events.LogEvent with Serilog.Enrichers.ClientIpEnricher.
Kahbazi commented 2 years ago

@basvd You are correct. I'm afraid theoretically this is a bug, when the first log of the request happens in some concurrent tasks. But I'm wondering if you have ever get this bug in a real-case scenario? Have you seen this bug in production?

The workaround that you have suggested (to have a log before concurrent tasks) would almost always happens since the ASP.Net Core itself have some logs. Unless they are disabled which I guess are rarely the case!

@mo-esmp To fix this bug I guess we could put a lock on httpContext in the Enrich method. But I don't think it would worth the effort and I'm not sure how much it would hurt the performance. I believe the worst case scenario is that the first log of the concurrent tasks won't have the client info :thinking: and Again this would only happen if there is no log before the concurrent tasks which almost never happen because the logs of the framework itself.

It seems the main purpose of using HttpContext.Items is to cache the LogEventProperty. I think it might be worth looking into MemoryCache for this instead, because it serves the same purpose and also provides thread safety. As a cache key you could use the request ID plus the item key.

Yes, the purpose of the PR was to reduce the LogEventProperty allocations, but right now I think storing it anywhere beside HttpContext bring more complexity. The HttpContext.Items would be cleared after the request is finished, but anything beside that needs to be time-dependent which I don't feel good about it. I suggest to leave this as it is.

basvd commented 2 years ago

@Kahbazi I have indeed seen this bug in real application code. The example that I describe in my original post is based on that scenario. Apparently no log is written before the controller action is called and so the running tasks will be the first to write a log, but I do agree that it is unlikely (me and my team have built an entire application with this enricher without ever running into the issue).

I understand the possible solutions are not ideal in terms of complexity. The workaround/solution in application code is quite simple: Just log something before doing anything multi-threaded. And I'm fine with that.

If I can think of a non-complex alternative to using the HttpContext I will consider making a pull request.

mo-esmp commented 2 years ago

Changing code at line 53 would fix the problem, however, any suggestion to do it in a more performant way?

lock(locker)
{
    if (httpContext.Items[IpAddressItemKey] is LogEventProperty logEventProperty)
        return;

    var ipAddressProperty = new LogEventProperty(IpAddressPropertyName, new ScalarValue(ipAddress));
    httpContext.Items.Add(IpAddressItemKey, ipAddressProperty);
    logEvent.AddPropertyIfAbsent(ipAddressProperty);
}
basvd commented 2 years ago

I just realized that another solution (for projects using Serilog.AspNetCore) is to use its EnrichtDiagnosticContext to add the ClientIp and ClientAgent event properties. This is an alternative to using the ClientInfo enrichers entirely. The properties will be created by a middleware (which avoids the multi-thread problem) and set once on the diagnostic context, which effectively caches them.

Just posting this here for anyone who runs into the issue looking for solutions:

app.UseSerilogRequestLogging(options =>
{
    options.EnrichDiagnosticContext = (diagnosticContext, httpContext) =>
    {
        diagnosticContext.Set("ClientIp", httpContext.Connection.RemoteIpAddress);
        diagnosticContext.Set("ClientAgent", httpContext.Request.Headers.UserAgent);
    };
});

Note that if you use ForwardedHeadersMiddleware that you register this before the request logging so that the RemoteIpAddress can be set properly.

Edit: This will only enrich the request logs, so it does work differently from ClientInfo, which enriches all the logs during the request.

mo-esmp commented 2 years ago

I think with the above code then installing this package is pointless.

pergardebrink commented 8 months ago

Couldn't this be solved with static AsyncLocal storage instead of storing it on the HttpContext.Items?

HttpContextAccessor uses AsyncLocal internally, so they would follow each other either way.