serilog / serilog-sinks-periodicbatching

Infrastructure for Serilog sinks that process events in batches.
Apache License 2.0

Memory leak when sink is idle #76

Closed jessicah closed 5 months ago

jessicah commented 5 months ago

Description

Serilog appears to end up with unbounded memory usage, creating a large number of CancellationTokenSources and ValueTasks.

Reproduction

Program.cs:

using Serilog;
using Serilog.Formatting.Compact;
using BlazorSSR;

Log.Logger = new LoggerConfiguration()
    .Filter.RateLimited()
    .WriteTo.Seq("http://...", payloadFormatter: new CompactJsonFormatter(), eventBodyLimitBytes: 10485760)
    .CreateLogger();

var builder = WebApplication.CreateBuilder(args);

builder.Host.UseSerilog(Log.Logger);

LoggingRateLimiter.cs:

using Serilog;
using Serilog.Configuration;
using Serilog.Events;

namespace BlazorSSR
{
    public static class LoggingExtensions
    {
        public static LoggerConfiguration RateLimited(this LoggerFilterConfiguration filterConfiguration)
        {
            var limiter = new LoggingRateLimiter();

            return filterConfiguration.ByExcluding(limiter.LimitExceeded);
        }
    }

    class LoggingRateLimiter
    {
        readonly object _sync = new();

        Dictionary<string, DateTime> _lastSuccessTimes = [];

        // Returns true (i.e. exclude the event) when this endpoint has already been
        // allowed through within the last ten seconds.
        public bool LimitExceeded(LogEvent logEvent)
        {
            lock (_sync)
            {
                if (logEvent.Properties.TryGetValue("Endpoint", out var path))
                {
                    if (_lastSuccessTimes.TryGetValue(path.ToString(), out DateTime lastSuccess))
                    {
                        if (lastSuccess.AddSeconds(10) < DateTime.Now)
                        {
                            _lastSuccessTimes[path.ToString()] = DateTime.Now;

                            return false;
                        }

                        return true;
                    }
                    else
                    {
                        _lastSuccessTimes[path.ToString()] = DateTime.Now;

                        return false;
                    }
                }

                return false;
            }
        }
    }
}

Expected behavior

Memory to remain fairly constant.

Relevant package, tooling and runtime versions

Additional context

[Screenshots in the original issue: heap retention paths for CancellationTokenSource and ValueTask instances.]

nblumhardt commented 5 months ago

Thanks for the heads-up! Where is Filter.RateLimited from? And if you remove it, do you still observe the leak?

(Happy to take a look at it, too, but it's not a part of this codebase as far as I know.)

nblumhardt commented 5 months ago

Ah, sorry, I see it's inline :+1:

nblumhardt commented 5 months ago

Hi again; I've had a dig around, and I don't know that there's necessarily a leak here. Do you have any "total" heap stats handy? I.e., are you seeing an ever-increasing number of instances of any of the types involved?

The CancellationTokenSource+CallbackNode allocation is a linked list that CancellationTokenSource uses as a pool; it'll keep growing that list up to the point where it can start reusing instances, so a few dozen (or even a thousand) nodes wouldn't necessarily indicate a leak, though whether it's problematic enough to consider it a bug will still depend on how the scenario manifests.
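To make that concrete, here is a rough sketch against the public CancellationToken API only (the node list and its reuse are internal runtime details, so this code can't observe them directly; the loop count is arbitrary):

using System.Threading;

// Each Register call attaches a callback node to the source; disposing the
// registration hands the node back so a later Register call can reuse it
// instead of allocating. Growth of that internal list is only visible in a
// memory profiler, as in the screenshots above.
using var cts = new CancellationTokenSource();

for (var i = 0; i < 1_000; ++i)
{
    CancellationTokenRegistration registration = cts.Token.Register(() => { /* no-op callback */ });
    registration.Dispose();
}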

Are you running this in a load test harness, or manually hitting a web endpoint? Running a small load test here (queuing 100M events over ~100s), I see increasing memory up to about 280-300 MB, after which GC appears to keep things constant.
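For reference, a minimal stand-in for that kind of load test, assuming the logger configured above (the event template, loop count, and memory readout here are illustrative, not the actual harness):

using Serilog;

// Write events in a tight loop and watch the managed heap level off once the
// GC keeps pace with the sink's batching.
for (var i = 0; i < 100_000_000; ++i)
{
    Log.Information("Load test event {Number}", i);

    if (i % 1_000_000 == 0)
        Console.WriteLine($"{i:N0} events, ~{GC.GetTotalMemory(false) / (1024 * 1024)} MB managed heap");
}

Log.CloseAndFlush();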

jessicah commented 5 months ago

Just pointing at a specific Blazor endpoint that is in a steady state, with no new log messages being generated. The "Paths To Root" display on these instances gets deeper and deeper over time, as in the screenshots. This is from running it again with the filter commented out. In a steady state without new log messages I'd expect memory to remain fairly static, but if this is by design, is there a way to limit it? I don't see this with the async console logger, however.

Snapshots for count diff:

Time (s)    CancellationTokenSource+CallbackNode    ValueTask+ValueTaskSourceAsTask<Boolean>
 18.29      (baseline)                              (baseline)
 79.85       +16                                     +5
359.62      +141                                    +149
641.12      +281                                    +293
683.57      +306                                    +317
754.78      +352                                    +352

Managed Memory ordered by count diff:

Object Type                                 Count Diff  Size Diff  Incl. Size Diff  Count   Size   Incl. Size
CancellationTokenSource+CallbackNode               106       8480            28640    503  40240       124368
ValueTask+ValueTaskSourceAsTask<Boolean>           106       9328            12720    458  40304        54928
List<Object>                                       106       3392             5808    735  70240       123248
AsyncOperation<Boolean>                            106      11024            52384    472  49088       223536
pmsensi commented 5 months ago

Hello,

maybe the problem is related to what is described here: https://github.com/dotnet/runtime/issues/78180

nblumhardt commented 5 months ago

Turned out to be unawaited ChannelReader<T>.WaitToReadAsync() tasks. Thanks for reporting this, @jessicah!

The leak occurs when no data is reaching the sink, and resolves when an event passes through.
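For anyone following along, here is a minimal sketch of that pattern on an idle channel (not the sink's actual source; the channel, tick interval, and event type are placeholders): each tick starts a new WaitToReadAsync, loses the race to a timer, and abandons the still-pending wait, so the waiters and their cancellation registrations pile up.

using System.Threading.Channels;

// Sketch of the failure mode: on an idle channel every tick queues another
// WaitToReadAsync that never completes, and the loop abandons it when the
// timer wins the race.
var channel = Channel.CreateUnbounded<string>();
using var cts = new CancellationTokenSource();

for (var tick = 0; tick < 100; ++tick)   // 100 idle ticks for the demonstration
{
    var wait = channel.Reader.WaitToReadAsync(cts.Token).AsTask();
    var winner = await Task.WhenAny(wait, Task.Delay(TimeSpan.FromMilliseconds(50)));

    if (winner != wait)
        continue;                        // 'wait' is still pending on the empty channel

    while (channel.Reader.TryRead(out var item))
        Console.WriteLine(item);
}

// The ~100 abandoned waiters only complete (and become collectable) once an
// event is written, matching "resolves when an event passes through".
channel.Writer.TryWrite("first event after the idle period");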

I'll move the ticket over to Serilog.Sinks.PeriodicBatching and reference it from the fix PR over there.

nblumhardt commented 5 months ago

New -dev build with this fix should be on NuGet shortly.

nblumhardt commented 5 months ago

Hi @jessicah; Serilog.Sinks.Seq 7.0.1-dev-00288 has just been published and should resolve this issue. It'd be great to hear how it goes for you, if you're able to try it out. Thanks again!