pardahlman / RawRabbit

A modern .NET framework for communication over RabbitMq
MIT License
746 stars 144 forks source link

RawRabbit has solvable memory leaks (ex. PublishAcknowledgeMiddleware) #353

Open houseofcat opened 6 years ago

houseofcat commented 6 years ago

Really any collection reference to IModel is a potential memory leak.

When using transient channels (such as IModel) in Dictionaries or ConcurrentDictionaries as this library does you need to implement a background process that locks or slims the dictionaries to remove keys of dead IModels.

For example:

public static Dictionary<IModel, int> models3 = new Dictionary<IModel, int>();
public static async Task CleanupDictionariesWithSemaphoreAsync(TimeSpan timeSpan)
{
    while (true)
    {
        await Task.Delay(timeSpan);
        await slimShady3.WaitAsync();

        var count = 0;
        var listOfItemsToRemove = models3.Where(x => x.Key.IsClosed).ToArray();
        foreach (var kvp in listOfItemsToRemove)
        {
            models3.Remove(kvp.Key);
            count++;
        }
        await Console.Out.WriteLineAsync($"Dead channels removed: {count}");

        slimShady3.Release();
    }
}

These models are forever inside the Dictionary (used as keys) and are never allowed to be reclaimed by GC even if the IModel is later disposed. This isn't a hot approach as we have discovered several memory leaks regarding this pipeline. We have since implemented a ReplacementMiddleware that has something akin to the above running to clean up ChannelLocks etc.

I would also toss in utilizing a compiled in 4.7.2 and C# 7.2 version or RabbitMQ client 5.x. It performs a heck of lot better than 4.5.1.

I have created a repo demonstration of this memory leak and how I implemented the above code to prevent threads colliding with the dictionaries.

https://github.com/thyams/CookedRabbit

pardahlman commented 6 years ago

Hi @thyams - thanks a lot for you deep analyze! I would be more than happy to review a PR (or perhaps two PRs in this case) with proposed fixes for this.

houseofcat commented 6 years ago

No problem, when I get some time, I will post replacement channel middleware and you can critique it with me.

RedOnePrime commented 6 years ago

@thyams could this be related to not handling events from the Default/Basic event consumers such as Consumer Cancel Notification ?

In that issue I describe where the server has send consumer cancel but RawRabbit is not attaching/handling it and thus the IModel will stay forever. In the case of being a subscriber, it will never get removed either it appears.

houseofcat commented 6 years ago

There was a plethora of issues, even my original solution faulted over time.

Basically what it comes down to, no matter how clever you get with TaskCompletionSource, Channel.IsOpen, PingPong tests, there is no sure fire way of ensuring that your succesfully awaited things like AutoRecovery if connection is IRecoverable. There are too many moving components.

Furthermore, if a Task timesout or is cancelled in the background, you miss it unless you have a global exception handler for UnobserevedTaskExceptions.

Suffice to say, the rabbitmq dotnetclient isn't my favorite library. RawRabbit does a pretty decent job of mitigating most issues, but as seen in my environment Transient network outages often lead to catastrophic service issues.

The real solution is two parts: Search: First Identify Models that are closed, tag them with a timestamp. Destroy: After timespan, assume channel is not coming back alive and remove it from dictionary sources.

That solves the memory leaks. My solution interferes with recovery, so the first step is to search for all IsOpen false Channels and flag them for removal, a second task removes every channel from the dictionary after x timeframe.

I intended to write up a PR to create such a Search & Destroy system but I have been busy with my own RabbitMQ library (https://github.com/thyams/CookedRabbit) and work.