pardahlman / RawRabbit

A modern .NET framework for communication over RabbitMq
MIT License
746 stars 144 forks source link

Memory leak in Command.ConsolidateBody() in RabbitMQ via RawRabbit #313

Closed arrkaye closed 6 years ago

arrkaye commented 6 years ago

Hi,

We're seeing a memory leak when using RawRabbit in the RabbitMQ client.

The situation occurs when using the Request/RespondAsync extensions, specifically when messages are on the Large Object Heap.

I have built a sample solution based off the Sample solution but have changed the ValueRequest class to include a random large string as part of the message. As more requests occur, the more Byte[] seem to be held and not released during a GC.

https://github.com/arrkaye/rawrabbit-rabbitmq-memory-leak-example

I also have a dotMemory export which shows snapshots at three points - initial, after 100 requests and after 200 requests. Screenshot of the Call Tree and export.

image

https://drive.google.com/file/d/1JB-hQ9FUoEL-NtVsNFor6UGpVNdvE-Bj/view?usp=sharing

Any help would be appreciated. I appreciate this occurs with the RabbitMQ client, but I raised here as its via RawRabbit and want to rule out any misusages first.

Thanks.

Arrkaye

michaelklishin commented 6 years ago

Those body[] objects are message payloads that we delivered to a consumer. If the consumer does not process them, they will be kept around. If there is no limit on how many deliveries a consumer can be given without acknowledging any or the automatic acknowledgement mode is used, the number of outstanding unprocessed deliveries will keep growing. This is to say that this is not necessarily an indication of a memory leak in either client: it can be in your app, which does not use manual acknowledgements and a bound prefetch yet also doesn't consume fast enough.

arrkaye commented 6 years ago

Hi,

Updates on the issue raised in the rabbitmq-dotnet-client suggest this issue is with the way RawRabbit is using it.

The example code I posted was the Sample code from RawRabbit with minor tweaks to increase the message payload.

Any idea what could be causing this issue?

https://github.com/rabbitmq/rabbitmq-dotnet-client/issues/392

arrkaye commented 6 years ago

I've just replaced RawRabbit with EasyNetQ in my example project and the memory leak appears to have vanished. Almost definitely confirming that it is indeed RawRabbit with the issue.

I'm going to switch over to ENQ until there's a reason to switch back.

image

image

pardahlman commented 6 years ago

Hey @arrkaye - thanks for reporting this. Out of curiosity, which version of EasyNetQ are you using, and what version of the underlying client does it depend on? And have you configured it to use autoAck? RawRabbit has publisher confirms and does not use autoAck, so it would be interesting to see if the memory leek is still there with those settings

arrkaye commented 6 years ago

Hi,

I'm using EasyNetQ 2.4.0-alpha, underlying client is 5.0.1.

I didn't set any settings on EasyNetQ explicitly, so it's using the default for everything.

I've updated https://github.com/arrkaye/rawrabbit-rabbitmq-memory-leak-example with the change to ENQ to demonstrate.

pardahlman commented 6 years ago

Looking at your sample app, this is something that you see when performing RPC calls (request/response)?

arrkaye commented 6 years ago

Yes, we only use the RPC functionality.

I think your theory around the noAck makes sense - as if RR doesn't AutoAck, there isn't a place to Ack these messages from within the RespondAsync method?

pardahlman commented 6 years ago

Perhaps. I'm looking into this now.... RawRabbit uses direct-replyto for faster RPCs so perhaps it is something there. Not sure yet. Are your graphs from the AspNet-project?

arrkaye commented 6 years ago

They were from the RawRabbit.[AspNet/ConsoleApp].Sample projects, but I had stripped out all AspNet references and made them into basic console applications - I've changed them to use ENQ but the RawRabbit refs are still there.

pardahlman commented 6 years ago

Perfect! I think that the source of the issue is that the response is received but not acked. Haven't had time to look closely at the code just yet.

Thanks for reporting this, if the problem is what I think it is, it will be part of the next rc.

arrkaye commented 6 years ago

Excellent. What kind of time scale are you thinking for the RC?

pardahlman commented 6 years ago

Not sure ATM, will look into it this weekend, but this spring has been hectic so can't make any promises.

pardahlman commented 6 years ago

I have found and fixed an issue that might be the source of what you are experiencing; the RequestAsync pipeline has a middleware that hold the list of all ongoing requests and completes them as the responses are received. This middleware was previously instantiated as a singleton, but is now transient. This change made RawRabbit register an eventhandler multiple times for the same Received event. I've commited the change, but I don't have dotMemory installed, so I wont be able to confirm that this fixes the issue.

Do you think it would be possible for you to run your samples again the updated source code and see if there is any changes in memory allocation?

arrkaye commented 6 years ago

Yes, happy to help. How do I get it?

pardahlman commented 6 years ago

Super! I think the easiest way for you would be to clone this repo - the latest changes are already in the branch 2.0!

pardahlman commented 6 years ago

Heya - a new release (RC4) has been released. It'd be interesting to hear if you see similar memory allocations in that version!

arrkaye commented 6 years ago

@pardahlman Thanks. I've given it a try but it appears to have had no impact, the memory leak is still there. Here's the dotMemory, in the same place. I've updated my Example repo.

image

pardahlman commented 6 years ago

Hello again, @arrkaye just wanted you to know that I found a significant memory leak in the Request operation. After the fix my snapshots looks like this

image

It would be interesting to see if the fix also solves your scenario

arrkaye commented 6 years ago

@pardahlman That looks very promising. Has the change been built into a release for me to test out?

pardahlman commented 6 years ago

i was meant to release it yesterday, but I wanted to include fixes for #303... so not yet - however, you should be able to run it from source code. If not, I'll write you a line once this has been packaged and published to nuget.org

arrkaye commented 6 years ago

Nice one. I've given it a try and it looks like it solves the issue. Thanks very much for addressing it. I look forward to obtaining the packages. Please feel free to close this issue once you are happy.

image

pardahlman commented 6 years ago

Closing this now, will give you a ping when the fix is published!

arrkaye commented 6 years ago

Hi @pardahlman Has this been incorporated into RC5?

pardahlman commented 6 years ago

Yes it has! Sorry for not notifying you.