Open lancecarlson opened 1 year ago
@ioquatix were you able to replicate this ?
I have not had a chance to look at it yet but will soon.
@ioquatix I have noticed ever-growing memory when subscribing to a websocket as well. I will investigate this some more.
I really appreciate all the work you have done.
Do you mind telling me how you are measuring memory? RSS?
There was at least one bug in async-http which was not correctly releasing memory. Do you mind trying again and reporting back with the latest releases? v0.60.1 of async-http and v0.23.0 of async-websocket.
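If it helps, picking up those fixes is just a matter of bumping the two gems; a hypothetical Gemfile fragment requiring at least those releases:

```ruby
# Gemfile (sketch): require at least the releases containing the fix.
gem "async-http", ">= 0.60.1"
gem "async-websocket", ">= 0.23.0"
```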
@ioquatix thanks for looking into this. I'm sorry I didn't get around to investigating this further. I will try the new version and report back.
@zephleggett Any updates?
hey there! We are observing a similar issue; we are on 0.66.3 of async-http and 0.26.1 of async-websocket.
Our setup is a bit more involved (we are actually using websockets from within Rails), but our tests show the memory growing indefinitely.
We only started the investigation today, so we are not yet sure what the source of the leak is, but it does seem to be connected to websockets (regular HTTP requests served by Falcon don't exhibit such behaviour).
We will report our findings as soon as there are any, but if someone has ideas on how to narrow down the search, they would be highly appreciated.
@yard Are you handling HTTP/1 or HTTP/2 requests?
Are you able to share a reproduction?
If not, can you please do the following:
At the start of your test server, run the following:
ObjectSpace.trace_object_allocations_start
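For context, once tracing is enabled, ObjectSpace records the file and line where each subsequent object was allocated, which is what makes a later heap dump actionable. A minimal sketch of what tracing gives you (uses the stdlib objspace extension):

```ruby
require 'objspace'

# Tracing must be started *before* the objects of interest are allocated.
ObjectSpace.trace_object_allocations_start

obj = Object.new

# With tracing on, each object's allocation site is recorded:
file = ObjectSpace.allocation_sourcefile(obj)
line = ObjectSpace.allocation_sourceline(obj)

ObjectSpace.trace_object_allocations_stop
```

These `file`/`line` fields are what show up in the heap dump entries.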
Then make connections which leak objects. Ensure all the connections are closed/finished, and then run the following:
def dump_all
  3.times { GC.start }
  File.open("heap.json", "w") do |file|
    ObjectSpace.dump_all(output: file)
  end
end
Then upload heap.json somewhere private and share it with me.
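As a first pass before sharing, you can also tally the dump yourself. Each line of the `dump_all` output is a standalone JSON document, so a rough summary (names here are my own) looks like:

```ruby
require 'json'

# Summarize an ObjectSpace.dump_all heap dump (one JSON document per
# line): count objects by type and by allocation site. Allocation
# sites are only present if tracing was enabled before allocation.
def summarize_heap(path)
  counts = Hash.new(0)
  sites  = Hash.new(0)
  File.foreach(path) do |line|
    obj = JSON.parse(line)
    counts[obj["type"]] += 1
    sites["#{obj["file"]}:#{obj["line"]}"] += 1 if obj["file"]
  end
  [counts, sites]
end

# Usage (assuming the dump above was written to heap.json):
# counts, sites = summarize_heap("heap.json")
# counts.sort_by { |_, n| -n }.first(10).each { |t, n| puts "#{t}: #{n}" }
# sites.sort_by { |_, n| -n }.first(10).each { |s, n| puts "#{s}: #{n}" }
```

A leak usually shows up as one type or one allocation site whose count scales with the number of connections you made.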
Hi @ioquatix, good to see you again :)
Yeah, so we are currently trying to decipher heap dumps. We started with the "real" ones from the application, but they are pretty big (leaking around 400KB per WS connection doing a simple 3-message exchange and then going away), so we decided to spin up a minimal example and check whether the leak reproduces there.
Our current theory is that a relatively big object graph gets retained in the real app (we use Rails there, so a lot is going on during a single request's lifespan), but the cause is probably just a single object that captures an instance of the Rack env, a request or something else, essentially magnifying the leak. And of course regular HTTP requests do not leak in the app, so all the signs currently point at websockets.
Re protocols – we are using HTTP/1.1 (can you even do WS with HTTP/2? I am not sure, but not our case anyway).
I will keep you posted as we progress and will obv propose a fix once we find the root cause.
> Re protocols – we are using HTTP/1.1 (can you even do WS with HTTP/2? I am not sure, but not our case anyway).
Yes, and it's fully supported.
> I will keep you posted as we progress and will obv propose a fix once we find the root cause.
Awesome, sounds great. Yes, holding onto the request (e.g. in a global/thread-local cache) will cause problems. If you dump the heap and then work backwards from the Async::WebSocket::Connection object (look at what's referencing it), you should be able to identify the leak.
If that doesn't help, try looking for something holding onto the Protocol::HTTP1::Connection or Protocol::HTTP::Headers – both can cause ugly leaks if retained accidentally.
Give https://github.com/Shopify/heap-profiler a try too.
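One way to do that "work backwards" step programmatically, if the heap dump itself is hard to read: scan the live heap for direct referrers of a leaked object. This is a sketch of my own (referrers_of is an invented helper, not part of any of these gems), using the stdlib objspace extension:

```ruby
require 'objspace'

# Sketch: find which live objects hold a *direct* reference to +target+.
# This is O(heap size), so it is a debugging tool, not production code.
# Results may include transient arrays created by the scan itself, but
# the interesting referrers (caches, connections, etc.) will be in there.
def referrers_of(target)
  found = []
  ObjectSpace.each_object do |obj|
    next if obj.equal?(target)
    refs = ObjectSpace.reachable_objects_from(obj)
    found << obj if refs&.any? { |r| r.equal?(target) }
  end
  found
end

# Usage idea: after connections close and GC runs, pick a leaked
# Async::WebSocket::Connection and inspect referrers_of(connection)
# to walk the retention chain one hop at a time.
```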
Ok, so the issue turned out to be almost boring: ActiveRecord query cache internals. What's even more boring is that it probably won't be an issue with 7.2 due to a refactoring of the whole thing.
It took us some time to figure out, as all the usual suspects (connection objects, sockets, etc.) were getting properly released (we even started adding finalizers to them to watch the GC collect them). The bug was due to https://github.com/rails/rails/blob/v7.1.3.4/activerecord/lib/active_record/connection_adapters/abstract/query_cache.rb#L38 – it simply kept a reference to every fiber ever used with Rails (with the value true), which led to the server leaking quite a significant amount of memory for obvious reasons.
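Not Rails' actual code, but a minimal sketch of the leak's shape (class and method names are invented): a hash keyed by Fiber.current whose entries are never deleted retains every fiber that ever touched it, and transitively everything those fibers captured:

```ruby
# Hypothetical illustration of the bug pattern, not Rails' implementation.
class QueryCacheRegistry
  def initialize
    # Fiber => true; nothing ever removes entries, so dead fibers
    # (and whatever state they captured) are retained forever.
    @enabled = {}
  end

  def enable!
    @enabled[Fiber.current] = true
  end

  def retained_fibers
    @enabled.size
  end
end

registry = QueryCacheRegistry.new
3.times { Fiber.new { registry.enable! }.resume }
registry.enable! # the root fiber too
```

Since Falcon serves each websocket connection on its own fiber, the map grows by one entry (plus the retained graph) per connection, which matches the per-connection growth observed above.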
We are still gonna be running more stress tests, but so far it seems that at least the memory usage is totally fine.
Thanks for investigating! @byroot FYI.
I'm running this on the client side:
I'm running the server at the bottom example in the guide here:
https://github.com/socketry/async-websocket/tree/main/guides/getting-started
I'm running 3 different clients and closing them, then running them again. Each time checking the memory of the ruby process spawned from the server. Each time it seems to go up without garbage collecting at all. This could wreak havoc on any long running service. Do you have any idea where the leak is coming from? I tried messing with the $connections global to see if the Set was a problem, but I don't think so. It might be related to some weird string allocation thing going on when the incoming message comes through?
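For what it's worth, "checking the memory of the ruby process" here means sampling its RSS. A sketch of one way to do that from inside or outside the process (rss_kb is an invented helper; reads /proc on Linux, falls back to ps elsewhere):

```ruby
# Report a process's resident set size in kilobytes.
def rss_kb(pid = Process.pid)
  status = "/proc/#{pid}/status"
  if File.readable?(status)
    File.foreach(status) do |line|
      return line.split[1].to_i if line.start_with?("VmRSS:")
    end
  end
  # Fallback: ask ps (macOS/BSD also report rss in KB).
  `ps -o rss= -p #{pid}`.strip.to_i
end

# Sample between client runs to see whether RSS keeps growing:
# puts rss_kb
```

Note that a growing RSS alone doesn't prove a Ruby-level leak (the heap can fragment, and the GC rarely returns pages to the OS), which is why the heap-dump comparison above is the more reliable signal.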