socketry / async-websocket

Asynchronous WebSocket client and server, supporting HTTP/1 and HTTP/2 for Ruby.

Potential memory leak? #43

Open lancecarlson opened 1 year ago

lancecarlson commented 1 year ago

I'm running this on the client side:

require 'async'
require 'async/http/endpoint'
require 'async/websocket/client'

URL = ARGV.pop || "http://127.0.0.1:7070"

Async do |task|
  endpoint = Async::HTTP::Endpoint.parse(URL)

  Async::WebSocket::Client.connect(endpoint) do |connection|
    # Send a burst of messages, then read whatever the server sends back.
    1000.times do
      connection.send_text("Hello World")
    end
    connection.flush

    while message = connection.read
      p message
    end
  end
end

I'm running the server from the last example in the guide here:

https://github.com/socketry/async-websocket/tree/main/guides/getting-started
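
For reference, that server is roughly the following shape (a paraphrase from memory rather than the guide verbatim); it's where the $connections Set mentioned below comes from:

# config.ru - rough paraphrase of the guide's final server example, not verbatim.
require 'async/websocket/adapters/rack'
require 'set'

$connections = Set.new

run lambda { |env|
  Async::WebSocket::Adapters::Rack.open(env) do |connection|
    # Remember every open connection so incoming messages can be broadcast.
    $connections << connection

    while message = connection.read
      $connections.each do |c|
        c.write(message)
        c.flush
      end
    end
  ensure
    $connections.delete(connection)
  end or [200, {}, ["Hello World"]]
}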

I'm running 3 different clients, closing them, and then running them again, checking the memory of the Ruby process spawned by the server after each run. It seems to go up every time without ever being garbage collected, which could wreak havoc on a long-running service. Do you have any idea where the leak is coming from? I tried messing with the $connections global to see if the Set was the problem, but I don't think it is. It might be related to some odd string allocation happening when the incoming message comes through?

lancecarlson commented 1 year ago

@ioquatix were you able to replicate this?

ioquatix commented 1 year ago

I have not had a chance to look at it yet but will soon.

zephleggett commented 1 year ago

@ioquatix I have noticed ever-growing memory when subscribing to a websocket as well. I will investigate this some more.

I really appreciate all the work you have done.

ioquatix commented 1 year ago

Do you mind telling me how you are measuring memory? RSS?
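
For example, RSS can be read from inside the Ruby process on Linux with something like this (just an illustration, not specific to this project):

# Linux-only: print the current process's resident set size in KB.
rss_kb = File.read("/proc/self/status")[/VmRSS:\s+(\d+)/, 1]
puts "RSS: #{rss_kb} KB"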

ioquatix commented 1 year ago

There was at least one bug in async-http which was not correctly releasing memory. Do you mind trying again and reporting back with the latest releases? v0.60.1 of async-http and v0.23.0 of async-websocket.

zephleggett commented 1 year ago

@ioquatix thanks for looking into this. I'm sorry I didn't get around to investigating this further. I will try the new version and report back.

kaspergrubbe commented 10 months ago

@zephleggett Any updates?

yard commented 1 month ago

Hey there! We are observing a similar issue; we are on 0.66.3 of async-http and 0.26.1 of async-websocket. Our setup is a bit more involved (we are actually using WebSockets from within Rails), but the tests show the memory growing indefinitely.

We have only started the investigation today, so we're not really sure what the source of the leak is, but it does seem to be connected to WebSockets (regular HTTP requests served by Falcon don't exhibit such behaviour).

We will report our findings as soon as there are any, but if anyone has ideas on how to narrow down the search, they would be highly appreciated.

ioquatix commented 1 month ago

@yard Are you handling HTTP/1 or HTTP/2 requests?

ioquatix commented 1 month ago

Are you able to share a reproduction?

If not, can you please do the following:

At the start of your test server, run the following:

ObjectSpace.trace_object_allocations_start

Then make connections which leak objects. Ensure all the connections are closed/finished, and then run the following:

def dump_all
  # Run the GC a few times first, so only genuinely retained objects remain.
  3.times{GC.start}

  # Write every live object (with its allocation metadata) to heap.json.
  File.open("heap.json", "w") do |file|
    ObjectSpace.dump_all(output: file)
  end
end

Then upload heap.json somewhere private and share it with me.

yard commented 1 month ago

Hi @ioquatix, good to see you again :)

Yeah, we are currently trying to decipher heap dumps. We started with the "real" ones from the application, but they are pretty big (leaking around 400 KB per WS connection doing a simple 3-message exchange and then going away), so we decided to spin up a minimal example and check whether the leak reproduces there.

Our current theory is that a relatively big object graph gets retained in the real app (we use Rails there, so a lot goes on during a single request's lifespan), but the cause is probably just a single object that captures the Rack env, a request, or something else, essentially magnifying the leak. And of course regular HTTP requests do not leak in the app, so all the signs are currently pointing at WebSockets.

Re protocols – we are using HTTP/1.1 (can you even do WS with HTTP/2? I am not sure, but not our case anyway).

I will keep you posted as we progress and will obv propose a fix once we find the root cause.

ioquatix commented 1 month ago

> Re protocols – we are using HTTP/1.1 (can you even do WS with HTTP/2? I am not sure, but not our case anyway).

Yes, and it's fully supported.
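
For example (the exact option is from memory, so treat it as an assumption and check the async-http docs), you can force the client onto HTTP/2 by constructing the endpoint with an explicit protocol:

require 'async'
require 'async/http'
require 'async/websocket/client'

# Illustrative only: a hypothetical URL, forcing an HTTP/2 connection.
endpoint = Async::HTTP::Endpoint.parse(
  "https://example.com/ws",
  protocol: Async::HTTP::Protocol::HTTP2
)

Async do
  Async::WebSocket::Client.connect(endpoint) do |connection|
    connection.send_text("Hello over HTTP/2")
    connection.flush
  end
end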

> I will keep you posted as we progress and will obv propose a fix once we find the root cause.

Awesome, sounds great. Yes, holding onto the request (e.g. in a global or thread-local cache) will cause problems. If you dump the heap and then work backwards from the Async::WebSocket::Connection object (look at what's referencing it), you should be able to identify the leak.

If that doesn't help, try looking for something holding onto the Protocol::HTTP1::Connection or Protocol::HTTP::Headers - both can cause ugly leaks if retained accidentally.
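
To make "work backwards" concrete: the dump is newline-delimited JSON, so a small script along these lines (purely illustrative, not part of any gem) will list whatever is still referencing the connection objects:

# find_referrers.rb - illustrative sketch for walking heap.json backwards.
require 'json'

objects = File.readlines("heap.json").map { |line| JSON.parse(line) }

# Locate the class entry for the type we care about.
klass = objects.find { |o| o["type"] == "CLASS" && o["name"] == "Async::WebSocket::Connection" }
abort "Class not found in this dump" unless klass

# Collect the addresses of every live instance of that class.
addresses = objects.select { |o| o["class"] == klass["address"] }.map { |o| o["address"] }

# Anything whose references include one of those addresses is keeping it alive.
objects.each do |o|
  next unless (o["references"] || []).any? { |ref| addresses.include?(ref) }
  puts "#{o['type']} at #{o['address']} (allocated at #{o['file']}:#{o['line']})"
end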

Give https://github.com/Shopify/heap-profiler a try too.
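
From memory, its API is roughly the following (double-check the README for the exact interface); the report directory is then summarised with the gem's CLI:

require 'heap-profiler'

# Wrap the leaky workload; heap-profiler writes allocation and retention dumps
# into the given directory for later analysis.
HeapProfiler.report("tmp/websocket-report") do
  # ... open, use, and close a batch of WebSocket connections here ...
end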

yard commented 1 month ago

OK, so the issue turned out to be almost boring: ActiveRecord query cache internals. What's even more boring is that it probably won't be an issue with 7.2, due to a refactoring of the whole thing.

It took us some time to figure out, as all the usual suspects (connection objects, sockets, etc.) were getting properly released (we even started adding finalizers to them to see the GC collecting them). The bug was due to https://github.com/rails/rails/blob/v7.1.3.4/activerecord/lib/active_record/connection_adapters/abstract/query_cache.rb#L38 – it simply kept a reference to every fiber ever used with Rails (with the value true), which led to the server leaking quite a significant amount of memory for obvious reasons.
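
To illustrate the pattern (a simplified sketch, not the actual Rails implementation):

# Simplified sketch of the leak pattern; not the actual Rails code.
class QueryCacheRegistry
  def initialize
    @enabled = {}
  end

  def enable!
    # The current fiber is used as a hash key and never removed, so the fiber
    # (and everything reachable from its stack) can never be garbage collected.
    @enabled[Fiber.current] = true
  end

  def enabled?
    @enabled[Fiber.current]
  end
end

Under a fiber-per-connection server like Falcon, every WebSocket connection adds a new key, so the hash grows for the lifetime of the process. (The finalizer trick mentioned above was just ObjectSpace.define_finalizer(obj, proc { |id| puts "collected #{id}" }) on the suspects, to confirm they really were being collected.)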

We are still going to run more stress tests, but so far it seems that memory usage is totally fine.

ioquatix commented 1 month ago

Thanks for investigating! @byroot FYI.