socketry / async

An awesome asynchronous event-driven reactor for Ruby.

Memory leak when using Task#with_timeout #176

Closed: jpaulgs closed this issue 1 year ago

jpaulgs commented 2 years ago
#!/usr/bin/env ruby

require 'async'
require 'async/queue'

@input_channel = Async::LimitedQueue.new(100)

def consumer
  Async do |task|
    loop do
      task.with_timeout(0.001) do
        @input_channel.dequeue
      rescue Async::TimeoutError
        # no-op
      end
    end
  end
end

task = consumer

task.wait

Hey, I'm running ruby 3.1.2 and version 2.0.3 of the Async rubygem. The code above is similar to what we are using, and it exhibits a memory leak. The following screenshot is from running this for about 15 minutes; it certainly never releases the memory.

[Screenshot: memory usage climbing steadily over ~15 minutes]

I'll keep digging to see what I can find out.

jpaulgs commented 2 years ago

Actually, the LimitedQueue is irrelevant here; the following code also shows the issue:

#!/usr/bin/env ruby

require 'async'

def consumer
  Async do |task|
    loop do
      task.with_timeout(0.001) do
        # no-op
      rescue Async::TimeoutError
        # no-op
      end
    end
  end
end

task = consumer

task.wait

[Screenshot: memory usage growing with the simplified script as well]

ioquatix commented 2 years ago

Hmm, thanks for reporting this, I'll take a look.

ioquatix commented 2 years ago
#!/usr/bin/env ruby

require 'async'
require 'memory'

def consumer
    Async do |task|
        report = Memory.report do
            1000000.times do
                task.with_timeout(0.001) do
                    # no-op
                rescue Async::TimeoutError
                    # no-op
                end
            end
        end
        report.print
        binding.irb
    end
end

task = consumer

task.wait

Memory Profile

By Gem (288.00 MiB in 4000000 allocations)

By File (288.00 MiB in 4000000 allocations)

By Location (288.00 MiB in 4000000 allocations)

By Class (288.00 MiB in 4000000 allocations)

ioquatix commented 2 years ago

I see what is going on.

https://github.com/socketry/timers/blob/64a745a8b2f28849786c8c683ed2a4ef2eb0ae47/lib/timers/events.rb#L71

To avoid scheduling timeouts, which is expensive, we put them into a queue. Most timeouts are short-lived and never fire, because the operation completes immediately.

e.g.

task.with_timeout(0.1) do
  io.read
end

Most of the time, io.read returns immediately, so we want the cost of the timeout to be minimal. We therefore defer as much work as possible, since scheduling timers eagerly also makes cleaning them up expensive.

Basically, with_timeout is a no-op unless some operation actually blocks.

There is a @queue which stores these as-yet-unprocessed timers, and they only get processed on the next call to wait.

https://github.com/socketry/timers/blob/64a745a8b2f28849786c8c683ed2a4ef2eb0ae47/lib/timers/events.rb#L107

However, if you have a tight loop, this queue can grow indefinitely.

I think the solution is just to flush the queue after it gets to a certain size, probably 100 or 1000. The goal is to avoid adding a huge overhead, while also minimising the cost of inserting timers.
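For readers following along, here is a minimal, self-contained sketch of the idea described above: timers are appended to a deferred queue instead of being scheduled immediately, and the queue is flushed (dropping cancelled timers) once it exceeds a size limit. This is illustrative only and not the timers gem's actual code; DeferredTimerQueue, Handle and FLUSH_LIMIT are made-up names.

#!/usr/bin/env ruby

# Illustrative sketch only -- not the timers gem's actual implementation.
# DeferredTimerQueue, Handle and FLUSH_LIMIT are made-up names.

class Handle
  attr_reader :time

  def initialize(time)
    @time = time
    @cancelled = false
  end

  def cancel!
    @cancelled = true
  end

  def cancelled?
    @cancelled
  end
end

class DeferredTimerQueue
  # Flush once the deferred queue grows past this size, so a tight loop that
  # cancels every timer cannot grow the queue without bound.
  FLUSH_LIMIT = 1000

  def initialize
    @queue = []     # timers inserted but not yet sorted into the schedule
    @scheduled = [] # timers that made it into the sorted schedule
  end

  def after(offset)
    handle = Handle.new(now + offset)
    @queue << handle
    flush! if @queue.size > FLUSH_LIMIT
    handle
  end

  # Drop cancelled timers and move live ones into the sorted schedule. This is
  # the same work a context switch would trigger via the event loop's wait.
  def flush!
    live = @queue.reject(&:cancelled?)
    @scheduled.concat(live).sort_by!(&:time)
    @queue.clear
  end

  def sizes
    [@queue.size, @scheduled.size]
  end

  private

  def now
    Process.clock_gettime(Process::CLOCK_MONOTONIC)
  end
end

# A tight loop that cancels every timer immediately stays bounded:
timers = DeferredTimerQueue.new
100_000.times { timers.after(0.001).cancel! }
p timers.sizes # => small deferred queue, empty schedule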

jpaulgs commented 2 years ago

I was starting to focus on @timers = Set.new in timers/group.rb

#!/usr/bin/env ruby

require 'async'

def consumer
  Async do |task|
    loop do
      task.with_timeout(0.001) do |timer|
        timer.cancel
        print '.'
      rescue Async::TimeoutError

      end
    end
  end
end

task = consumer

task.wait

This does seem to stabilise the memory usage, but that may have more to do with the IO (which also seems to be required).

ioquatix commented 2 years ago

Any time you context switch, e.g. on IO, the timers queue is flushed. We just need to flush it more often.
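As an aside for anyone hitting this before the gem fix landed: one possible stop-gap, following the explanation above, is to force an occasional context switch inside the tight loop so the deferred timer queue gets flushed. This is only a sketch of that idea, not something proposed in the thread; it assumes Kernel#sleep is handled by the fiber scheduler (which it is under async), so even a tiny sleep yields to the reactor.

#!/usr/bin/env ruby

# Sketch of a possible stop-gap (not part of the actual fix): force a periodic
# context switch so the reactor gets a chance to flush pending timers.

require 'async'

def consumer
  Async do |task|
    iterations = 0
    loop do
      task.with_timeout(0.001) do
        # no-op
      rescue Async::TimeoutError
        # no-op
      end

      iterations += 1
      # Yield to the reactor every 1000 iterations; sleep is intercepted by
      # the fiber scheduler, so this briefly suspends the task.
      sleep(0.0001) if (iterations % 1000).zero?
    end
  end
end

task = consumer
task.wait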

ZimbiX commented 2 years ago

Thanks for the speedy assistance, @ioquatix! :heart:

Should we be adding the timer.cancel line to our code, or are you working on / thinking about doing that internally? I see you've just put up a branch of timers here, which looks to be addressing the flushing aspect (cheers!), but not the cancel (yet)?

(I've been debugging our memory leak with @jpaulgs and @oeoeaio)

ioquatix commented 2 years ago

I'm going to release an update to fix the issue in the timers gem.

ioquatix commented 2 years ago

https://github.com/socketry/timers/commit/1ec78fbe59c934925e5f4a32fbd5a380c8120328 is the initial attempt at a fix, without introducing too much overhead.

To be honest, using a linked list here would make way more sense. My plan has always been to replace the timers gem with a C implementation in the io-event gem, I just haven't got around to it yet.

This should avoid the worst case, but it won't avoid the case where you create lots of timers and interleave cancelled and non-cancelled timers. We could do more work in flush!, but doing a lot of scans would negate much of the benefit.
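A rough illustration of the limitation mentioned above, under the assumption that the flush only trims cancelled timers from the tail of the deferred queue (this is not the linked commit, just made-up data): interleaved cancelled and live timers are not reclaimed.

# Illustrative only -- not the actual flush! from the linked commit.
queue = Array.new(1000) { |i| { time: i, cancelled: i.even? } }

# Tail-trimming flush: cheap, but it stops at the first live timer at the end.
queue.pop while queue.last && queue.last[:cancelled]

p queue.count { |t| t[:cancelled] } # => 500 cancelled entries still retained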

oeoeaio commented 1 year ago

We have tested the changes to the timers gem and while they do appear to remove the accumulation of Timers::Timer, there is still a memory leak in our code (which uses a LimitedQueue as per the OP). Investigating further, we identified this line as the culprit. It appears that when used in conjunction with a Task.with_timeout, instances of Queue (and LimitedQueue) will accumulate fibers in their @waiting list in the absence of any other fiber signalling the Queue, leading to a memory leak.

We've made an attempt to fix this issue in #186
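To make the failure mode concrete, here is a simplified illustration (not the async gem's actual Queue, and not necessarily how #186 fixes it): a waiter pushes its fiber onto a waiting list before suspending, and if a timeout unwinds it without removing that entry, the list grows on every timed-out dequeue. An ensure-based cleanup along these lines avoids the accumulation.

#!/usr/bin/env ruby

# ToyQueue is a made-up illustration of the @waiting leak described above;
# it is not the async gem's Queue implementation.

class ToyQueue
  def initialize
    @items = []
    @waiting = []
  end

  def waiting_count
    @waiting.size
  end

  # Leaky version: if a timeout raises while we are suspended, the exception
  # unwinds out of this method and our fiber is never removed from @waiting.
  def dequeue_leaky
    while @items.empty?
      @waiting << Fiber.current
      Fiber.yield
    end
    @items.shift
  end

  # Fixed version: always remove ourselves from the waiting list on the way
  # out, whether we were woken by an enqueue or unwound by a timeout.
  def dequeue
    while @items.empty?
      @waiting << Fiber.current
      begin
        Fiber.yield
      ensure
        @waiting.delete(Fiber.current)
      end
    end
    @items.shift
  end

  def enqueue(item)
    @items << item
    @waiting.shift&.resume
  end
end

# Simulate ten dequeues that each get interrupted by a timeout:
queue = ToyQueue.new
10.times do
  waiter = Fiber.new do
    queue.dequeue_leaky
  rescue RuntimeError
    # timed out
  end
  waiter.resume                         # suspends, waiting for an item
  waiter.raise(RuntimeError, "timeout") # simulate the timeout firing
end
p queue.waiting_count # => 10 stale entries (0 if dequeue is used instead)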

ioquatix commented 1 year ago

Thanks for this update.

ioquatix commented 1 year ago

The latest release should address the memory leaks.

ioquatix commented 1 year ago

I am going to close this issue. Based on the reproduction scripts, the issue no longer exists.