socketry / async

An awesome asynchronous event-driven reactor for Ruby.

Memory leak when using Task#with_timeout #176

Closed: jpaulgs closed this issue 1 year ago

jpaulgs commented 2 years ago
#!/usr/bin/env ruby

require 'async'
require 'async/queue'

@input_channel = Async::LimitedQueue.new(100)

def consumer
  Async do |task|
    loop do
      task.with_timeout(0.001) do
        @input_channel.dequeue
      rescue Async::TimeoutError
        # no-op
      end
    end
  end
end

task = consumer

task.wait

Hey, I'm running ruby 3.1.2 and version 2.0.3 of the Async rubygem. The code above is similar to what we are using, and it exhibits a memory leak. The following screenshot is from running this for about 15 minutes; it certainly never releases the memory.

[Screenshot: memory usage climbing steadily over ~15 minutes]

I'll keep digging to see what I can find out.

jpaulgs commented 2 years ago

Actually, the LimitedQueue is irrelevant here; the following code also shows the issue:

#!/usr/bin/env ruby

require 'async'

def consumer
  Async do |task|
    loop do
      task.with_timeout(0.001) do
        # no-op
      rescue Async::TimeoutError
        # no-op
      end
    end
  end
end

task = consumer

task.wait

[Screenshot: memory usage growing with the simplified script as well]

ioquatix commented 2 years ago

Hmm, thanks for reporting this, I'll take a look.

ioquatix commented 2 years ago
#!/usr/bin/env ruby

require 'async'
require 'memory'

def consumer
    Async do |task|
        report = Memory.report do
            1000000.times do
                task.with_timeout(0.001) do
                    # no-op
                rescue Async::TimeoutError
                    # no-op
                end
            end
        end
        report.print
        binding.irb
    end
end

task = consumer

task.wait

Memory Profile

By Gem (288.00 MiB in 4000000 allocations)

By File (288.00 MiB in 4000000 allocations)

By Location (288.00 MiB in 4000000 allocations)

By Class (288.00 MiB in 4000000 allocations)

ioquatix commented 2 years ago

I see what is going on.

https://github.com/socketry/timers/blob/64a745a8b2f28849786c8c683ed2a4ef2eb0ae47/lib/timers/events.rb#L71

To avoid scheduling timeouts, which is expensive, we put them into a queue. Most timeouts are short-lived and never fire, because the operation completes immediately.

e.g.

task.with_timeout(0.1) do
  io.read
end

Most of the time, io.read returns immediately, so we want the cost of the timeout to be minimal. We therefore defer as much work as possible, since scheduling timers eagerly also makes cleaning them up expensive.

Basically, with_timeout is a no-op unless some operation actually blocks.

There is a @queue which stores these as-yet-unprocessed timers, and they only get processed on the next call to wait.

https://github.com/socketry/timers/blob/64a745a8b2f28849786c8c683ed2a4ef2eb0ae47/lib/timers/events.rb#L107

However, if you have a tight loop, this queue can grow indefinitely.

I think the solution is just to flush the queue after it gets to a certain size, probably 100 or 1000. The goal is to avoid adding a huge overhead, while also minimising the cost of inserting timers.
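For readers following along, here is a minimal, self-contained sketch of the idea described above: timers are appended to a deferred queue instead of being scheduled immediately, and the queue is flushed (dropping cancelled timers) once it exceeds a size limit. This is illustrative only and not the timers gem's actual code; DeferredTimerQueue, Handle and FLUSH_LIMIT are made-up names.

#!/usr/bin/env ruby

# Illustrative sketch only -- not the timers gem's actual implementation.
# DeferredTimerQueue, Handle and FLUSH_LIMIT are made-up names.

class Handle
  attr_reader :time

  def initialize(time)
    @time = time
    @cancelled = false
  end

  def cancel!
    @cancelled = true
  end

  def cancelled?
    @cancelled
  end
end

class DeferredTimerQueue
  # Flush once the deferred queue grows past this size, so a tight loop that
  # cancels every timer cannot grow the queue without bound.
  FLUSH_LIMIT = 1000

  def initialize
    @queue = []     # timers inserted but not yet sorted into the schedule
    @scheduled = [] # timers that made it into the sorted schedule
  end

  def after(offset)
    handle = Handle.new(now + offset)
    @queue << handle
    flush! if @queue.size > FLUSH_LIMIT
    handle
  end

  # Drop cancelled timers and move live ones into the sorted schedule. This is
  # the same work a context switch would trigger via the event loop's wait.
  def flush!
    live = @queue.reject(&:cancelled?)
    @scheduled.concat(live).sort_by!(&:time)
    @queue.clear
  end

  def sizes
    [@queue.size, @scheduled.size]
  end

  private

  def now
    Process.clock_gettime(Process::CLOCK_MONOTONIC)
  end
end

# A tight loop that cancels every timer immediately stays bounded:
timers = DeferredTimerQueue.new
100_000.times { timers.after(0.001).cancel! }
p timers.sizes # => small deferred queue, empty schedule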

jpaulgs commented 2 years ago

I was starting to focus on @timers = Set.new in timers/group.rb

#!/usr/bin/env ruby

require 'async'

def consumer
  Async do |task|
    loop do
      task.with_timeout(0.001) do |timer|
        timer.cancel
        print '.'
      rescue Async::TimeoutError

      end
    end
  end
end

task = consumer

task.wait

This does seem to stabilise the memory usage, but that may have more to do with the IO (which also seems to be required).

ioquatix commented 2 years ago

Any time you context switch, e.g. on IO, the timers queue is flushed. We just need to flush it more often.
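As an aside for anyone hitting this before the gem fix landed: one possible stop-gap, following the explanation above, is to force an occasional context switch inside the tight loop so the deferred timer queue gets flushed. This is only a sketch of that idea, not something proposed in the thread; it assumes Kernel#sleep is handled by the fiber scheduler (which it is under async), so even a tiny sleep yields to the reactor.

#!/usr/bin/env ruby

# Sketch of a possible stop-gap (not part of the actual fix): force a periodic
# context switch so the reactor gets a chance to flush pending timers.

require 'async'

def consumer
  Async do |task|
    iterations = 0
    loop do
      task.with_timeout(0.001) do
        # no-op
      rescue Async::TimeoutError
        # no-op
      end

      iterations += 1
      # Yield to the reactor every 1000 iterations; sleep is intercepted by
      # the fiber scheduler, so this briefly suspends the task.
      sleep(0.0001) if (iterations % 1000).zero?
    end
  end
end

task = consumer
task.wait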

ZimbiX commented 2 years ago

Thanks for the speedy assistance, @ioquatix! :heart:

Should we be adding the timer.cancel line to our code, or are you working on / thinking about doing that internally? I see you've just put up a branch of timers here, which looks to be addressing the flushing aspect (cheers!), but not the cancel (yet)?

(I've been debugging our memory leak with @jpaulgs and @oeoeaio)

ioquatix commented 2 years ago

I'm going to release an update to fix the issue in the timers gem.

ioquatix commented 2 years ago

https://github.com/socketry/timers/commit/1ec78fbe59c934925e5f4a32fbd5a380c8120328 is the initial attempt at a fix, without introducing too much overhead.

To be honest, using a linked list here would make way more sense. My plan has always been to replace the timers gem with a C implementation in the io-event gem, I just haven't got around to it yet.

This should avoid the worst case, but it won't avoid the case where you create lots of timers and interleave cancelled and non-cancelled timers. We could do more work in flush!, but doing a lot of scans would negate much of the benefit.
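A rough illustration of the limitation mentioned above, under the assumption that the flush only trims cancelled timers from the tail of the deferred queue (this is not the linked commit, just made-up data): interleaved cancelled and live timers are not reclaimed.

# Illustrative only -- not the actual flush! from the linked commit.
queue = Array.new(1000) { |i| { time: i, cancelled: i.even? } }

# Tail-trimming flush: cheap, but it stops at the first live timer at the end.
queue.pop while queue.last && queue.last[:cancelled]

p queue.count { |t| t[:cancelled] } # => 500 cancelled entries still retained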

oeoeaio commented 1 year ago

We have tested the changes to the timers gem and while they do appear to remove the accumulation of Timers::Timer, there is still a memory leak in our code (which uses a LimitedQueue as per the OP). Investigating further, we identified this line as the culprit. It appears that when used in conjunction with a Task.with_timeout, instances of Queue (and LimitedQueue) will accumulate fibers in their @waiting list in the absence of any other fiber signalling the Queue, leading to a memory leak.

We've made an attempt to fix this issue in #186
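To make the failure mode concrete, here is a simplified illustration (not the async gem's actual Queue, and not necessarily how #186 fixes it): a waiter pushes its fiber onto a waiting list before suspending, and if a timeout unwinds it without removing that entry, the list grows on every timed-out dequeue. An ensure-based cleanup along these lines avoids the accumulation.

#!/usr/bin/env ruby

# ToyQueue is a made-up illustration of the @waiting leak described above;
# it is not the async gem's Queue implementation.

class ToyQueue
  def initialize
    @items = []
    @waiting = []
  end

  def waiting_count
    @waiting.size
  end

  # Leaky version: if a timeout raises while we are suspended, the exception
  # unwinds out of this method and our fiber is never removed from @waiting.
  def dequeue_leaky
    while @items.empty?
      @waiting << Fiber.current
      Fiber.yield
    end
    @items.shift
  end

  # Fixed version: always remove ourselves from the waiting list on the way
  # out, whether we were woken by an enqueue or unwound by a timeout.
  def dequeue
    while @items.empty?
      @waiting << Fiber.current
      begin
        Fiber.yield
      ensure
        @waiting.delete(Fiber.current)
      end
    end
    @items.shift
  end

  def enqueue(item)
    @items << item
    @waiting.shift&.resume
  end
end

# Simulate ten dequeues that each get interrupted by a timeout:
queue = ToyQueue.new
10.times do
  waiter = Fiber.new do
    queue.dequeue_leaky
  rescue RuntimeError
    # timed out
  end
  waiter.resume                         # suspends, waiting for an item
  waiter.raise(RuntimeError, "timeout") # simulate the timeout firing
end
p queue.waiting_count # => 10 stale entries (0 if dequeue is used instead)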

ioquatix commented 1 year ago

Thanks for this update.

ioquatix commented 1 year ago

The latest release should address the memory leaks.

ioquatix commented 1 year ago

I am going to close this issue. Based on the reproduction scripts, the issue no longer exists.