Closed: jpaulgs closed this issue 1 year ago.
Actually, the LimitedQueue is irrelevant here; the following code also shows the issue:
#!/usr/bin/env ruby
require 'async'

def consumer
  Async do |task|
    loop do
      task.with_timeout(0.001) do
        # no-op
      rescue Async::TimeoutError
        # no-op
      end
    end
  end
end

task = consumer
task.wait
Hmm, thanks for reporting this, I'll take a look.
#!/usr/bin/env ruby
require 'async'
require 'memory'

def consumer
  Async do |task|
    report = Memory.report do
      1_000_000.times do
        task.with_timeout(0.001) do
          # no-op
        rescue Async::TimeoutError
          # no-op
        end
      end
    end

    report.print
    binding.irb
  end
end

task = consumer
task.wait
I see what is going on.
In order to avoid scheduling timeouts, which is expensive, we put them into a queue. Most timeouts are short-lived and don't fire, because the operation completes immediately. For example:
task.with_timeout(0.1) do
  io.read
end
Most of the time, io.read returns immediately, so we want the cost of the timeout to be minimal. We defer as much work as possible, partly because eagerly scheduling a timer also makes cleaning it up expensive.
Basically, with_timeout is a no-op unless some operation actually blocks.
There is a @queue which stores these as-yet-unprocessed timers, and they only get processed on the next call to wait.
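To make that concrete, here is a minimal sketch of the deferral pattern as described above; the class and method names (Timer, Group, after, flush!) are illustrative, not the exact internals of the timers gem:

# Illustrative sketch only; not the actual timers gem internals.
class Timer
  def initialize(interval, &block)
    @interval = interval
    @block = block
    @cancelled = false
  end

  def cancel
    @cancelled = true
  end

  def cancelled?
    @cancelled
  end
end

class Group
  def initialize
    @queue = []   # timers created but not yet scheduled
    @timers = []  # timers that have actually been scheduled
  end

  # Creating a timer is cheap: just append it to the queue.
  def after(interval, &block)
    timer = Timer.new(interval, &block)
    @queue << timer
    timer
  end

  # Only when something actually blocks do we pay the cost of
  # scheduling the pending timers and computing the next deadline.
  def wait
    flush!
    # ... sleep until the earliest deadline among @timers ...
  end

  private

  # Move pending timers into the scheduled set, dropping any that were
  # cancelled before they ever needed to be scheduled.
  def flush!
    while (timer = @queue.shift)
      @timers << timer unless timer.cancelled?
    end
  end
end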
However, if you have a tight loop, this queue can grow indefinitely.
I think the solution is just to flush the queue once it reaches a certain size, probably 100 or 1000 entries. The goal is to avoid adding a huge overhead, while also minimising the cost of inserting timers.
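Continuing the sketch above, that bounded flush might look something like this revised after; the threshold value and constant name are assumptions, not the final implementation:

# Illustrative only: cap the pending queue so a tight loop that never
# blocks cannot grow it without bound.
MAXIMUM_PENDING = 1000

def after(interval, &block)
  timer = Timer.new(interval, &block)
  @queue << timer
  # Normally the queue is only flushed on the next `wait`, but a loop
  # that never blocks never reaches `wait`; flushing here caps growth.
  flush! if @queue.size > MAXIMUM_PENDING
  timer
end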
I was starting to focus on @timers = Set.new in timers/group.rb:
#!/usr/bin/env ruby
require 'async'

def consumer
  Async do |task|
    loop do
      task.with_timeout(0.001) do |timer|
        timer.cancel
        print '.'
      rescue Async::TimeoutError
      end
    end
  end
end

task = consumer
task.wait
That does seem to stabilise the memory usage, but it may have more to do with the IO (which also seems to be required).
Any time you context switch, e.g. on IO, the timers queue is flushed. We just need to flush it more often.
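Assuming that description holds (the queue is flushed on any context switch), one possible interim workaround for a tight loop is to yield to the scheduler periodically. This is just a sketch, not advice from the maintainer, and it assumes task.yield forces a scheduler pass (sleep(0) would serve similarly):

#!/usr/bin/env ruby
require 'async'

Async do |task|
  1_000_000.times do |i|
    task.with_timeout(0.001) do
      # no-op
    rescue Async::TimeoutError
      # no-op
    end

    # Yield occasionally; the resulting context switch gives the
    # reactor a chance to flush the pending timers queue.
    task.yield if (i % 1000).zero?
  end
end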
Thanks for the speedy assistance, @ioquatix! :heart:
Should we be adding the timer.cancel line to our code, or are you working on / thinking about doing that internally? I see you've just put up a branch of timers here, which looks to be addressing the flushing aspect (cheers!), but not the cancel (yet)?
(I've been debugging our memory leak with @jpaulgs and @oeoeaio)
I'm going to release an update to fix the issue in the timers gem.
https://github.com/socketry/timers/commit/1ec78fbe59c934925e5f4a32fbd5a380c8120328 is the initial attempt at a fix, without introducing too much overhead.
To be honest, using a linked list here would make way more sense. My plan has always been to replace the timers gem with a C implementation in the io-event gem; I just haven't got around to it yet.
This should avoid the worst case, but it won't avoid the case where you create lots of timers and interleave cancelled and non-cancelled ones. We could do more work in flush!... but doing a lot of scans would negate much of the benefit.
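For reference, a hedged sketch of that kind of head-pruning (names illustrative): cancelled timers at the front of the queue can be dropped cheaply, but a cancelled timer sitting behind a live one is only reclaimed on a full flush.

# Illustrative only. Dropping cancelled timers from the head is O(1)
# per timer; interleaved cancellations survive until the next full
# flush, since scanning the middle of the queue on every insert would
# negate the cheap-insert design.
def prune!
  @queue.shift while @queue.first&.cancelled?
end

A doubly linked list would instead let each cancelled timer unlink itself in O(1) regardless of position, which is presumably why a linked list "would make way more sense" here.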
We have tested the changes to the timers gem, and while they do appear to remove the accumulation of Timers::Timer instances, there is still a memory leak in our code (which uses a LimitedQueue as per the OP). Investigating further, we identified this line as the culprit. It appears that when used in conjunction with a Task.with_timeout, instances of Queue (and LimitedQueue) will accumulate fibers in their @waiting list in the absence of any other fiber signalling the Queue, leading to a memory leak.
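To illustrate the failure mode, here is a toy queue (not Async's actual implementation) showing how an abandoned wait leaves a stale entry in @waiting:

# Toy illustration only; Async's real Queue/LimitedQueue differ.
class ToyQueue
  def initialize
    @items = []
    @waiting = []
  end

  def dequeue
    while @items.empty?
      # Park the current fiber until a producer signals us.
      @waiting << Fiber.current
      Fiber.yield
    end
    @items.shift
  end

  def enqueue(item)
    @items << item
    waiting, @waiting = @waiting, []
    waiting.each(&:resume)
  end
end

# If the dequeue is abandoned (e.g. a timeout resumes the fiber with an
# exception), the reference pushed onto @waiting is never removed. With
# no producer ever calling enqueue, nothing clears the list, so every
# timed-out dequeue attempt leaks one more fiber reference.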
We've made an attempt to fix this issue in #186.
Thanks for this update.
The latest release should address the memory leaks.
I am going to close this issue. Based on the reproduction scripts, the issue no longer exists.
Hey, I'm running Ruby 3.1.2 and version 2.0.3 of the Async rubygem. The code above is similar to what we are using, and it exhibits a memory leak. The screenshot below was taken after running this for about 15 minutes; it certainly never releases the memory.
I'll keep digging to see what I can find out.