zhulik closed this issue 2 years ago
Is any state shared between workers? i.e. global connection pool or something.
Maybe unintentionally. Each thread is supposed to have its own connection pool:
    Thread.current.thread_variable_set(:grumlin_default_pool,
                                       Async::Pool::Controller.new(Grumlin::Client::PoolResource,
                                                                   limit: config.pool_size))
There is a shared mutex, which probably does not make any sense since I use Thread.current: https://github.com/babbel/grumlin/blob/782c0f18ead3c8ccf1f72ff0f9e38a0dd9e51da6/lib/grumlin.rb#L157
Is there any chance resources are being leaked across threads?
The stack trace tells me that you have a fiber, possibly from a different thread, waiting on a resource or condition of a different thread.
We could debug this by logging the thread the condition was created on and bombing out as soon as it fails:
    require 'async'

    module CrossThreadBomb
      def initialize
        super
        @thread = Thread.current
      end

      def wait
        if Thread.current != @thread
          raise "Cross-thread bomb!"
        end
        super
      end
    end

    Async::Condition.prepend(CrossThreadBomb)

    Async do
      condition = Async::Condition.new
      Thread.new do
        condition.wait
      end.join
    end
Thanks a lot! Unfortunately I can't reproduce the issue locally, so I'll deploy your snippet and wait till it explodes.
Apparently this didn't help: I'm still getting exactly the same exception. Any hints on which other classes I could monkey-patch in a similar way?
UPD: I actually can't even reproduce the issue with these samples:
    Async do
      condition = Async::Condition.new
      Thread.new do
        condition.wait
      end.join
    end
Crashes with an exception:
#<Thread:0x00007f765a72cb58 (irb):11 run> terminated with exception (report_on_exception is true):
/home/user/.asdf/installs/ruby/3.1.2/lib/ruby/gems/3.1.0/gems/async-2.0.3/lib/async/condition.rb:56:in `wait': undefined method `transfer' for nil:NilClass (NoMethodError)
Fiber.scheduler.transfer
^^^^^^^^^
from (irb):12:in `block (2 levels) in start'
irb(main):015:0> 50.66s warn: Async::Task [oid=0xb8c4] [ec=0xb9a0] [pid=824748] [2022-08-24 12:35:30 +0200]
| Task may have ended with unhandled exception.
| NoMethodError: undefined method `transfer' for nil:NilClass
|
| Fiber.scheduler.transfer
| ^^^^^^^^^
| → /home/user/.asdf/installs/ruby/3.1.2/lib/ruby/gems/3.1.0/gems/async-2.0.3/lib/async/condition.rb:56 in `wait'
| (irb):12 in `block (2 levels) in start
and
    Async do
      condition = Async::Condition.new
      Thread.new do
        Async do
          condition.wait
        end
      end.join
    end
Does not crash at all.
UPD 2: Same behavior with ruby 3.0 and async 1.30
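On the question of which other classes could be patched in a similar way: one option is to generalize the CrossThreadBomb idea into a small factory that guards any list of methods. The sketch below is only an illustration in plain Ruby. `ThreadAffinity.guard` and `FakeResource` are hypothetical names that do not exist in async, and which real classes and methods are worth guarding (for example `acquire` on Async::Pool::Controller or `wait` on Async::Task) is an assumption to check against the stack trace.

```ruby
# Hypothetical helper (not part of async): builds a module that remembers the
# thread an object was created on and raises if a guarded method is later
# called from any other thread.
module ThreadAffinity
  def self.guard(*method_names)
    Module.new do
      define_method(:initialize) do |*args, **kwargs, &block|
        super(*args, **kwargs, &block)
        @owner_thread = Thread.current
      end

      method_names.each do |name|
        define_method(name) do |*args, **kwargs, &block|
          unless Thread.current.equal?(@owner_thread)
            raise "#{self.class}##{name} called on #{Thread.current.inspect}, " \
                  "but the object was created on #{@owner_thread.inspect}"
          end
          super(*args, **kwargs, &block)
        end
      end
    end
  end
end

# Hypothetical stand-in for a pool-like class, used only to exercise the guard.
class FakeResource
  def acquire
    :acquired
  end
end

FakeResource.prepend(ThreadAffinity.guard(:acquire))

resource = FakeResource.new
resource.acquire # same thread: passes through to the original method

# A call from another thread raises; the error is captured here for inspection.
cross_thread_error = Thread.new do
  begin
    resource.acquire
    nil
  rescue RuntimeError => error
    error
  end
end.value
```

The guard only covers objects created after the `prepend` call, so it would have to run before any pools or conditions are built.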
Hmm, that's odd, I tested it locally and it blew up. Let me check it again.
This snippet actually causes Condition#transfer to raise a FiberError once the task is finished:
    require "async"

    Async do |reactor|
      task = reactor.async { Async::Task.current.sleep(1) }
      Thread.new do
        Async do
          task.wait
        end
      end
    end
Traceback (most recent call last):
5: from /home/user/.asdf/installs/ruby/2.7.5/lib/ruby/gems/2.7.0/gems/async-1.30.3/lib/async/task.rb:272:in `block in make_fiber'
4: from /home/user/.asdf/installs/ruby/2.7.5/lib/ruby/gems/2.7.0/gems/async-1.30.3/lib/async/task.rb:287:in `finish!'
3: from /home/user/.asdf/installs/ruby/2.7.5/lib/ruby/gems/2.7.0/gems/async-1.30.3/lib/async/condition.rb:62:in `signal'
2: from /home/user/.asdf/installs/ruby/2.7.5/lib/ruby/gems/2.7.0/gems/async-1.30.3/lib/async/condition.rb:62:in `each'
1: from /home/user/.asdf/installs/ruby/2.7.5/lib/ruby/gems/2.7.0/gems/async-1.30.3/lib/async/condition.rb:63:in `block in signal'
/home/user/.asdf/installs/ruby/2.7.5/lib/ruby/gems/2.7.0/gems/async-1.30.3/lib/async/condition.rb:63:in `resume': fiber called across threads (FiberError)
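The underlying rule can be seen without async at all: a Ruby fiber belongs to the thread that created it, and resuming it from any other thread raises FiberError. The following is a minimal sketch in plain Ruby; `resume_from_other_thread` is a name made up for this illustration.

```ruby
# Create a fiber on the calling thread, then try to resume it from a new one.
# Returns whatever the other thread produced: the rescued error, or the
# fiber's result if no error was raised.
def resume_from_other_thread
  fiber = Fiber.new { :done }
  Thread.new do
    begin
      fiber.resume
    rescue FiberError => error
      error
    end
  end.value
end

result = resume_from_other_thread
p result
```

In the trace above, the waiting fiber was registered on the task's condition from the second thread, so the first thread's `signal` ends up calling `resume` on a fiber it does not own.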
Sorry, I didn't have time to look at this, I'll spend some time tonight.
You were absolutely right: the pool created in one thread may have been memoized and then used from another thread. I removed the memoization and it fixed the issue.
Thank you!
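For readers wondering how a memoized pool ends up crossing threads, here is a minimal sketch of the failure mode in plain Ruby. It is not Grumlin's actual code: `FakePool`, `MemoizedPools`, and the `:example_pool` thread variable are hypothetical names, and the pool is a stub that only records which thread built it.

```ruby
# Hypothetical stand-in for a connection pool; it only records its creator thread.
class FakePool
  attr_reader :owner

  def initialize
    @owner = Thread.current
  end
end

module MemoizedPools
  # Problematic: module-level memoization hands the same pool to every thread,
  # so a pool built on one thread can be used from another.
  def self.shared_pool
    @shared_pool ||= FakePool.new
  end

  # Per-thread alternative: each thread lazily builds and keeps its own pool.
  def self.thread_pool
    pool = Thread.current.thread_variable_get(:example_pool)
    unless pool
      pool = FakePool.new
      Thread.current.thread_variable_set(:example_pool, pool)
    end
    pool
  end
end

main_shared  = MemoizedPools.shared_pool
other_shared = Thread.new { MemoizedPools.shared_pool }.value

main_local  = MemoizedPools.thread_pool
other_local = Thread.new { MemoizedPools.thread_pool }.value

puts "shared pool reused across threads: #{main_shared.equal?(other_shared)}"
puts "thread pool reused across threads: #{main_local.equal?(other_local)}"
```

With fiber-based resources inside the pool, the shared variant is the kind of setup that can produce "fiber called across threads", because the fibers waiting on the pool belong to whichever thread created them.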
For future readers who find this first while searching for the error message: for me the trigger was the Faraday adapter, see https://github.com/socketry/async-http-faraday/issues/31
Hello,
For some time I have been using async inside Sidekiq jobs; each job starts its own reactor using this simple server middleware:
The code running in the job does not spawn any threads, but Sidekiq itself is threaded, and from time to time I get
FiberError: fiber called across threads
Stack trace:
What would be the best way to debug and fix it? Should I reduce Sidekiq's concurrency to 1 and spawn more processes instead?