Open machty opened 4 years ago
I can't verify this removes all deadlock conditions, but it's a start:
if Rails.env.production?
puts "Calling define_attribute_methods on all ActiveRecord models..."
ActiveRecord::Base.descendants.each(&:define_attribute_methods)
puts "Calling define_attribute_methods on all ActiveRecord models... DONE"
end
I added this before starting the async engine. It adds about 4s to startup time on Heroku but seems to work.
ActiveRecord is not fiber safe unfortunately.
Using a mutex within fiber code can cause problems like this. @ko1 as we discussed, FYI. Maybe it's necessary to make Mutex
fiber aware.
On the other hand, I would also say, that holding a mutex while issuing a database query could be considered a poor usage of mutex, because the duration is so long. I don't believe it's good idea to hold a mutex when doing I/O, although AR metadata load may be a valid use case.
One solution would be to detect when doing I/O within a mutex, and revert back to blocking behaviour. We also considered wrapping I/O operations in some kind of Fiber.exclusive
method which can have the same effect.
As you know, Falcon is still experimental system. I appreciate you trying it out and reporting back.
IMHO this is a problem that the Mutex in MRI is currently held per Thread, but should be held per Fiber. Then Fiber 2 in your example would just wait for it, as it should.
The issue is then if those 2 Fibers are from the same Thread, Fiber 2 would block forever since that Mutex is taken by Fiber 1. To solve that I think we need to make Mutex
fiber aware, or in the context of @ioquatix scheduler PR, call back to the scheduler if a Mutex cannot be acquired. Then I believe everything would work nicely for this case.
What does it mean to "make Mutex
fiber aware"? Does that mean changing Mutex in the Ruby standard lib?
I think ideally yes, and https://github.com/ruby/ruby/pull/1870 for example when introducing the scheduler could also call to the scheduler for Mutex#lock if the current Thread has a scheduler (otherwise just normal behavior).
Definitely needs a MRI change to have Mutex owned by Fibers and not Threads.
Not sure what async-postgres/io can do for this, @ioquatix ?
Rails loads table/column info from the database, and this data is turn used to define dynamic methods backed by database columns. The code that performs this initial column load is wrapped in a mutex, which prevents thread safety bugs, but if you're using async-postgres, there's a nondeterministic chance you'll get bitten by a
ThreadError: deadlock; recursive locking
error in (at least) the following case:async_exec
, which async-postgres modifies to return control back to the reactordeadlock
because it's trying to lock the mutex from the same thread.I'm not sure what the solution is here, wouldn't be surprised if this was biting falcon users too; note that I'm seeing this in a Sidekiq-ish style background worker that I wrote with the following libs:
locked at the following versions
I don't really have a solution at this time, is there perhaps a way I can preload all of these column datas? I looked at the
falcon
repo but didn't see anyone else running into this particular deadlock.