Closed abrandoned closed 7 years ago
Thanks for the pull request, and welcome! The Rails team is excited to review your changes, and you should hear from @rafaelfranca (or someone else) soon.
If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.
Please see the contribution instructions for more information.
As this is a difficult condition to test, I came up with an artificial test that demonstrates the problem (with sleeps instead of context switching) to hopefully illuminate why the current implementation causes issues
If we swap out the Visitor
class with the class below (which puts sleeps in place to demonstrate where a thread switch can occur) ... then run the test file below and it will fail
module Arel
module Visitors
class Visitor
def initialize
@dispatch = get_dispatch_cache
end
def accept object
visit object
end
private
def self.dispatch_cache
Hash.new do |hash, klass|
hash[klass] = "visit_#{(klass.name || '').gsub('::', '_')}"
end
end
def get_dispatch_cache
self.class.dispatch_cache
end
def dispatch
@dispatch
end
$done_defining = false
def first_time
def first_time
false
end
true
end
def visit object
send dispatch[object.class], object
rescue NoMethodError => e
unless first_time
sleep 1 until $done_defining
end
raise e if respond_to?(dispatch[object.class], true)
superklass = object.class.ancestors.find { |klass|
respond_to?(dispatch[klass], true)
}
raise(TypeError, "Cannot visit #{object.class}") unless superklass
sleep 1
dispatch[object.class] = dispatch[superklass]
$done_defining = true
retry
end
end
end
end
Contrived failing test
require 'helper'
module Arel
module Visitors
describe 'avoiding contamination between visitor dispatch tables' do
before do
@connection = Table.engine.connection
@table = Table.new(:users)
end
it 'does not throw NoMethodError when multi-threaded defining "visit"' do
::Thread.abort_on_exception = true
threads = []
5.times do
threads << ::Thread.new do
node = Nodes::Union.new(Nodes::True.new, Nodes::False.new)
assert_equal "( TRUE UNION FALSE )", node.to_sql
node.first # from Nodes::Node's Enumerable mixin
assert_equal "( TRUE UNION FALSE )", node.to_sql
end
end
threads.map(&:join)
end
end
end
end
While this isn't a test to be put in the test suite, I don't know if a test can be written that detects this type of concurrency problem
This mishandles a legitimate NoMethodError... it's a bit of a worry it's not failing a test 😕
@matthewd which legitimate NoMethodError is mishandled? (we haven't seen that case)
I don't believe the tests are written to test concurrency cases .... so not sure how to address that
A regular, actual, ordinary NoMethodError raised by the body of the dispatched method -- no concurrency needed.
The thing this line is there for.
Obviously such an error won't be raised under normal operation. But when it is, we should not iloop.
@matthewd don't know if that is "better", but at least accounts for the case where a legitimate NoMethodError is thrown (and not dependent on the dynamically defined method)
@matthewd tried another pattern of dispatching the visit to account for the legitimate case and added a test for it
this same problem looks like it could impact reduce
as well https://github.com/rails/arel/blob/master/lib/arel/visitors/reduce.rb#L12
@matthewd put up a patch for reduce
as well and removed the need for the mutex, the rare case where a method is redefined because of a context switch should not be impactful to the outcome ... moving to dispatch_method
allows the retry
to be removed and a spec is added for a "legitimate" NoMethodError
Thanks for working on this :heart:
I've ended up going with #472 instead
I don't know if this the "correct" way to resolve this problem, but the issue referenced below have been experienced on our platform as well (in a Jruby, multi-threaded application)
https://github.com/rails/rails/issues/26571 https://github.com/jruby/activerecord-jdbc-adapter/issues/739
when we apply the patch we no longer experience the issue
Hoping this PR can help stimulate conversation as to the "correct" way to make the rescue and call to
dispatch[object.class] =
threadsafe