rails / arel

A Relational Algebra
2.06k stars 390 forks source link

patch for threadsafe rescue and redefine of NoMethodError #465

Closed abrandoned closed 7 years ago

abrandoned commented 7 years ago

I don't know if this the "correct" way to resolve this problem, but the issue referenced below have been experienced on our platform as well (in a Jruby, multi-threaded application)

https://github.com/rails/rails/issues/26571 https://github.com/jruby/activerecord-jdbc-adapter/issues/739

when we apply the patch we no longer experience the issue

Hoping this PR can help stimulate conversation as to the "correct" way to make the rescue and call to dispatch[object.class] = threadsafe

rails-bot commented 7 years ago

Thanks for the pull request, and welcome! The Rails team is excited to review your changes, and you should hear from @rafaelfranca (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see the contribution instructions for more information.

abrandoned commented 7 years ago

As this is a difficult condition to test, I came up with an artificial test that demonstrates the problem (with sleeps instead of context switching) to hopefully illuminate why the current implementation causes issues

If we swap out the Visitor class with the class below (which puts sleeps in place to demonstrate where a thread switch can occur) ... then run the test file below and it will fail

module Arel
  module Visitors
    class Visitor
      def initialize
        @dispatch = get_dispatch_cache
      end

      def accept object
        visit object
      end

      private

      def self.dispatch_cache
        Hash.new do |hash, klass|
          hash[klass] = "visit_#{(klass.name || '').gsub('::', '_')}"
        end
      end

      def get_dispatch_cache
        self.class.dispatch_cache
      end

      def dispatch
        @dispatch
      end

      $done_defining = false

      def first_time
        def first_time
          false
        end

        true
      end

      def visit object
        send dispatch[object.class], object
      rescue NoMethodError => e
        unless first_time
          sleep 1 until $done_defining
        end

        raise e if respond_to?(dispatch[object.class], true)
        superklass = object.class.ancestors.find { |klass|
          respond_to?(dispatch[klass], true)
        }
        raise(TypeError, "Cannot visit #{object.class}") unless superklass
        sleep 1
        dispatch[object.class] = dispatch[superklass]
        $done_defining = true
        retry
      end
    end
  end
end

Contrived failing test

require 'helper'

module Arel
  module Visitors
    describe 'avoiding contamination between visitor dispatch tables' do
      before do
        @connection = Table.engine.connection
        @table = Table.new(:users)
      end

      it 'does not throw NoMethodError when multi-threaded defining "visit"' do
        ::Thread.abort_on_exception = true
        threads = []

        5.times do
          threads << ::Thread.new do
            node = Nodes::Union.new(Nodes::True.new, Nodes::False.new)
            assert_equal "( TRUE UNION FALSE )", node.to_sql

            node.first # from Nodes::Node's Enumerable mixin

            assert_equal "( TRUE UNION FALSE )", node.to_sql
          end
        end

        threads.map(&:join)
      end
    end
  end
end

While this isn't a test to be put in the test suite, I don't know if a test can be written that detects this type of concurrency problem

matthewd commented 7 years ago

This mishandles a legitimate NoMethodError... it's a bit of a worry it's not failing a test 😕

abrandoned commented 7 years ago

@matthewd which legitimate NoMethodError is mishandled? (we haven't seen that case)

I don't believe the tests are written to test concurrency cases .... so not sure how to address that

matthewd commented 7 years ago

A regular, actual, ordinary NoMethodError raised by the body of the dispatched method -- no concurrency needed.

The thing this line is there for.

matthewd commented 7 years ago

Obviously such an error won't be raised under normal operation. But when it is, we should not iloop.

abrandoned commented 7 years ago

@matthewd don't know if that is "better", but at least accounts for the case where a legitimate NoMethodError is thrown (and not dependent on the dynamically defined method)

abrandoned commented 7 years ago

@matthewd tried another pattern of dispatching the visit to account for the legitimate case and added a test for it

this same problem looks like it could impact reduce as well https://github.com/rails/arel/blob/master/lib/arel/visitors/reduce.rb#L12

abrandoned commented 7 years ago

@matthewd put up a patch for reduce as well and removed the need for the mutex, the rare case where a method is redefined because of a context switch should not be impactful to the outcome ... moving to dispatch_method allows the retry to be removed and a spec is added for a "legitimate" NoMethodError

matthewd commented 7 years ago

Thanks for working on this :heart:

I've ended up going with #472 instead