rails / arel

A Relational Algebra
2.06k stars 390 forks source link

Make Visitor visit thread safe by holding dispatch method reference #472

Closed film42 closed 7 years ago

film42 commented 7 years ago

This is an alternative to https://github.com/rails/arel/pull/465

We experienced this under jruby in production, and it looks like we're not alone:

rails/rails#26571 jruby/activerecord-jdbc-adapter#739

This change doesn't change behavior, but it does fix the following problem:

  1. [thread 1] dispatches with dispatch[object.class] and a NoMethodError is raised.
  2. [thread 1] now checks respond_to?(dispatch[object.class]) and returns false.
  3. [thread 1] begins walking the ancestor tree.
  4. [thread 2] dispatches with dispatch[object.class] and a NoMethodError is raised.
  5. [thread 1] finds a super class and swaps it out in the dispatch cache: dispatch[object.class] = superklass.
  6. [thread 2] now checks respond_to?(dispatch[object.class]) and returns true, because the dispatch cache has changed, and an error is raised 😱 .

This PR makes each visit method store the dispatch[object.class] into a dispatch_method variable to prevent either thread from being impacted by the dispatch cached change.

So what happens when two threads hit the visit method now? With this change, both threads will walk through the ancestors, find a matching super class, and write it to the dispatch class. When either retries, they'll get the correct info. In other words, they'll both end up updating the dispatch cache with the same value. This is much better than an error being raised for one of them.

TESTING:

I was able to reproduce this by adding some random sleeps into the visitor to path like @abrandoned did. Wherever you see HERE, it means a NoMethodError was raised and we're about to check respond_to?. The test starts 20 threads and each thread runs the dispatch contamination test 10 times.

Before:

...
thread opened
thread opened
thread opened
thread opened
thread opened
thread opened
HEREHERE

E.

Finished in 1.234906s, 1.6196 runs/s, 24.2933 assertions/s.

  1) Error:
avoiding contamination between visitor dispatch tables#test_0002_does not throw NoMethodError when multi-threaded defining "visit":
NoMethodError: undefined method `visit_Arel_Nodes_Union' for #<Arel::Visitors::DepthFirst:0x6a8658ff>
Did you mean?  visit_Arel_Nodes_Grouping
               visit_Arel_Nodes_OuterJoin
               visit_Arel_Nodes_Sum
...

After:

...
thread opened
thread opened
thread opened
thread opened
thread opened
HERE
HERE
HERE
HEREHERE

HERE
thread done
thread done
thread done
thread done
thread done
thread done
...

EDIT: Forgot to link to the test gist: https://gist.github.com/film42/6e43ce2f27d2f91ee4daf011a20e1c0d

rafaelfranca commented 7 years ago

r? @matthewd mind to review this?

film42 commented 7 years ago

Just dropping a "ping" comment. Thanks 😄 ❤️ 💟 😘 😚 !!

film42 commented 7 years ago

Just dropping another ping on this. Thanks! ❤️ 💟 🍍 🏓 🤗 🥇 💯

film42 commented 7 years ago

@matthewd Just wondering if you're going to have time to review this.

matthewd commented 7 years ago

:heart: :green_heart: :blue_heart: :yellow_heart: :purple_heart: