ms-ati / docile

Docile keeps your Ruby DSLs tame and well-behaved
http://ms-ati.github.com/docile/
MIT License
419 stars 34 forks source link

Potential thread safety issue? #57

Open michaeldiscala opened 3 years ago

michaeldiscala commented 3 years ago

Hi Marc!

I'm experimenting with some multi-threaded code that uses Docile and I think I may have encountered a thread safety issue. I'm able to reproduce with the following code (depends on the https://github.com/grosser/parallel gem):

require 'parallel'
require 'docile'
class MyClass
  def run_in_threads
      Parallel.each(
         [ Array.new, Array.new ],
        in_threads: 2
     ) { |array| Docile.dsl_eval_with_block_return(array, &block) }
  end

  def block
      lambda { do_some_work }
  end

  def do_some_work
     sleep(rand)
     push 1
  end
end

MyClass.new.run_in_threads

This pretty reliably gives me:

Traceback (most recent call last):
        8: from /Users/mdiscala/.gem/ruby/2.6.6/gems/parallel-1.19.2/lib/parallel.rb:211:in `block (4 levels) in in_threads'
        7: from /Users/mdiscala/.gem/ruby/2.6.6/gems/parallel-1.19.2/lib/parallel.rb:360:in `block in work_in_threads'
        6: from /Users/mdiscala/.gem/ruby/2.6.6/gems/parallel-1.19.2/lib/parallel.rb:519:in `with_instrumentation'
        5: from /Users/mdiscala/.gem/ruby/2.6.6/gems/parallel-1.19.2/lib/parallel.rb:361:in `block (2 levels) in work_in_threads'
        4: from /Users/mdiscala/.gem/ruby/2.6.6/gems/parallel-1.19.2/lib/parallel.rb:510:in `call_with_index'
        3: from (irb):237:in `block in run_in_threads'
        2: from (irb):241:in `block in block'
        1: from (irb):246:in `do_some_work'
NoMethodError (undefined method `push' for #<MyClass:0x00007fa66f90c9b0>)

When I run this with only a single thread (changing in_threads: 1), the output comes as expected:

irb(main):273:0> MyClass.new.run_in_threads
=> [[1], [1]]

From my reading of the code, it seems that Docile achieves the ability for do_some_work to call methods on the DSL object by adding a method_missing call to the base class (see https://github.com/ms-ati/docile/blob/master/lib/docile/fallback_context_proxy.rb#L54). The gem then cleans up after itself after the calls are complete https://github.com/ms-ati/docile/blob/master/lib/docile/fallback_context_proxy.rb#L54.

My hunch is that I get the method missing error because the clean up happens in one thread and then removes the methods for the other one. I could also imagine getting into cases where the push method ends up writing to the wrong array, depending on the timing. Is that understanding correct?

If so, it seems like this may be difficult to make thread safe. The best I can think of is to (1) assign the receiver to a thread variable in https://github.com/ms-ati/docile/blob/master/lib/docile/fallback_context_proxy.rb#L5, (2) have the method missing definition dispatch calls to the receiver pulled from the thread variable, and (3) never cleanup the method_missing definition.

This doesn't quite seem worth it to support to me, but am curious for your take on all of the above!

Happy holidays & hope all is well! --Mike

ms-ati commented 3 years ago

@michaeldiscala As a baseline "simplest possible" solution to thread-safety in this context, would it be sufficient to document that:

Note, I am not at all ruling out investigating approaches to thread safety within Docile. But for now, I want to check my understanding, does the above make sense?

ms-ati commented 3 years ago

For example, does the following adaptation work for you?


require 'parallel'
require 'docile'

class MyClass
  def run_in_threads
      Parallel.each(
         [ 
           [Array.new, Mutex.new],
           [Array.new, Mutex.new]
         ],
        in_threads: 2
     ) do |array, mutex| 
      mutex.synchronize do
        Docile.dsl_eval_with_block_return(array, &block)
      end
    end  
  end

  def block
      lambda { do_some_work }
  end

  def do_some_work
     sleep(rand)
     push 1
  end
end

MyClass.new.run_in_threads
ms-ati commented 3 years ago

UPDATE: the above does not work -- the singleton class of the lexical context needs a single mutex for all access, which is more to your point above! Apologies

ms-ati commented 3 years ago

@michaeldiscala I'm tagging this to future Docile 2.0, where I intend to change the implementation to never mutate the DSL nor block contexts, by removing the need to instrument method_missing for fallbacks.