ruby-concurrency / concurrent-ruby

Modern concurrency tools including agents, futures, promises, thread pools, supervisors, and more. Inspired by Erlang, Clojure, Scala, Go, Java, JavaScript, and classic concurrency patterns.
https://ruby-concurrency.github.io/concurrent-ruby/
Other
5.68k stars 418 forks source link

`Concurrent::Hash` default initialization is not fully thread-safe #970

Open mensfeld opened 1 year ago

mensfeld commented 1 year ago

Based on the docs:

A thread-safe subclass of Hash. This version locks against the object itself for every method call, ensuring only one thread can be reading or writing at a time. This includes iteration methods like #each, which takes the lock repeatedly when reading an item.

Given this code:

h = Concurrent::Hash.new do |hash, key|
  hash[key] = Concurrent::Array.new
end

the initialization is not thread-safe.

Note from @eregon, the thread-safe variant of this code is:

h = Concurrent::Map.new do |hash, key|
  hash.compute_if_absent(key) { Concurrent::Array.new }
end

Obviously the latter part of the doc indicates that:

ensuring only one thread can be reading or writing at a time

but the initial part makes it confusing:

This version locks against the object itself for every method call

It can be demoed by running this code:

require 'concurrent-ruby'

1000.times do
  h = Concurrent::Hash.new do |hash, key|
    hash[key] = Concurrent::Array.new
  end

  100.times.map do
    Thread.new do
      h[:na] << true
    end
  end.each(&:join)

  raise if h[:na].count != 100
end

I would expect to either:

  1. Have the initialization block behind a mutex - so there is no conflict
  2. Have the docs updated (I can do that)

Works like so:

require 'concurrent-ruby'

m = Mutex.new

1000.times do
  h = Concurrent::Hash.new do |hash, key|
    m.synchronize do
      break hash[key] if hash.key?(key)

      hash[key] = Concurrent::Array.new
    end
  end

  100.times.map do
    Thread.new do
      h[:na] << true
    end
  end.each(&:join)

  raise if h[:na].count != 100
end
nijikon commented 1 year ago

Oh, wow. This is interesting.

mensfeld commented 1 year ago

Same applies to the Concurrent::Map:

require 'concurrent-ruby'

1000.times do
  h = Concurrent::Map.new do |hash, key|
    hash[key] = Concurrent::Array.new
  end

  100.times.map do
    Thread.new do
      h[:na] << true
    end
  end.each(&:join)

  raise if h[:na].count != 100
end
nightpool commented 1 year ago

This pattern is widely prevalent in open source code and it's very very clear that developers assume that this works. I think it's very important to wrap this initializer block in a mutex and not just update the docs

granthusbands commented 1 year ago

It's now slightly complicated, as some (as above) have a fix that assumes the initializer is not in the mutex and so call compute_if_absent or such. Unless the mutex is reentrant or there's a test for this, it would then deadlock.

Also, it's suboptimal to use a mutex that's separate from the hash, as above; it would help for Concurrent::Hash to have some of the quality-of-life improvements of Concurrent::Map or at least a way to use the same lock.

mensfeld commented 1 year ago

@granthusbands if not fixable or brings weird problems to the table, maybe we could expand rubocop to notify on common mistakes, etc.

eregon commented 1 year ago

Thank you for the issue report. I generally agree we should fix this if we can. The question is how.

(1) We could (try to) use a lock around the whole initializer, but that is also a typical anti-pattern to hold a lock so long, and that can lead to deadlock (e.g., if 2 Concurrent::Hash initializer blocks refer to one another, like https://github.com/ruby-concurrency/concurrent-ruby/issues/627 which uses the block of each but for Hash/Map). This seems quite difficult given the various backends. Not all backends use a Mutex for instance or even a lock for all operations on a Concurrent::Hash. We'd need to somehow make it work for each of them independently. As a note, these are the semantics of ConcurrentHashMap#computeIfAbsent in Java. That also says: The entire method invocation is performed atomically, so the function is applied at most once per key. Some attempted update operations on this map by other threads may be blocked while computation is in progress, so the computation should be short and simple, and must not attempt to update any other mappings of this map. that latter part which we cannot guarantee for an arbitrary initializer block, so it feels a bit wrong at least. Yet another challenge is Concurrent::Hash is currently just ::Hash on CRuby, but can't anymore if we fix this. From that view 1) this might be considered caused by CRuby Hash and 2) it may make sense to actually try to fix this in core Hash.

(2) We could do what I suggested in my PhD thesis to solve basically the same issue but on Hash itself (BTW, CRuby Hash does not guarantee this): https://eregon.me/blog/assets/research/thesis-thread-safe-data-representations-in-dynamic-languages.pdf page 83 Idiomatic Concurrent Hash Operations. In short, it replaces []= calls in the initializer block with put_if_absent by passing a different object than the Concurrent::Hash itself, which overrides []= and delegates the rest.

It's a classic "pick 1 or 2 but all 3 seems impossible":

eregon commented 1 year ago

I've filed an issue on the CRuby tracker to see what they think about the same problem for core Hash: https://bugs.ruby-lang.org/issues/19237

eregon commented 1 year ago

FWIW CRuby closed that ticket and added documentation that Hash is not thread-safe for that case: https://bugs.ruby-lang.org/issues/19237#note-2 and https://github.com/ruby/ruby/commit/ffd52412ab813854d134dbbc2b60f698fe536487. I think it makes sense to solve this for Concurrent::Hash and Concurrent::Map. We'll need to pick one of the two approaches above.

Using the lock approach would also fix https://github.com/ruby-concurrency/concurrent-ruby/issues/929, but makes it prone to deadlocks. For example we'll already need Monitor and not Mutex to let existing usages of compute_if_absent inside the block work fine and not error due to trying to lock the same Mutex again.

Using an object forwarding []= differently might be surprising.

Another option would be to pass a special object to the block, which warns on []= inside the block as that's not atomic, to let people know they should use compute_if_absent instead.