Closed Renkai closed 4 years ago
This would be very useful. Because of no atomic get_or_create
(or get_or_insert
) the current API doesn't allow to implement a concurrent multi map.
Multi maps are often implemented as hashmaps where value is a vector.
But on inserting the first (K, V) pair, the value vector itself doesn't exist yet, so get_mut
is of not much use as it would return None. Even handling this None
case separately by insert
ing a fresh empty vector in such case could lead to overwriting another item concurrently inserted. So we need an atomic (interlocked) way of getting a mutable reference to the value, and inserting/creating a new value if not exists.
BTW: get_or_insert
is probably a better name than get_or_update
. If get fails, there is nothing to update. And don't forget about get_or_insert_mut
variant, that would allow us to modify the returned value while still holding the write lock.
Maybe
.entry(...).or_insert_with(...)
do what I expected, but I'm not very sure
@pkolaczk The API has surface for this using the entry API like @Renkai mentioned and it's family of or_
methods. Closing.
Hi!
I leave a question here just to be sure the following case is covered.
Supposed to have the following code:
aDashMap.entry(key).or_insert_with(|| { compute_new_value() })
executing concurrently from several threads on the same DashMap
instance.
Is it safe to hope the update doesn't suffer a race condition? e.g. given 2 threads getting the entry entry(key)
and both trying to compute and insert the new value on the empty entry at the same time, one will be success and the other one will fail?
If it's safe, just out of curiosity, is it achieved by getting a write lock here?
Thanks in advance!
You're correct. It's achieved by keeping the shard lock during the lifetime of the Entry instance.
Hmm, so maybe I'm misunderstanding this, but it doesn't sound like the behavior is ideal here for many applications. It sounds like any time you are using this entry.or_insert_with method, you will need to get a write lock to the exclusion of all readers on that shard/key. Is that correct?
If you have a read-heavy workflow where it only needs to initialize the value a single time, then this method would be very bad for performance, essentially turning access of the object into a regular mutex. Or am I misunderstanding this?
Would it be possible to create a dedicated get_or_insert method on DashMap itself that wouldn't require the write lock unless it actually needs to insert? This is how the GetOrAdd method from ConcurrentDictionary in C# works, which is my guide for how I would expect a concurrent HashMap type of data structure to work in Rust.
DashMap is a really cool library by the way, and thanks for putting all the time and effort into it!
@foliot Checking the implementation here, an exclusive write lock is held throughout the lifetime of Entry<'a, K, V, S>
. For ready-heavy workloads, this will definitely result in some performance bottlenecks.
I think it's important that the docs are updated to reflect this. At the moment, the docs say Locking behavior: May deadlock if called when holding any sort of reference into the map.
which doesn't seem to reflect the impact it could have.
It looks like there is now a downgrade() method allowing you to release the write lock as soon as you determine that the entry you need is in the map. This is actually the necessary for safety as you need the equivalent of AcqRel memory ordering when get_or_insert must yield a value and that value must be visible to two or more threads attempting to access the same shared memory.
That said it would be nice if it was possible to test if something exists with an immutable reference (read lock) before attempting to acquire a write lock. Unfortunately, there is no upgrade mechanism provided by the operating systems so it would require releasing the read lock and acquiring the write lock, which in a write heavy workload would be equivalent to downgrade(). For the specific workloads mentioned in this thread (mostly read, only insert on miss) this would make a lot of sense because the most time was spent hashing when I flamegraphed (at least on my workload). So if 99% of the time the read lock is sufficient and you can just return the value without acquiring a write lock you avoid contesting over an exclusive write lock. You also avoid having to provide ownership of a key that is going to be thrown away since it is already in the map.
To provide a more specific use case I needed to setup a lookup table for a data field with relatively low cardinality (~a few hundred values) in a relatively large dataset (billions of rows), so it ends up looking like DashMap<String, Arc> and I end up having to copy the String every time to insert (entry(...) takes ownership) or DashMap<Arc, Arc> which incurs the overhead of having to wrap each string in Arc at least once. The latter was obviously smaller so it was what I ended up doing, but still requires acquiring an exclusive write lock on the shard each time when 99% of the workload is reads.
One last thing is that try_insert is in nightly and could be used for this, but unfortunately requires an exclusive lock to use the HashMap from multiple threads. In addition, you have to unpack the current value from an error, which doesn't feel quite right.
When using
RwLock<HashMap<K, V>>
, if we want to create a object only while specific is key is missing, we may use a method like https://github.com/tikv/rust-prometheus/blob/v0.8.0/src/vec.rs#L156It is, of course not elegant for Dashmap, so Dashmap may need a method like getOrElseUpdate in Scala.