jonhoo / flurry

A port of Java's ConcurrentHashMap to Rust
Apache License 2.0
533 stars 49 forks

["RFC"] convenient, race-free mutability #41

Open soruh opened 4 years ago

soruh commented 4 years ago

Abstract

As I see it, we have no built-in support for mutating data in a race-free way. This "RFC" looks into how we could implement it.

The Problem

Say a user wants to mutate some data in the map. Currently they need to use some kind of lock around their data (HashMap<K, Mutex<V>> || HashMap<K, RwLock<V>>). The downside to this is that it incurs overhead, even if no concurrent accesses are made. Now a "naive" solution would be this:

let mut item = map.get(...).clone();

// modify `item` ...

map.insert(..., item);

However, this is basically the definition of a race condition: while we modify item, another thread could overwrite the data in the map, making our clone of it stale.
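To make the lost update concrete, here is a minimal sketch that replays the clone-modify-insert pattern for two writers in a fixed interleaving. It does not use flurry: a `Mutex<HashMap<..>>` from the standard library stands in for the concurrent map (an assumption made so the example is self-contained), and `lost_update_demo` is a hypothetical name.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Replays the clone-modify-insert pattern for two logical writers in a
/// fixed interleaving and returns the final counter value.
fn lost_update_demo() -> u64 {
    // Stand-in for a concurrent map: each individual call is atomic,
    // but nothing ties a `get` to the `insert` that follows it.
    let map: Mutex<HashMap<&str, u64>> = Mutex::new(HashMap::from([("hits", 0)]));

    // Writers A and B both read the current value...
    let mut a = *map.lock().unwrap().get("hits").unwrap();
    let mut b = *map.lock().unwrap().get("hits").unwrap();

    // ...modify their private copies...
    a += 1;
    b += 1;

    // ...and write back. B's insert silently overwrites A's.
    map.lock().unwrap().insert("hits", a);
    map.lock().unwrap().insert("hits", b);

    let result = *map.lock().unwrap().get("hits").unwrap();
    result
}
```

Two increments were performed, but the function returns 1 because B's copy was already stale when it was written back.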

The "easy" solution:

The easiest way to implement this would be to expose a compare_and_set or compare_and_swap function. This would allow users to make sure that the data they read, mutated, and now plan to store back is still what it was when they first read it.
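A rough sketch of how such a function might be used in a retry loop is shown below. `CasMap`, its `compare_and_set` signature, and `run_counters` are hypothetical and are not part of flurry; the map is again a `Mutex<HashMap<..>>` stand-in, and values are compared by equality, which a real implementation might replace with a pointer comparison.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical stand-in for a concurrent map exposing `compare_and_set`.
struct CasMap {
    inner: Mutex<HashMap<String, u64>>,
}

impl CasMap {
    fn new() -> Self {
        CasMap { inner: Mutex::new(HashMap::new()) }
    }

    fn insert(&self, key: &str, value: u64) {
        self.inner.lock().unwrap().insert(key.to_string(), value);
    }

    fn get(&self, key: &str) -> Option<u64> {
        self.inner.lock().unwrap().get(key).copied()
    }

    /// Stores `new` only if the entry still holds `expected`.
    /// Returns whether the store happened.
    fn compare_and_set(&self, key: &str, expected: u64, new: u64) -> bool {
        let mut map = self.inner.lock().unwrap();
        match map.get_mut(key) {
            Some(current) if *current == expected => {
                *current = new;
                true
            }
            _ => false,
        }
    }
}

/// Read, modify a private copy, and retry if another writer got in first.
fn increment(map: &CasMap, key: &str) {
    loop {
        let seen = match map.get(key) {
            Some(v) => v,
            None => return, // key absent: nothing to update
        };
        if map.compare_and_set(key, seen, seen + 1) {
            return;
        }
    }
}

/// Spawns `threads` writers that each increment the same key `per_thread` times.
fn run_counters(threads: usize, per_thread: usize) -> u64 {
    let map = Arc::new(CasMap::new());
    map.insert("hits", 0);
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let map = Arc::clone(&map);
            thread::spawn(move || {
                for _ in 0..per_thread {
                    increment(&map, "hits");
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    map.get("hits").unwrap()
}
```

With the retry loop no increment is lost, so `run_counters(4, 1000)` yields 4000. The cost is that every caller has to write this loop themselves.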

pro

Proposed solution

I propose we add a variant to BinEntry: Mutating((std::sync::Condvar, Node<K, V>)), which is basically the same as BinEntry::Node with an additional Condvar that is notified once the write has finished.

This would allow us to have a get_mut function which returns a WriteGuard (we'll have to use guards anyway, see #16). This function finds the BinEntry for the key (if it exists), clones its contents and replaces it with a BinEntry::Mutating (containing the original Node). All immutable reads can (and will have to be able to) just read the node data, but get_mut can wait on the Condvar for the ongoing write to be done and only clone the node data again once it has finished.
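The waiting behaviour can be sketched with a single-entry model. Everything here is an assumption for illustration: `Slot`, `State`, `begin_write`, `finish_write` and `run_writers` are hypothetical names, the `mutating` flag plays the role of the proposed Mutating variant, and readers briefly take an internal mutex, whereas flurry's reads are lock-free.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

/// Hypothetical single-entry model of the proposal.
struct Slot<V> {
    state: Mutex<State<V>>,
    write_done: Condvar,
}

struct State<V> {
    value: V,       // last committed value
    mutating: bool, // true while a writer holds a private copy
}

impl<V: Clone> Slot<V> {
    fn new(value: V) -> Self {
        Slot {
            state: Mutex::new(State { value, mutating: false }),
            write_done: Condvar::new(),
        }
    }

    /// Readers do not wait for an in-progress mutation; they see the last
    /// committed value. (They do take the internal mutex briefly here.)
    fn read(&self) -> V {
        self.state.lock().unwrap().value.clone()
    }

    /// Waits until no other writer is active, marks the slot as mutating,
    /// and hands out a private copy to modify.
    fn begin_write(&self) -> V {
        let mut state = self.state.lock().unwrap();
        while state.mutating {
            state = self.write_done.wait(state).unwrap();
        }
        state.mutating = true;
        state.value.clone()
    }

    /// Commits the modified copy and wakes one waiting writer.
    fn finish_write(&self, new_value: V) {
        let mut state = self.state.lock().unwrap();
        state.value = new_value;
        state.mutating = false;
        self.write_done.notify_one();
    }
}

/// Spawns `threads` writers that each increment the slot `per_thread` times.
fn run_writers(threads: u64, per_thread: u64) -> u64 {
    let slot = Arc::new(Slot::new(0u64));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let slot = Arc::clone(&slot);
            thread::spawn(move || {
                for _ in 0..per_thread {
                    // The internal mutex is not held while the copy is modified.
                    let mut copy = slot.begin_write();
                    copy += 1;
                    slot.finish_write(copy);
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    slot.read()
}
```

Because a second writer only clones after the first has committed, no update is lost, and a reader that arrives mid-write still gets the old value.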

The WriteGuard would have to:

domenicquirl commented 4 years ago

This seems similar to the Entry API mentioned in #12. The Java code has a concept called ReservationNode, however @jonhoo points out in his comment that these are only used if the respective bin is empty at the time of reference. Nonetheless, I suggest that we try to find a unified solution for both issues.

With regard to rewriting the current function implementations, I'd say this is not such a big issue since such case distinctions will have to be added anyways for things like #13.

Overall I agree that your proposed solution is preferable over the "easy" one, in particular if we manage to use it for something like an Entry API as well.

jonhoo commented 4 years ago

I'm a little strapped for time these days, so won't have the time to give this quite the time it deserves, but my first instinct is that it seems unfortunate to hold the lock for the entire bin just to get mutable access to one value. Though admittedly, that is also what the proposed Entry API would do.

My second instinct is that it's not clear how we can ever give out &mut V, since reads never take the bin lock.

soruh commented 4 years ago

@jonhoo

it seems unfortunate to hold the lock for the entire bin just to get mutable access to one value.

We won't need to, since every item in the bin is itself a BinEntry and the Mutating variant still contains a Node with a next pointer.

My second instinct is that it's not clear how we can ever give out &mut V, since reads never take the bin lock.

We can't, so we need to clone the contents into our WriteGuard and give out a &mut to that data. Then, once we drop the guard, we write that data back into the map. (Mutating only makes writers wait, but allows readers to read the data that was in the BinEntry::Node.)
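A minimal sketch of such a guard, assuming V: Clone, could look like the following. `Slot`, `WriteGuard`, `get_mut` and `demo` are hypothetical names and not flurry's API. The sketch shows only the clone, DerefMut and write-back-on-drop parts, and deliberately leaves out the writer exclusion that the Mutating variant would provide.

```rust
use std::ops::{Deref, DerefMut};
use std::sync::Mutex;

/// Hypothetical stand-in for one map entry.
struct Slot<V> {
    committed: Mutex<V>,
}

/// Owns a private clone of the value and writes it back when dropped.
struct WriteGuard<'a, V: Clone> {
    slot: &'a Slot<V>,
    working: V,
}

impl<V: Clone> Slot<V> {
    fn new(value: V) -> Self {
        Slot { committed: Mutex::new(value) }
    }

    /// Returns a copy of the last committed value.
    fn read(&self) -> V {
        self.committed.lock().unwrap().clone()
    }

    /// Clones the current value into a guard that can be mutated freely.
    fn get_mut(&self) -> WriteGuard<'_, V> {
        WriteGuard { slot: self, working: self.read() }
    }
}

impl<V: Clone> Deref for WriteGuard<'_, V> {
    type Target = V;
    fn deref(&self) -> &V {
        &self.working
    }
}

impl<V: Clone> DerefMut for WriteGuard<'_, V> {
    fn deref_mut(&mut self) -> &mut V {
        &mut self.working
    }
}

impl<V: Clone> Drop for WriteGuard<'_, V> {
    fn drop(&mut self) {
        // Publish the working copy. Cloning is the simplest way to get the
        // data out from behind `&mut self` in this sketch.
        *self.slot.committed.lock().unwrap() = self.working.clone();
    }
}

/// Returns what a reader saw while the guard was alive and after it dropped.
fn demo() -> (Vec<u32>, Vec<u32>) {
    let slot = Slot::new(vec![1, 2]);
    let during;
    {
        let mut guard = slot.get_mut();
        guard.push(3); // mutates the private copy through DerefMut
        during = slot.read(); // readers still see the committed value
    } // guard dropped here: the working copy is written back
    (during, slot.read())
}
```

Here `demo()` returns `([1, 2], [1, 2, 3])`: the change becomes visible to readers only once the guard is dropped.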

@domenicquirl

This seems similar to the Entry API

I'll look into that

jhinch commented 4 years ago

The WriteGuard would have to: store the cloned Node / V, mutably (deref to it) and allow mutating it.

Currently V does not require Clone; only K is Clone. Would this be desirable? Part of the Entry API I proposed included an and_modify method which uses a Fn(&V) -> V function to modify the data. I considered making it Fn(&mut V) and cloning internally in the method, but it seemed better from a Rust perspective to avoid the additional trait bound and allow the consumer of the flurry hashmap to determine if it is appropriate to .clone() or use other means to construct the new value when modifying.
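To illustrate the difference, here is a small sketch of an and_modify that takes Fn(&V) -> V and therefore needs no Clone bound on V. `Map`, `Session` and `demo` are hypothetical, the map is a `Mutex<HashMap<..>>` stand-in rather than flurry, and values are kept behind `Arc` so that a reader can keep holding the old value after it has been replaced.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Deliberately not `Clone`.
struct Session {
    user: String,
    visits: u32,
}

/// Hypothetical stand-in map; note there is no `V: Clone` bound anywhere.
struct Map<V> {
    inner: Mutex<HashMap<String, Arc<V>>>,
}

impl<V> Map<V> {
    fn new() -> Self {
        Map { inner: Mutex::new(HashMap::new()) }
    }

    fn insert(&self, key: &str, value: V) {
        self.inner.lock().unwrap().insert(key.to_string(), Arc::new(value));
    }

    /// Clones the `Arc` handle, not the value itself.
    fn get(&self, key: &str) -> Option<Arc<V>> {
        self.inner.lock().unwrap().get(key).cloned()
    }

    /// Builds a replacement from a shared reference to the current value.
    /// The caller decides how the new value is constructed.
    fn and_modify<F: Fn(&V) -> V>(&self, key: &str, f: F) -> bool {
        let mut map = self.inner.lock().unwrap();
        match map.get_mut(key) {
            Some(slot) => {
                let next = f(&**slot);
                *slot = Arc::new(next);
                true
            }
            None => false,
        }
    }
}

/// Returns the visit count seen through an old handle and through a fresh one.
fn demo() -> (u32, u32) {
    let map = Map::new();
    map.insert("alice", Session { user: "alice".to_string(), visits: 1 });

    // A reader that obtained the value before the modification.
    let before = map.get("alice").unwrap();

    map.and_modify("alice", |s| Session {
        user: s.user.clone(), // the caller clones only the fields it needs
        visits: s.visits + 1,
    });

    let after = map.get("alice").unwrap();
    (before.visits, after.visits)
}
```

The old handle still reports 1 visit while a fresh lookup reports 2, so `demo()` returns `(1, 2)`. An Fn(&mut V) variant would instead have to clone the whole value internally before handing it to the closure.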

soruh commented 4 years ago

Hmm, you're right, requiring V: Clone seems unfortunate. Maybe we can apply the idea I had of using the BinEntry variant as a kind of lock to the entry API, so that it does not have to lock the whole bin...

jhinch commented 4 years ago

Yep. I like the idea of being able to lock only the single bin entry for the Entry API. In order to get the Entry API working similarly to the Java implementation, I was going to introduce BinEntry::Reservation(Mutex<()>), but using a Condvar is a better option than a Mutex, and I like that you can then use this for mutating existing items as well. So I'm thinking that going with your Mutating variant might be a good idea for the Entry API. I haven't had the chance to properly study the Java code and understand why it holds a lock on the entire bin in the first place. We would need to be careful that we don't introduce subtle races into the code.