greg7mdp / parallel-hashmap

A family of header-only, very fast and memory-friendly hashmap and btree containers.
https://greg7mdp.github.io/parallel-hashmap/

Remove the `Upgrade` lock stuff. #221

Closed greg7mdp closed 9 months ago

greg7mdp commented 9 months ago

Maybe I should remove SharedLock as well since ReadWriteLock is equivalent if we don't call switch_to_unique()

bpmckinnon commented 9 months ago

> Maybe I should remove SharedLock as well since ReadWriteLock is equivalent if we don't call switch_to_unique()

I would leave it. There can be a difference between Shared and ReadWrite. In my case, the ReadWrite lock puts the mutex into a still readable state, but prevents new writers from coming in, so that I can upgrade to a write state without releasing the lock.

greg7mdp commented 9 months ago

> Maybe I should remove SharedLock as well since ReadWriteLock is equivalent if we don't call switch_to_unique()

> I would leave it. There can be a difference between Shared and ReadWrite. In my case, the ReadWrite lock puts the mutex into a still readable state, but prevents new writers from coming in, so that I can upgrade to a write state without releasing the lock.

Hmm, I don't follow. The ReadWriteLock always releases the lock before upgrading. Also, both Shared and ReadWrite locks will prevent new writers from coming in.

bpmckinnon commented 9 months ago

> Maybe I should remove SharedLock as well since ReadWriteLock is equivalent if we don't call switch_to_unique()

> I would leave it. There can be a difference between Shared and ReadWrite. In my case, the ReadWrite lock puts the mutex into a still readable state, but prevents new writers from coming in, so that I can upgrade to a write state without releasing the lock.

> Hmm, I don't follow. The ReadWriteLock always releases the lock before upgrading. Also, both Shared and ReadWrite locks will prevent new writers from coming in.

I use something similar to this https://www.boost.org/doc/libs/1_56_0/doc/html/thread/synchronization.html#thread.synchronization.mutex_concepts.upgrade_lockable

So the lock is tri-state: read, write, or upgradable. When the lock is held as upgradable, readers can come and go, but writers must queue up behind the upgradable holder. The thread with the upgradable lock can switch to a write lock without releasing it: it waits for all readers to exit, then changes to a write lock, and since it was already at the front of the write queue it doesn't need to re-check for changes. So in my case, I have a useful switch_to_unique function that returns false (no re-validation needed).
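Roughly, the pattern from the Boost docs looks like this (just a sketch of the documented upgrade-lockable usage, not my actual code):

```cpp
#include <boost/thread/shared_mutex.hpp>
#include <boost/thread/locks.hpp>

boost::shared_mutex mtx;

void update_if_needed() {
    // Upgradable: other readers may still hold shared locks, but no other
    // thread can take the upgrade or exclusive lock while we hold this.
    boost::upgrade_lock<boost::shared_mutex> upgradable(mtx);

    bool need_write = /* inspect shared state under the upgrade lock */ true;
    if (need_write) {
        // Atomically move to exclusive once the remaining readers drain;
        // the lock is never released, so the state we just checked is still valid.
        boost::upgrade_to_unique_lock<boost::shared_mutex> unique(upgradable);
        // ... mutate shared state ...
    }
    // unique (if taken) hands ownership back on destruction, then upgradable releases.
}
```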

greg7mdp commented 9 months ago

I see, so you updated the ReadWriteLock's lock_shared() to lock in upgradable mode?

bpmckinnon commented 9 months ago

> I see, so you updated the ReadWriteLock's lock_shared() to lock in upgradable mode?

Exactly. I use the default implementation of ReadLock and WriteLock, but I've made my own custom ReadWriteLock.
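The shape of it is roughly this, assuming a Boost-style upgrade mutex underneath (a sketch only; the class name is made up and this isn't the actual phmap Lockable trait layout):

```cpp
#include <boost/thread/shared_mutex.hpp>

// Hypothetical ReadWriteLock built on Boost's upgrade semantics:
// the "shared" phase takes the *upgradable* lock instead of a plain
// shared lock, so switch_to_unique() can upgrade in place and return
// false ("the lock was never dropped, nothing to re-check").
class UpgradableReadWriteLock {
public:
    explicit UpgradableReadWriteLock(boost::shared_mutex& m) : mtx_(m) {
        mtx_.lock_upgrade();  // readers may coexist; writers queue behind us
    }

    ~UpgradableReadWriteLock() {
        if (upgraded_) mtx_.unlock();
        else           mtx_.unlock_upgrade();
    }

    // Returns true if the lock had to be released while upgrading (caller
    // must re-validate), false if the upgrade happened without dropping it.
    bool switch_to_unique() {
        mtx_.unlock_upgrade_and_lock();  // atomic upgrade: waits only for readers
        upgraded_ = true;
        return false;                    // state seen under the upgrade lock is still valid
    }

private:
    boost::shared_mutex& mtx_;
    bool upgraded_ = false;
};
```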

bpmckinnon commented 9 months ago

Change looks good to me.

greg7mdp commented 9 months ago

> When the lock is locked as upgradable, readers can come and go

I was looking at the Boost docs, and only one thread at a time can acquire an upgradable lock. So I guess it helps if most threads are just doing lookups and one thread mutates the hashmap, but if many threads call erase(), for example, it may not be better than just using write locks.

bpmckinnon commented 9 months ago

Yeah, I spent a couple of weeks working on upgradable locks and it was always a mixed bag depending on the workload; in most practical applications it didn't really help. In heavily multi-threaded environments, contention and the resulting cache invalidation on the atomics became extremely high, especially if the work done in the callbacks was small enough that the time was dominated by the extra atomic lock.

The key benefit of the upgradable lock is that it does a single atomic operation to move from upgrade to write, versus two for read -> unlock -> write, plus an extra check to validate that nothing changed the state in between. Cranking up the number of bins in the parallel map helped, but it needed to be 4 to 8 times larger than the expected thread count.

Due to contention I've pulled back on the scale of multithreading I was doing, but the code is still thread-safe, so I can try ramping up the thread counts to see what impact it has.
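For reference, the bin count is the `N` template parameter of `parallel_flat_hash_map` (log2 of the number of submaps, so the earlier defaults have to be spelled out to change it). If I have the parameter order right, something like this gives 64 submaps, each with its own mutex:

```cpp
#include <parallel_hashmap/phmap.h>
#include <mutex>

// 2^6 = 64 submaps, each guarded by its own std::mutex -- roughly 4-8x
// an 8-16 thread count, per the rule of thumb above.
using Map = phmap::parallel_flat_hash_map<
    int, int,
    phmap::priv::hash_default_hash<int>,
    phmap::priv::hash_default_eq<int>,
    std::allocator<std::pair<const int, int>>,
    6,            // log2 of the submap count (default is 4, i.e. 16 submaps)
    std::mutex>;  // internal lock type, one per submap
```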