domenicquirl opened this issue 4 years ago
Another thing to look into is whether we can make fewer calls to defer_destroy by batching up all the garbage a given method call accumulates, and then freeing all of it in one defer closure instead.
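As a rough sketch of that batching idea (not flurry's actual API): rather than issuing one defer_destroy per removed entry, a method could collect its garbage locally and hand the whole batch to the collector in a single deferred closure. The `Batch`/`Garbage` names here are illustrative stand-ins.

```rust
// `Garbage` stands in for a type-erased retired value.
type Garbage = Box<dyn Send>;

struct Batch {
    items: Vec<Garbage>,
}

impl Batch {
    fn new() -> Self {
        Batch { items: Vec::new() }
    }

    // Called once per entry unlinked inside the critical section;
    // no interaction with the collector yet.
    fn retire(&mut self, g: Garbage) {
        self.items.push(g);
    }

    // A single deferred call frees the entire batch; with
    // crossbeam_epoch this would be one `defer` closure
    // instead of one `defer_destroy` per entry.
    fn defer_free(self, defer: impl FnOnce(Vec<Garbage>)) {
        defer(self.items);
    }
}
```

The point of the single closure is that crossbeam only pays one (possibly heap-allocated) deferred-function slot per batch instead of one per entry.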
seize: It would be possible to delay allocating some of the memory until a value is actually retired, but that then means that retire has to allocate, and there is some extra pointer indirection for every thread that decrements the reference count of a retired batch. The final thread will also have to go through the pointer of every value in the batch. This should reduce memory usage in general, but will likely have a significant performance impact for delete-heavy workloads.

I ran some tests with Linked<T> removed altogether and saw pretty significant performance boosts (~30%) in read-heavy workloads, with diminishing returns as thread counts rose, eventually meeting at $NUM_CPUS. In a mixed insert/delete workload there was a constant overhead of ~20% for the extra alloc/dealloc when retiring. This could likely be amortized by reusing list nodes.
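A minimal sketch of the reference-counted batch scheme described above, with assumed names (`RetiredBatch`, `leave`) that are not seize's actual API: each retired value gets its own allocated node in a batch, and the batch tracks how many threads may still reference it. Only the last thread to decrement reclaims, paying one pointer chase per retired value.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

struct RetiredBatch {
    nodes: Vec<Box<dyn Send>>, // one allocation per retired value
    active: AtomicUsize,       // threads that still need to decrement
}

impl RetiredBatch {
    fn new(nodes: Vec<Box<dyn Send>>, active_threads: usize) -> Arc<Self> {
        Arc::new(RetiredBatch {
            nodes,
            active: AtomicUsize::new(active_threads),
        })
    }
}

// Returns true only for the final thread, which is then responsible
// for walking and freeing every node in the batch (here, dropping the
// last Arc does that implicitly).
fn leave(batch: &Arc<RetiredBatch>) -> bool {
    batch.active.fetch_sub(1, Ordering::AcqRel) == 1
}
```

This makes the alloc-on-retire and the per-decrement indirection mentioned above concrete: `retire` pushes a fresh `Box`, and every `leave` touches the shared counter.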
All right, so here is what I am thinking.
Batch reuse would be optional. If a read-heavy program cares more about memory usage than allocation overhead, it can opt out. Retirement will then always involve allocating a node, and reclamation will involve an extra dealloc.
Another option would be for the thread that reclaims a batch to first try to take it for itself, before returning it to the collector. However, I'm not sure this optimization is worthwhile as reader threads may end up taking batches that they never end up using, and writer threads are likely to already have taken a batch.
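The reuse idea above could be sketched as a thread-local cache of empty batches (plain `Vec`s here stand in for lists of retirement nodes); a reclaiming thread keeps the emptied batch for its own future retirements instead of returning it to a global pool. All names are hypothetical.

```rust
use std::cell::RefCell;

thread_local! {
    // Small per-thread stash of empty batches available for reuse.
    static BATCH_CACHE: RefCell<Vec<Vec<usize>>> = RefCell::new(Vec::new());
}

// Retiring threads grab a cached batch if one exists, so retirement
// does not always have to allocate.
fn take_batch() -> Vec<usize> {
    BATCH_CACHE.with(|c| c.borrow_mut().pop()).unwrap_or_default()
}

// A thread that just reclaimed a batch clears it and keeps it locally;
// `clear` retains the allocation, so the capacity is reused.
fn return_batch(mut batch: Vec<usize>) {
    batch.clear();
    BATCH_CACHE.with(|c| c.borrow_mut().push(batch));
}
```

The caveat from the comment shows up directly here: a reader thread that calls `take_batch` but never retires anything just pins an allocation in its cache.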
Using this strategy, Linked<T> will only need to reserve an extra usize to record the current epoch value, as opposed to storing the node inline. I would like to make it possible to optionally create nodes on link (as is the current behavior), but I'm not sure this would be possible without some funky generics, and the scheme outlined above seems pretty good :)
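A layout sketch of what that would mean for the wrapper type; the field name `epoch` is illustrative, not the actual definition:

```rust
use std::mem::size_of;

// Under the scheme above, Linked<T> reserves just one extra usize
// (the recorded epoch) rather than embedding a whole reclamation
// node inline next to the value.
struct Linked<T> {
    epoch: usize, // epoch value recorded for this entry
    value: T,
}

fn overhead() -> usize {
    // Per-value overhead versus storing T alone.
    size_of::<Linked<usize>>() - size_of::<usize>()
}
```

So the per-entry memory cost drops to a single word, which is where the memory-usage improvement over an inline node comes from.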
In the discussion about performance improvements (#50), a point that arose repeatedly is the impact of the crossbeam_epoch-based garbage collection we use to track map entries (nodes and values).

There are several angles from which we might look at reducing this impact. This issue is meant to be for discussion and changes around how and when our code interacts with garbage collection. Ideas floated in https://github.com/jonhoo/flurry/pull/72#pullrequestreview-378586231 and previous comments may serve as a starting point. They include:

- Moving calls to crossbeam_epoch, e.g. in the form of defer_destroy, out of critical sections in favour of a simpler collection mechanism for things to be deleted. The collected entities could then be handed off to garbage collection after dropping the lock for the critical section. It is yet unclear whether this would yield a significant improvement over calling defer_destroy directly from within the critical section (as happens currently), as crossbeam essentially also stores a closure for the drop and takes some measures to optimize the storage of these closures (for example, a deferred closure only requires a heap allocation if it exceeds a certain size).
- defer_destroy and its usage itself. See again https://github.com/jonhoo/flurry/pull/72#pullrequestreview-378586231
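The first idea can be sketched as follows, with a `Mutex<Vec<String>>` standing in for a locked bin and `defer_all` standing in for handing the batch to crossbeam_epoch after the guard is gone; none of these names are flurry's actual internals:

```rust
use std::sync::Mutex;

// Inside the critical section, unlinked entries are only pushed onto a
// plain Vec; the deferred-destruction machinery runs once, after the
// lock has been released.
fn remove_entries(table: &Mutex<Vec<String>>, defer_all: impl FnOnce(Vec<String>)) {
    let garbage = {
        let mut guard = table.lock().unwrap();
        // Critical section: just collect, no GC interaction.
        std::mem::take(&mut *guard)
    }; // lock is dropped here
    // Hand the whole batch to the collector outside the lock.
    defer_all(garbage);
}
```

Whether this beats calling defer_destroy under the lock is exactly the open question above: crossbeam's own deferred-closure storage is already fairly cheap, so the win would come mainly from shortening the critical section.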