Open ibraheemdev opened 8 months ago
I experimented a little bit with this in the `pool` branch, along with a `try_steal` method that enables reusing pointers that have not been deallocated. Unfortunately, I'm not seeing much benefit without aggressive allocation reuse, and it does make the fast path for dropping a guard more expensive.
https://dl.acm.org/doi/pdf/10.1145/3627535.3638491 suggests that batch freeing bypasses thread-local allocator buffers and ends up freeing from a remote thread, which is extremely expensive (note that mimalloc avoids this problem, but just about every other allocator is affected). Amortized freeing can improve both latency and throughput.
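
To make the idea of amortized freeing concrete, here is a rough sketch of what it could look like. This is not code from the `pool` branch or from the paper; `AmortizedFreeList`, `budget`, and `tick` are hypothetical names, and the sketch assumes the queued objects have already been proven safe to reclaim. Instead of dropping a whole batch at once, each thread frees at most a few objects per operation:

```rust
use std::collections::VecDeque;

/// Hypothetical per-thread queue of objects that are already safe to free.
pub struct AmortizedFreeList<T> {
    pending: VecDeque<Box<T>>,
    /// Maximum number of objects freed per call to `tick` (an assumed knob).
    budget: usize,
}

impl<T> AmortizedFreeList<T> {
    pub fn new(budget: usize) -> Self {
        Self { pending: VecDeque::new(), budget }
    }

    /// Queue a whole batch of reclaimable objects without freeing any of them yet.
    pub fn push_batch(&mut self, batch: impl IntoIterator<Item = Box<T>>) {
        self.pending.extend(batch);
    }

    /// Free at most `budget` objects and return how many were freed.
    /// Intended to be called once per operation, e.g. when a guard is dropped.
    pub fn tick(&mut self) -> usize {
        let n = self.budget.min(self.pending.len());
        // Dropping each `Box` hands its allocation back to the allocator.
        self.pending.drain(..n).for_each(drop);
        n
    }

    /// Number of objects still waiting to be freed.
    pub fn len(&self) -> usize {
        self.pending.len()
    }
}
```

With a budget of 2, a batch of 5 objects would be freed over three calls to `tick` rather than in one burst. The trade-off is the same one mentioned above: every `tick` adds a little work to the fast path.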