A lock-free concurrent slab.
Slabs provide pre-allocated storage for many instances of a single data type. When a large number of values of a single type are required, this can be more efficient than allocating each item individually. Since the allocated items are the same size, memory fragmentation is reduced, and creating and removing new items can be very cheap.
This crate implements a lock-free concurrent slab, indexed by usize
s.
Note: This crate is currently experimental. Please feel free to use it in your projects, but bear in mind that there's still plenty of room for optimization, and there may still be some lurking bugs.
First, add this to your Cargo.toml
:
sharded-slab = "0.1.7"
This crate provides two types, Slab
and Pool
, which provide slightly
different APIs for using a sharded slab.
Slab
implements a slab for storing small types, sharing them between
threads, and accessing them by index. New entries are allocated by inserting
data, moving it in by value. Similarly, entries may be deallocated by taking
from the slab, moving the value out. This API is similar to a Vec<Option<T>>
,
but allowing lock-free concurrent insertion and removal.
In contrast, the Pool
type provides an object pool style API for
reusing storage. Rather than constructing values and moving them into
the pool, as with Slab
, allocating an entry from the pool
takes a closure that's provided with a mutable reference to initialize
the entry in place. When entries are deallocated, they are cleared in
place. Types which own a heap allocation can be cleared by dropping any
data they store, but retaining any previously-allocated capacity. This
means that a Pool
may be used to reuse a set of existing heap
allocations, reducing allocator load.
Inserting an item into the slab, returning an index:
use sharded_slab::Slab;
let slab = Slab::new();
let key = slab.insert("hello world").unwrap();
assert_eq!(slab.get(key).unwrap(), "hello world");
To share a slab across threads, it may be wrapped in an Arc
:
use sharded_slab::Slab;
use std::sync::Arc;
let slab = Arc::new(Slab::new());
let slab2 = slab.clone();
let thread2 = std::thread::spawn(move || {
let key = slab2.insert("hello from thread two").unwrap();
assert_eq!(slab2.get(key).unwrap(), "hello from thread two");
key
});
let key1 = slab.insert("hello from thread one").unwrap();
assert_eq!(slab.get(key1).unwrap(), "hello from thread one");
// Wait for thread 2 to complete.
let key2 = thread2.join().unwrap();
// The item inserted by thread 2 remains in the slab.
assert_eq!(slab.get(key2).unwrap(), "hello from thread two");
If items in the slab must be mutated, a Mutex
or RwLock
may be used for
each item, providing granular locking of items rather than of the slab:
use sharded_slab::Slab;
use std::sync::{Arc, Mutex};
let slab = Arc::new(Slab::new());
let key = slab.insert(Mutex::new(String::from("hello world"))).unwrap();
let slab2 = slab.clone();
let thread2 = std::thread::spawn(move || {
let hello = slab2.get(key).expect("item missing");
let mut hello = hello.lock().expect("mutex poisoned");
*hello = String::from("hello everyone!");
});
thread2.join().unwrap();
let hello = slab.get(key).expect("item missing");
let mut hello = hello.lock().expect("mutex poisoned");
assert_eq!(hello.as_str(), "hello everyone!");
slab
: Carl Lerche's slab
crate provides a slab implementation with a
similar API, implemented by storing all data in a single vector.
Unlike sharded-slab
, inserting and removing elements from the slab requires
mutable access. This means that if the slab is accessed concurrently by
multiple threads, it is necessary for it to be protected by a Mutex
or
RwLock
. Items may not be inserted or removed (or accessed, if a Mutex
is
used) concurrently, even when they are unrelated. In many cases, the lock can
become a significant bottleneck. On the other hand, sharded-slab
allows
separate indices in the slab to be accessed, inserted, and removed
concurrently without requiring a global lock. Therefore, when the slab is
shared across multiple threads, this crate offers significantly better
performance than slab
.
However, the lock free slab introduces some additional constant-factor
overhead. This means that in use-cases where a slab is not shared by
multiple threads and locking is not required, sharded-slab
will likely
offer slightly worse performance.
In summary: sharded-slab
offers significantly improved performance in
concurrent use-cases, while slab
should be preferred in single-threaded
use-cases.
Most implementations of lock-free data structures in Rust require some
amount of unsafe code, and this crate is not an exception. In order to catch
potential bugs in this unsafe code, we make use of loom
, a
permutation-testing tool for concurrent Rust programs. All unsafe
blocks
this crate occur in accesses to loom
UnsafeCell
s. This means that when
those accesses occur in this crate's tests, loom
will assert that they are
valid under the C11 memory model across multiple permutations of concurrent
executions of those tests.
In order to guard against the ABA problem, this crate makes use of
generational indices. Each slot in the slab tracks a generation counter
which is incremented every time a value is inserted into that slot, and the
indices returned by Slab::insert
include the generation of the slot when
the value was inserted, packed into the high-order bits of the index. This
ensures that if a value is inserted, removed, and a new value is inserted
into the same slot in the slab, the key returned by the first call to
insert
will not map to the new value.
Since a fixed number of bits are set aside to use for storing the generation counter, the counter will wrap around after being incremented a number of times. To avoid situations where a returned index lives long enough to see the generation counter wrap around to the same value, it is good to be fairly generous when configuring the allocation of index bits.
These graphs were produced by benchmarks of the sharded slab implementation,
using the criterion
crate.
The first shows the results of a benchmark where an increasing number of
items are inserted and then removed into a slab concurrently by five
threads. It compares the performance of the sharded slab implementation
with a RwLock<slab::Slab>
:
The second graph shows the results of a benchmark where an increasing
number of items are inserted and then removed by a single thread. It
compares the performance of the sharded slab implementation with an
RwLock<slab::Slab>
and a mut slab::Slab
.
These benchmarks demonstrate that, while the sharded approach introduces a small constant-factor overhead, it offers significantly better performance across concurrent accesses.
This project is licensed under the MIT license.
Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in this project by you, shall be licensed as MIT, without any additional terms or conditions.