servo / rust-smallvec

"Small vector" optimization for Rust: store up to a small number of items on the stack
Apache License 2.0
1.34k stars 145 forks source link

RFC: rewrite based on min_const_generics for 2.0 #240

Open SimonSapin opened 3 years ago

SimonSapin commented 3 years ago

There is now a serious proposal and plan to stabilize #![feature(min_const_generics)] in Rust 1.50: https://github.com/rust-lang/rust/pull/79135. Using it here and removing the Array trait would a welcome simplification but obviously a breaking API change, so it would need to be part of a version 2.0 of the crate.

I’m wondering whether it would make sense to try a complete rewrite of smallvec instead of adapting the existing code. Some ideas:

I’d consider not necessarily including everything in https://github.com/servo/rust-smallvec/issues/183, as specialization and custom allocators are likely not gonna be stable before or soon after min_const_generics is. Adding those might not require breaking API changes, though?

I’ve started playing with this a little. Here is the core data structure:

pub struct SmallVec<Item, const INLINE_CAPACITY: usize> {
    // Safety invariants:
    // * The active union field of `storage` must be consistent with the tag of `tagged_len`.
    // * The length in `tagged_len` must not be greater than storage capacity
    // * The first `length` items of the storage must be initialized.
    tagged_len: TaggedLen,
    storage: Storage<MaybeUninit<Item>, INLINE_CAPACITY>,
}

/// Upper bit: 0 for inline storage, 1 for heap storage
/// The rest of the bits: SmallVec length = number of items initialized, at the start of storage.
pub(crate) struct TaggedLen(usize);

/// An array, either inline (of statically-fixed size) or heap-allocated (of dynamic size).
/// The tag of [`SmallVec::tagged_len`] indicates which.
/// The size of this array is the `SmallVec`’s capacity.
union Storage<T, const INLINE_SIZE: usize> {
    inline: ManuallyDrop<[T; INLINE_SIZE]>,
    heap: ManuallyDrop<Box<[T]>>,
}

The union and tagged integer wrangling as well as heap (re/de)allocations could be in a private module, with the rest of the functionality (push, drain, try_reserve, …) built on top of an abstraction such as:

impl<Item, const INLINE_CAPACITY: usize> SmallVec<Item, INLINE_CAPACITY> {
    pub fn new() -> Self {…}
    pub fn with_capacity(capacity: usize) -> Self {…}
    pub fn len(&self) -> usize {…}
    pub unsafe fn set_len(&mut self, new_len: usize) {…}
    pub(crate) fn storage(&self) -> &[MaybeUninit<Item>] {…}
    pub(crate) fn storage_mut(&mut self) -> &mut [MaybeUninit<Item>] {…}
    pub(crate) fn grow_to(&mut self, new_capacity: usize) -> Result<(), TryReserveError> {…}
    pub(crate) fn shrink_to(&mut self, new_capacity: usize) -> Result<(), TryReserveError> {…}
}
mbrubeck commented 3 years ago

I like this plan.

fleabitdev commented 3 years ago

I looked into some possible SmallVec optimisations yesterday. They weren't fruitful, but I thought I'd describe my findings here in case you find them useful for the 2.0 rewrite.

I was motivated by a few comments here, which described a large branch prediction penalty for spilled SmallVecs. I'd seen similar complaints before, so I wanted to investigate possible fixes.

Some data here suggests large performance improvements when using a (32-bit) cmov in place of an unpredictable branch, with minimal or zero performance loss when the branch is completely predictable.

It occurred to me that, with the SmallVec implementation you described above, the length is unchanged whether or not the array is spilled. SmallVec::deref is only branching on which value to assign to a single pointer, which is a good candidate for use of cmov.

I experimented with code which reduced all branching in SmallVec::deref to a trivial let p: *const T = if a { b } else { c }. Unfortunately, I noticed that LLVM completely refused to emit a cmov instruction when it could potentially cause an unnecessary memory access. It turns out this is a known LLVM bug.

Unless you're willing to use inline assembly behind a feature flag, this idea is probably at a dead end. You could experiment with bit-masking as a replacement for cmov, but I think that would cost at least three CPU cycles (xor, and, xor), while a mispredicted branch might cost ten or less.

TheButlah commented 2 years ago

its been about a year now - any movement here? min_const_generics is fully stable

SimonSapin commented 2 years ago

@TheButlah Nope, after the burst of inspiration that led to writing up this plan I ended up spending exactly zero additional time on it. Time since is irrelevant. I still like this plan but I can’t say when or if I’ll work on it. Anyone should feel free to take over.

SimonSapin commented 2 years ago

I ended up spending exactly zero additional time on it

Wait, that’s not quite true. It looks like I did write some proof-of-concept code the same day as this issue, and got bored at drawing the rest of the owl. It’s not even in a git repository. Here it in case anyone wants to take a look