oxc-project / backlog

backlog for collborators only
1 stars 0 forks source link

Reduce size of `Vec` #18

Open overlookmotel opened 5 months ago

overlookmotel commented 5 months ago

Vec is currently 32 bytes. It consists of 4 x 8-bytes fields:

Reduce to 24 bytes

Make length and capacity fields u32.

This is pretty easy.

Reduce to 16 bytes

If arena allocator always created chunks aligned on 4 GiB boundaries, and stored chunk metadata in a header at start of chunks, the pointer field could do double-duty.

Pointer field would be a pointer to the Vec's contents. But also if you zero out the bottom 32 bits it would give you a pointer to the chunk metadata, with which you can grow/reallocate the Vec etc.

This would require replacing bumpalo with a custom allocator.

overlookmotel commented 3 months ago

It's also possible to get Vec down to 16 bytes without aligning all allocator chunks on 4 GiB, if can accept:

As follows:

struct AllocatorChunkFooter {
    allocator: *const Allocator,
    /* other fields */
}

struct Vec<T> {
    ptr: NonNull<T>,
    len: u32,
    capacity: u32,
}

const CHUNK_ALIGNMENT_BITS: u32 = 5;
const CAPACITY_MASK: u32 = u32::MAX >> CHUNK_ALIGNMENT_BITS;
const CHUNK_ALIGNMENT_SHIFT: u32 = 32 - CHUNK_ALIGNMENT_BITS;
const ALLOCATOR_FIELD_MASK: usize = usize::MAX - std::mem::size_of::<AllocatorChunkFooter>() + 1
    + std::mem::offset_of!(AllocatorChunkFooter, allocator);

impl<T> Vec<T> {
    pub fn len(&self) -> u32 { self.len }

    pub fn capacity(&self) -> u32 { self.capacity & CAPACITY_MASK }

    pub fn push(&mut self, value: T) {
        if self.len() == self.capacity() { self.grow_for_push(); }
        self.ptr.as_ptr().add(len).write(value);
        self.len += 1;
    }

    #[cold]
    fn grow_for_push(&mut self) {
        let ptr = self.ptr.as_ptr() as *const u8;
        let power = (self.capacity >> CHUNK_ALIGNMENT_SHIFT) as usize;
        let chunk_end_minus_one_addr = ptr as usize | ((1 << power) - 1);
        let allocator_ptr_addr = chunk_end_minus_one_addr & ALLOCATOR_FIELD_MASK;
        let allocator_ptr = ptr.add(allocator_ptr_addr - ptr as usize) as *const Allocator;
        let allocator: &Allocator = &*allocator_ptr;
        // Reallocate `Vec` using `allocator`
     }
}

Notes:

overlookmotel commented 3 months ago

Alternatively, could record power in bottom 3 bits of ptr (if all allocations are aligned at least on 8).

This would allow 8 sizes of allocator chunks: 16 MiB, 32 MiB, 64 MiB, 128 MiB, 256 MiB, 512 MiB, 1 GiB, 2 GiB.

That's probably sufficient. However, I think using bits in capacity is preferable, as capacity is less frequently accessed than ptr, which is read on every read/write of the Vec's elements. Clearing bottom 3 bits is only a single bitwise AND operation, but it's an extremely hot path.

overlookmotel commented 2 months ago

If using bottom 3 bits of ptr to store power, could also multiply the stored power by 2, which would give a wider range of chunk sizes:

128KiB, 512 KiB, 2 MiB, 8 MiB, 32 MiB, 128 MiB, 512 MiB, 2 GiB

I still think it's better to use spare bits in capacity, but advantage of packing it into ptr is that then Box could also contain an implicit reference to its Allocator while remaining 8 bytes. This would allow us to make Box Clone, which in turn means all AST types could also be Clone.

There is also the option to use high bits of ptr. But these bits can possibly be put to better use: #91

Boshen commented 2 months ago

Reduce to 24 bytes Make length and capacity fields u32.

I forked https://github.com/oxc-project/oxc-bumpalo, changing usize to u32 is hard than it seems :-/

overlookmotel commented 1 month ago

Yes, I realised we'd need to combine all the fields into a single struct:

// 24 bytes
struct ArenaVec<'a, T> {
    ptr: NonNull<T>,
    len: u32,
    capacity: u32,
    allocator: &'a Allocator,
}

Splitting it into Vec and RawVec (the way std, bumpalo and allocator-api2 do) won't work, because RawVec {ptr: NonNull<T>, cap: u32, allocator: &'a Allocator} is 24 bytes (inc 4 bytes padding). We need to fit len into those spare 4 bytes, but you can't do that with nested structs.

I don't think combining all the methods into 1 struct is so terrible - it just means moving a lot of the code about, rather than a fundamental change. And we can still break up the implementation into multiple files to make it manageable.

But yeah, it's not trivial.

rzvxa commented 1 month ago

Related to this, Bumpalo is planning to remove the Vec types in favor of std types as soon as the API is stable. So it would be wise to start using our custom vectors if we wish to keep control over our types.

Here I tried to offer to implement a feature-gated Vec32 type https://github.com/fitzgen/bumpalo/issues/256 and found out that they are not too keen on keeping their vector types(Which makes sense, In general, you want to use std::vec if it supports custom allocators)

rzvxa commented 1 month ago
// 24 bytes
struct ArenaVec<'a, T> {
    ptr: NonNull<T>,
    len: u32,
    capacity: u32,
    allocator: &'a Allocator,
}

What about this? So we keep it dry.

// type Numeric = either Num trait or our own trait implemented for both u32 and u64
struct PrivateArenaVecNameHere<'a, T, N: Numeric> {
    ptr: NonNull<T>,
    len: N,
    capacity: N,
    allocator: &'a Allocator,
}

pub struct ArenaVec<'a, T> {
    raw: PrivateArenaVecNameHere<'a, T, usize>,
}

// 24 bytes
#[cfg(or(target_pointer_width = "64", target_pointer_width = "32"))]
pub struct ArenaVec32<'a, T> {
    raw: PrivateArenaVecNameHere<'a, T, u32>,
}

Do we still need usize arena allocated vectors?

Edit:

I've missed the most important part😆We still have to copy methods this way, Unless we do it via a trait that comes with default implementations. Now I hope we don't need both types😅

Edit 2:

We still can do this though I'm not sure if it would be nice to use or not, RA tends to ignore type aliases and treat them as the actual type.

// type Numeric = either Num trait or our own trait implemented for both u32 and u64
pub struct ArenaVec<'a, T, N: Numeric> {
    ptr: NonNull<T>,
    len: N,
    capacity: N,
    allocator: &'a Allocator,
}

pub type Vec<'a, T> = ArenaVec<'a, T, usize>; pub type Vec32<'a, T> = ArenaVec<'a, T, u32>;

overlookmotel commented 1 month ago

I doubt Allocator trait will be stabilized any time soon. It's such a large feature that it has it's own working group. With 72 open issues! https://github.com/rust-lang/wg-allocators/issues

I don't actually think it'll be too hard to copy std/bumpalo/allocator_api2's implementation and chop it about. We can probably skip implementing some of the more obscure methods and iterators to reduce the size of the task.

Also relevant: