servo / rust-smallvec

"Small vector" optimization for Rust: store up to a small number of items on the stack
Apache License 2.0
1.35k stars 145 forks source link

SmallVec<[T; N]> is often larger than it needs to be #219

Closed thomcc closed 4 years ago

thomcc commented 4 years ago

Currently a usize is used to store the size when inline, but ideally if N < 256 a u8 would be used for the size, and u16 for N < 65536, etc.

l0calh05t commented 4 years ago

This would make sense if SmallVec couldn't grow beyond the bounds of the stack allocation (like arrayvec, for example), but it can and will allocate from the heap if necessary.

thomcc commented 4 years ago

That is somewhat true with the current implementation, but it's certainly not true in general -- you'd just have to move from using a (e.g.) u8 to using a usize when you move from inline to heap allocated.

thomcc commented 4 years ago

The main difference here I guess is with SmallVec<[u8; N]> where N is low. Other cases don't matter quite as much (although might be slightly larger than they would be otherwise), but IME this is a somewhat common case for people who use SmallVec<[u8; N]> as a kind of SmallString.

l0calh05t commented 4 years ago

I still don't quite see how that would work. capacity is used to disambiguate between stack and heap storage and already doubles as the size when not spilled. It would really only work if your maximum capacity should be limited as well.

thomcc commented 4 years ago

You wouldn't be able to use capacity for that. That said I don't think this would cause really any additional branching or anything -- you already have to branch on representation in each method.

l0calh05t commented 4 years ago

But how would you disambiguate instead? Using an enum? Now you need additional storage for the enum's tag and still need to store two usizes for the heap case. And taking into account alignment, the following enum would take up a minimum of 32 bytes (on a machine with 8-byte pointers and usize):

enum Storage {
    Stack(u8, MaybeUninit<[u8; 4]>),
    Heap(*mut u8, usize, usize),
}

So you could only gain something if the array backing storage is larger than 3*size_of::<usize>() and has an alignment smaller than size_of::<usize>() . (Assuming size_of = align_of for primitives, which is true on most machines)

moshg commented 4 years ago

Though I don't know if this can be implemented, we can disambiguate by most significant bit of the pointer, because pointers don't overflow isize.

pickfire commented 4 years ago

Could we do something like what https://github.com/maciejhirsz/beef did?

thomcc commented 4 years ago

Hmm, on further investigation a lot of this is actually fixed (in part, anyway) by the union feature, since as it is, we have a bunch of space that comes from the enum discriminant.

Or, it's still larger than it needs to be in some cases, but doesn't get better by doing what I suggested (I suspect if/when rustc gets more aggressive niche-filling optimizations that might change, but obviously this could be revisited if that ever happens and it makes a difference).

That said, I'm still not sure why this uses the size to differentiate than storing everything in the enum, which I think would eliminate the extra size overhead on stable (or am I mistaken again...), but I guess the preference (mentioned in #183) is to wait for unions to stabilize.

thomcc commented 4 years ago

Regarding beef, a quick look says that it's doing things which limit size/capacity (on the lean types) to u32, which is a tradeoff that might be undesirable. (I mean there probably exists some user of smallvec who cares about this, and it's not something I'd be willing to argue should change...)

There might be more too -- it mentions it only being 64-bit compatible and there are a lot of tricks you can do by taking advantage of the somewhat peculiar way that the address space is laid out on x86_64 and aarch64 ... I'm not sure this sort of thing is really worth the hassle for smallvec though. My gut would be to say "no not really".


Anyway I'm just going to close it since I filed it due to a misunderstanding. I still wish SmallVec was, err, smaller though >_>