Closed thomcc closed 4 years ago
This would make sense if SmallVec
couldn't grow beyond the bounds of the stack allocation (like arrayvec
, for example), but it can and will allocate from the heap if necessary.
That is somewhat true with the current implementation, but it's certainly not true in general -- you'd just have to move from using a (e.g.) u8
to using a usize
when you move from inline to heap allocated.
The main difference here I guess is with SmallVec<[u8; N]> where N is low. Other cases don't matter quite as much (although might be slightly larger than they would be otherwise), but IME this is a somewhat common case for people who use SmallVec<[u8; N]> as a kind of SmallString.
I still don't quite see how that would work. capacity
is used to disambiguate between stack and heap storage and already doubles as the size when not spilled. It would really only work if your maximum capacity should be limited as well.
You wouldn't be able to use capacity for that. That said I don't think this would cause really any additional branching or anything -- you already have to branch on representation in each method.
But how would you disambiguate instead? Using an enum
? Now you need additional storage for the enum
's tag and still need to store two usize
s for the heap case. And taking into account alignment, the following enum
would take up a minimum of 32 bytes (on a machine with 8-byte pointers and usize
):
enum Storage {
Stack(u8, MaybeUninit<[u8; 4]>),
Heap(*mut u8, usize, usize),
}
So you could only gain something if the array backing storage is larger than 3*size_of::<usize>()
and has an alignment smaller than size_of::<usize>()
. (Assuming size_of
= align_of
for primitives, which is true on most machines)
Though I don't know if this can be implemented, we can disambiguate by most significant bit of the pointer, because pointers don't overflow isize
.
Could we do something like what https://github.com/maciejhirsz/beef did?
Hmm, on further investigation a lot of this is actually fixed (in part, anyway) by the union
feature, since as it is, we have a bunch of space that comes from the enum discriminant.
Or, it's still larger than it needs to be in some cases, but doesn't get better by doing what I suggested (I suspect if/when rustc gets more aggressive niche-filling optimizations that might change, but obviously this could be revisited if that ever happens and it makes a difference).
That said, I'm still not sure why this uses the size to differentiate than storing everything in the enum, which I think would eliminate the extra size overhead on stable (or am I mistaken again...), but I guess the preference (mentioned in #183) is to wait for unions to stabilize.
Regarding beef
, a quick look says that it's doing things which limit size/capacity (on the lean
types) to u32, which is a tradeoff that might be undesirable. (I mean there probably exists some user of smallvec who cares about this, and it's not something I'd be willing to argue should change...)
There might be more too -- it mentions it only being 64-bit compatible and there are a lot of tricks you can do by taking advantage of the somewhat peculiar way that the address space is laid out on x86_64 and aarch64 ... I'm not sure this sort of thing is really worth the hassle for smallvec though. My gut would be to say "no not really".
Anyway I'm just going to close it since I filed it due to a misunderstanding. I still wish SmallVec was, err, smaller though >_>
Currently a
usize
is used to store the size when inline, but ideally if N < 256 a u8 would be used for the size, and u16 for N < 65536, etc.