rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org
Other
97k stars 12.54k forks source link

`RawVec` sets capacity incorrectly, causing potentially many bugs #116580

Open jmaargh opened 11 months ago

jmaargh commented 11 months ago

There are two "capacity"s for any successful allocation: the requested capacity and the returned capacity. The Allocator trait explicitly allows these to not be the same. However, it appears that RawVec assumes they are the same.

Everywhere RawVec request an allocation, it then sets its internal cap field for the capacity to be equal to the requested capacity, not the returned capacity The canonical example of this is here, when a new RawVec is constructed with a given capacity (though the same behaviour also applies when growing allocation). The comment even states

// Allocators currently return a NonNull<[u8]> whose length // matches the size requested. If that ever changes, the capacity // here should change to ptr.len() / mem::size_of::<T>().

which directly contradicts the docs for the Allocator trait.

In principle, we could design RawVec to only store and report the requested capacity and that could be an entirely sound design. However, I worry that the assumption embodied by this comment has wormed its way throughout the standard library and probably the ecosystem. For example, if breaking apart a Vec-based structure to send for FFI, the docs until recently strongly suggested that Vec::capacity would return the allocated capacity (which is only true if the allocator always returns the requested capacity). In #99790 the docs are currently being updated to reflect that this isn't the case.

Another example of where this is problematic is Vec::to_raw_parts which clearly assumes that RawVec::cap is the allocator returned capacity. I worry that this assumption is implicitly present through a lot of code.

I suggest therefore

  1. Fix RawVecs allocation methods to store the capacity returned from the allocator, unfortunately requiring an extra division to do so
  2. Review the documentation of RawVec and structures that use it to ensure that it is clear that capacity() may return more than what was requested, and associated methods
  3. Looking forward: discuss whether we should guarantee that capacity() returns exactly the allocator returned capacity, and for which datastructures.

The alternative that I see is to instead commit to RawVec::cap being the requested capacity. In which case, it would be good to review where this may have been assumed elsewhere and what might need to be fixed. Happy to hear views on whether people think this would be a better path.

the8472 commented 11 months ago

I don't quite understand what the problem is. Either value is fine for rawvec to use because ultimately that's what's passed back to the allocator and the allocator API only requires the value to fit the allocation, which both values do.

asquared31415 commented 11 months ago

Given that you must use the same allocator to do any operations on an allocation and that the allocator must accept the value that you requested for any deallocation or similar APIs, I don't think this is possible to cause anything worse than maybe some unnecessary reallocation where it might not need to happen. (And even then only under "weird" allocators/platforms)

jmaargh commented 11 months ago

These are good points, and on reflection I'm struggling to come up with many concrete bugs that could arise from this discrepancy (which doesn't mean they don't exist).

Intuitively, the most likely place to actually need a precise number for the allocated size is when building around custom allocators (e.g., perhaps you're wrapping a third-party allocator whose deallocate does require that the given size is exact---and you've also found a situation where RawVec::cap is where you want to track that---but I couldn't find any examples of such an allocator with a quick look).

If we can extend this "I can't think of examples where this would cause a bug" can be extended to a proof of "every time returned capacity is needed, allocated capacity is equally correct---including in all FFI scenarios" then I suppose this boils down to:

  1. Remove the comment here. It's wrong and the fact that it was wrong and looked like a bad thing to be wrong about led me to this issue.
  2. Fix the docs for Vec::into_raw_parts* (and any others that might have the same issue), which currently claims to return the "allocated capacity", when it actually only has the same guarantees as Vec::capacity