Closed: TethysSvensson closed this issue 5 months ago.
On my machine, these reallocations do not appear to return the same allocation, so we might as well allocate up front to avoid a memcpy.
The pointer not being the same does not necessarily mean the allocation hasn't been reused. The allocator can call `mremap` to move it to a new virtual address, which has a much lower cost than actually copying the bytes or faulting in new pages. This may also depend on the allocation size. For smaller pool-based allocations it might be an actual memmove, and a `mremap` for larger ones.
The behavior regarding capacities in the remaining cases needs to either change or be documented somewhere.
Well, this is a non-guaranteed optimization so we can't document the exact behavior. But it probably makes sense to mention that something like this may happen.
Additionally, in some cases this will cause the destination vector to have a very large capacity even for non-shared allocations. In my opinion, this should never happen.
That's intended to allow vec-allocation recycling even when the type changes, which can mean the excess factor is infinite.
On my machine this does result in a memmove. This is the strace output of an execution:
```
29839 mmap(NULL, 266240, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f50437bd000
29839 write(1, "vec data ptr=0x7f50437bd010 len="..., 44) = 44
29839 mmap(NULL, 266240, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f504377c000
29839 munmap(0x7f50437bd000, 266240) = 0
29839 write(1, "vec data ptr=0x7f504377c010 len="..., 47) = 47
29839 munmap(0x7f504377c000, 266240) = 0
```
This extra capacity also has negative performance impacts:
```rust
fn test_func() {
    let v1 = (0u16..0x4000).map(|i| [i; 32]).collect::<Vec<_>>();
    let v2 = v1.into_iter().map(|x| x[0] as u8).collect::<Vec<_>>();
    // Prevent the optimizer from removing the allocation
    assert_eq!(v2[0], 0);
}

fn main() {
    let now = std::time::Instant::now();
    for _ in 0..10_000 {
        test_func();
    }
    println!("{}", now.elapsed().as_millis());
}
```
This code takes more than 10 times as long on my machine when compiled with the beta compiler rather than the stable one.
It seems that, much like capacity growth has a scaling factor, there should be some sort of limit on the unused capacity that will be used for optimizations like this. Maybe there should be a general rule for `Vec` such as "The capacity will never automatically grow more than 2x (or some other number) the maximum elements that have been part of the `Vec` since the last explicit capacity operation."
So the manual capacity operations still work, but automatic capacity growth and allocation will not use more than some constant factor of the in-use memory.
It seems obvious to me that some rule is required here. If the next version of Rust made `push` allocate a thousand times extra capacity it would "break" many programs. So there is an implicit contract here. It may make sense to make it explicit. IDK what the number we should promise is, maybe 2x or 4x. But it would be nice to write down some guarantee that can be relied on.
If we have that rule spelt out then we can decide if this operation should follow the rule, or have some sort of exception. But I think it should probably follow it. (Or maybe have a slightly relaxed threshold.)
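As an illustration of the kind of bound being proposed, here is a sketch that checks the invariant against the current implementation's push-growth behavior. The 4x factor is an illustrative assumption for this example, not anything `Vec` guarantees today:

```rust
// Push elements one at a time and check that the capacity never exceeds a
// constant factor of the high-water-mark length. The 4x factor is an
// illustrative assumption, not a documented Vec guarantee.
fn growth_within_factor(n: u32) -> bool {
    let mut v: Vec<u32> = Vec::new();
    let mut high_water = 0usize;
    for i in 0..n {
        v.push(i);
        high_water = high_water.max(v.len());
        if v.capacity() > 4 * high_water {
            return false;
        }
    }
    true
}
```

With the current doubling strategy this invariant holds, but the point of the proposal is to write some such factor down rather than leave it implicit.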
> It seems obvious to me that some rule is required here.
Not necessarily. The capacity has already been allocated, so the cost has been paid. And vecs never shrink by themselves, even if you `pop` from them. So there's no guarantee that you always get the minimal possible memory footprint.
I agree that this is a bit of a weird case, but I would argue that from the user's perspective this is a "new Vec". So it is somewhat surprising that it is holding on to arbitrary amounts of memory from another vector. It would be nice to provide some guarantees.
> even if you pop from them. So there's no guarantee that you always get minimal possible memory footprint.
Please note that in my rule I said "maximum elements since ...". This would allow the current behaviour of `pop` not freeing memory. In other words, my rule only controls growth; it never implies that the `Vec` will shrink, other than through explicit capacity operations such as `shrink_to_fit`.
> It seems obvious to me that some rule is required here.
>
> Not necessarily. The capacity has already been allocated, so the cost has been paid. And vecs never shrink by themselves, even if you `pop` from them. So there's no guarantee that you always get minimal possible memory footprint.
The cost of an allocation is over its entire lifetime: its memory cannot be used by something else.
I pretty much agree with @kevincox and was about to post something similar: there should be a note in the docs for `.collect::<Vec<_>>()` that states that the resulting capacity can never be more than twice the length. (Or wherever in the docs would cover any other similar cases.)
"surprising" is a matter of expectations. Different people have different expectations. E.g. some people have expressed surprise when collect doesn't reuse allocations.
And as I have mentioned in a previous comment, using a factor-based heuristic doesn't help if you're intentionally doing something like `vec_t.filter_map(|_| None).collect::<Vec<U>>()` to keep the allocation. Allocation recycling has been requested before.
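A minimal sketch of that recycling idiom, for the same-type case; whether the allocation is actually reused is a non-guaranteed optimization, and for type-changing cases it depends on the layouts involved:

```rust
// Drain a buffer's elements while (potentially) keeping its allocation:
// filter_map(|_| None) yields no elements, so collect produces an empty
// Vec that the in-place optimization may build inside the old allocation.
fn recycle(buf: Vec<u32>) -> Vec<u32> {
    buf.into_iter().filter_map(|_| None).collect()
}
```

The resulting vector is always empty; its capacity after the call is exactly the behavior under discussion in this thread.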
> There should be a note in the docs for `.collect::<Vec<_>>()` that states that the resulting capacity can never be more than twice the length.
`Vec` does not guarantee that anywhere for any operation. Quite the opposite:

> Vec does not guarantee any particular growth strategy when reallocating when full, nor when reserve is called. The current strategy is basic and it may prove desirable to use a non-constant growth factor. Whatever strategy is used will of course guarantee O(1) amortized push.
Small note: I think that `collect::<Vec<_>>` should probably have some sort of exception for `size_hint`. For example, if `size_hint` says a minimum size of 1k elements but then ends after 2, I think it is acceptable that the capacity can be 500x the actual used space. But I would document this as an exception to the general rule, since it only occurs if the implementation of `size_hint` is broken.
If your size hint is incorrect we could just panic. You're breaking an API contract. It's called "hint" because it's not required to be tight, not because you're allowed to return garbage.
I don't think anyone is claiming that there is currently a guarantee. Personally I am claiming that there should be a guarantee because users are relying on some sort of reasonable memory usage, and it would be nice to actually promise that.
> if you're intentionally doing something like `vec_t.filter_map(|_| None).collect::<Vec<U>>()` to keep the allocation.
Maybe this would best be provided by a dedicated API rather than this weird looking trick? This would make it explicit and obvious to readers.
Well, there's comments here and there telling people to use just that if they want the recycling behavior... so you could say people also rely on that behavior and have different expectations.
Looking at the assembly for the following two functions:
```rust
pub fn test_array_u8(input: Vec<[u8; 8]>) -> Vec<u8> {
    input.into_iter().map(|x| x[0] as u8).collect()
}

pub fn test_array_u16(input: Vec<[u16; 4]>) -> Vec<u8> {
    input.into_iter().map(|x| x[0] as u8).collect()
}
```
It seems to me that on beta, the first one (where the element type matches the result type), works completely in place without allocations. So here the resulting capacity being bigger would be expected.
The second function also first calculates the result in place, but then seems to do an allocation and copying of the data. So there might be a bug or missed optimization in the size calculation for this new allocation.
With Rust 1.75, both look like they first allocate a new location for the result.
That's unexpected. It should be calling into `realloc`, not `memcpy`.
> The pointer not being the same does not necessarily mean the allocation hasn't been reused. The allocator can call `mremap` to move it to a new virtual address which has a much lower cost than actually copying the bytes or faulting in new pages. This may also depend on the allocation size. For smaller pool-based allocations it might be an actual memmove and a mremap for larger ones.
This is not really relevant, since it's impossible to reuse an allocation with a different alignment in stable Rust. `<System as Allocator>::shrink()` always falls back to `allocate()` + `copy_nonoverlapping()` + `deallocate()` if the alignments aren't exactly the same. Thus, a custom `Allocator` implementation would be necessary for `shrink()` to even plausibly change the alignment.
Yeah I think I was testing with jemalloc.
I agree with others in this thread. The behavior is quite surprising to me, which is generally a bad thing for a language that advertises itself as reliable. Even more so if you plan on building system abstractions with it. I think at the very least there has to be an upper limit to the reuse. Anything above 2x seems like a footgun to me, as the original post's author already encountered.
The property that it will reuse allocations across different alignments also seems quite surprising to me.
> That's unexpected. It should be calling into realloc, not memcpy
Might be the default implementation of the global allocator for `shrink`. Since the alignment is different, it ends in the default branch and has to allocate/copy/deallocate.
> Might be the default implementation of the global allocator for shrink. Since the alignment is different it ends in the default branch and has to allocate/copy/deallocate.
Which seems unnecessary if the target type is overaligned? I think this should only be necessary when the target type is under/mis-aligned. Which is never true if the target type is smaller than the source type.
> Which seems unnecessary if the target type is overaligned? I think this should only be necessary when the target type is under/mis-aligned. Which is never true if the target type is smaller than the source type.
libc allocators generally don't provide an `aligned_realloc()` to go alongside `posix_memalign()` or `aligned_alloc()`. Thus, any allocation with an alignment greater than the guaranteed `malloc()` alignment has to be manually reallocated via `memcpy()`. This is implemented as the `realloc_fallback()` function in `library/std/src/sys/common/alloc.rs`.
@the8472
First of all, I'd like to say that the core idea of this optimization is a good one. In general, mapping in place is much less costly than allocating, mapping across, then deallocating the old allocation. Avoiding a new allocation and avoiding the resulting cache traffic is good.
With that being said, just because the idea is good in general does not mean that it may never be unhelpful in some cases, and the case demonstrated in the blog post, where over 100x the necessary memory is retained, is definitely unhelpful.
The documentation could be improved, but:

- Users don't read the documentation.
- There's a lot of `collect` in the wild, written prior to this optimization, which may suddenly start exhibiting such an unhelpful behavior.

So, let's focus on expectations: is it reasonable for a user to complain if the resulting `Vec` didn't preserve the 100x excess capacity? I'd argue NO. This is way beyond the usual Exponential Growth pattern, and usually only occurs when explicit steps are taken -- reserve, or popping elements out.

You mention that in case of alignment mismatch -- when the new alignment is less than the old -- the implementation calls `mremap`.

I'd suggest that in case of too much excess capacity -- when the new capacity is, say, 4x+ the new length -- the implementation should call `shrink_to_fit`. This has several benefits; in particular, if the vector is later re-grown, `realloc` will be able to grow it in place: there was space before, so as long as no other allocation stole it, it's still there.

From a number-of-users-affected point of view, I'd expect that saving memory for the majority of users who'd get bitten by excess capacity is worth it. The tiny fraction of users for whom preserving the entire allocation is absolutely necessary can always ask for a dedicated explicit function, or create a crate in the meantime.
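A sketch of what such a heuristic could look like, written here as a caller-side post-processing step rather than the actual standard-library change being proposed; the 4x threshold and the function name are illustrative:

```rust
// Shrink a vector whose capacity exceeds 4x its length; this mirrors the
// suggested shrink_to_fit heuristic from the caller's side.
fn cap_excess<T>(v: &mut Vec<T>) {
    const MAX_EXCESS_FACTOR: usize = 4; // illustrative threshold
    if v.capacity() > MAX_EXCESS_FACTOR * v.len() {
        v.shrink_to_fit();
    }
}
```

Note that a zero-length recycled buffer would always be shrunk by this rule, which is exactly the conflict with allocation recycling discussed elsewhere in this thread.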
> I'd suggest that in case of too much excess capacity -- when the new capacity is, say, 4x+ the new length -- the implementation should call `shrink_to_fit`.
I think this is reasonable. I might even go as far as "more than 2x", since that means we have more capacity than the unspecialized doubling-growth would have produced.
> Users don't read the documentation.
This is a very poor argument imo. It may be a fact, but it's not something we should bend over backwards to accommodate.
If everyone programs based on vibes then we can just delete the docs. API contracts exist for a reason because people's expectations are not universal and thus won't lead to consistent outcomes. If you code to assumptions you build on a foundation of sand. Sometimes people just assume very silly things. Sometimes they assume reasonable things that happen to be false because they conflict with other reasonable things. In many cases the docs are intentionally vague to preserve implementation flexibility.
> There's a lot of collect in the wild, written prior to this optimization, which may suddenly start exhibiting such an unhelpful behavior.
The optimization has existed since 2020. It was only recently extended to a few more cases. I agree that there are some issues with the generalization, but the fundamental approach has already worked nicely for a while. Now that it's more broadly applicable, someone has encountered an edge case; ok, that happens. But that doesn't mean that just because someone wrote a blog post it now must be changed.
That the system allocator doesn't support alignment-changing realloc is an issue worth considering though.
> Is it reasonable for a user to complain if the resulting Vec didn't preserve the 100x excess capacity?
That depends on what counts as excess capacity. If you keep reusing a vec then there is no excess. The capacity will be filled up in the next round. So this is use-dependent.
> I'd suggest that in case of too much excess capacity -- when the new capacity is, say, 4x+ the new length -- the implementation should call shrink_to_fit.
As I have said several times, basing this on a factor (such as 4x) means that clear-and-recycle-allocation uses become impossible with this approach, even though we have told people to do it this way.
We can change that of course, but then we're just breaking a different set of users which currently aren't making noises. Just because one set is visible doesn't mean the other set doesn't exist.
> I think this is reasonable. I might even go as far as "more than 2x", since that means we have more capacity than the unspecialized doubling-growth would have produced.
I would recommend not tying your hands too tightly here. Julia for example uses a growth strategy that grows at ~4x for small vectors but 1.13x for big vectors. I would suggest wording it as "will not use more than a reasonable constant factor" rather than documenting a strict cap at 2x.
> I'd suggest that in case of too much excess capacity -- when the new capacity is, say, 4x+ the new length -- the implementation should call shrink_to_fit.

> As I have said several times, basing this on a factor (such as 4x) means that clear-and-recycle-allocation uses become impossible with this approach even though we have told people to do it this way.
I think the original comment quoted is talking explicitly about `.collect()`. I think `clear()` is still expected to leave the capacity in place. I think this is most surprising on `collect()` because to the caller this is a "new" `Vec`. I wouldn't be surprised by some extra capacity (like `push` will over-allocate) but using hundreds of times the in-use space seems very surprising.

I think I would agree that after a `collect` there should be an upper bound on the capacity in relation to the length. I think that most users who are interested in reusing allocations can use `v.clear(); v.extend(i);` or an explicit conversion API if type changes are needed.
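The `v.clear(); v.extend(i);` pattern mentioned above, sketched out; the reuse here comes from `clear` keeping the capacity, rather than from the `collect` optimization:

```rust
// Reuse one buffer across batches of same-typed data.
fn refill(buf: &mut Vec<u32>, batch: impl IntoIterator<Item = u32>) {
    buf.clear();       // len -> 0, capacity (and allocation) preserved
    buf.extend(batch); // refills without reallocating if capacity suffices
}
```

This only works when the element type stays the same, which is why the type-changing case keeps coming back to `collect`.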
> You mention that in case of alignment mismatch -- when the new alignment is less than the old -- the implementation calls `mremap`.
@matthieu-m I was trying to note that this isn't really the case in practice, due to the semantics of Rust's allocator APIs. The only use of the allocator within the `in_place_collect` implementation itself is a call to `Allocator::shrink()`, which per its documentation allows decreasing the required alignment. However, in stable Rust, the only available `Allocator` is `Global`, which delegates to the registered `GlobalAlloc`. Since `GlobalAlloc::realloc()` cannot change the required alignment, the implementation of `<Global as Allocator>::shrink()` must fall back to creating a brand-new allocation, `memcpy`ing the data into it, and freeing the old allocation, whenever the alignment doesn't remain exactly the same.

Therefore, the underlying allocator, provided by libc or some other source, has no opportunity to internally `mremap()` the data when the alignment is changed, since it has no way of knowing that the allocation is the same.

This could be fixed by letting the global allocator implement the full `Allocator` API, instead of restricting it to the `GlobalAlloc` API. Then, at least for small alignments (within the guaranteed `malloc()` alignment), the platform-dependent system allocator would be able to utilize `realloc()` instead of falling back to a manual copy-based implementation. However, this would require a substantial rework of the relationship between the global allocator and the `Allocator` trait.
> I would suggest wording it as "will not use more than a reasonable constant factor"
I would like to see an explicit factor, even if the standard library currently uses less than it. For example, even if we use 2x now I wouldn't mind documenting it at 10x. At least this provides an upper bound which people in situations where memory usage is very important can design against.
@kevincox I'm specifically referring to this pattern which I have linked to in this comment
> We can change that of course, but then we're just breaking a different set of users which currently aren't making noises. Just because one set is visible doesn't mean the other set doesn't exist.
How would users be broken by not expanding this optimization as far as possible? Creating a new allocation in the different-layout case has been the behavior for every stable version of Rust! The "we have told people to use it this way" only applies to the same-layout case, unless you have another link to share.
Overall, there's an asymmetry here, between people who have lots of existing code who want `collect()` not to change its effective behavior by retaining massive unused capacity, and people who wish for a good way to recycle allocations for types with different layouts, regardless of how much spare capacity that creates. That way doesn't necessarily have to be the existing `collect()`.
I also don't think we can consider a GitHub comment as a binding promise. This is what we have the stabilization process for. I think we need an RFC process to either guarantee these reuse optimizations and when they apply or an RFC to stabilize an explicit API for this.
I agree and I never said it's a binding promise. But it's imo still a quantum stronger than the no-promises-at-all that currently (don't) exist in the other direction, contrary to people's assumptions.
> How would users be broken by not expanding this optimization as far as possible?
They would be broken if we promise a maximum-unused-capacity factor as some are asking for in this discussion.
I think it is worth pointing out here that there are a few different problems that we are trying to solve here. Ideally we can find a solution that works in all scenarios, but perhaps a hard tradeoff needs to be made between them.
Problem 1: Most users are happy to have the in-place optimization, because it results in faster code and lower memory usage most of the time. They would be happy if this optimization could be triggered in more cases. However, it breaks the principle of least surprise when it results in significantly higher memory usage in some scenarios.
Problem 1a: As a consequence of the change in the optimization, we might see a large regression in the memory usage of some applications in the next stable release.
Problem 2: The in-place optimization has been the recommended idiom for reusing allocations for a while, and some users are now depending on it.
Problem 2a: If we introduce a limit on the capacity, these users will see a regression.
Problem 3: When using some allocator+alignment combinations, the in-place+realloc strategy will result in significantly worse performance than the alternative.
> Problem 3: When using some allocator+alignment combinations, the in-place+realloc strategy will result in significantly worse performance than the alternative.
I've just rechecked that and @LegionMammal978 is right that it is impossible to benefit from this optimization on stable so it makes sense to revert this part. I'll post a PR.
> Problem 3: When using some allocator+alignment combinations, the in-place+realloc strategy will result in significantly worse performance than the alternative.
>
> I've just rechecked that and @LegionMammal978 is right that it is impossible to benefit from this optimization on stable so it makes sense to revert this part. I'll post a PR.
Wouldn't it be possible to work around problem 3 by replacing `mem::align_of::<SRC>() < mem::align_of::<DEST>()` with `mem::align_of::<SRC>() != mem::align_of::<DEST>()` inside the `in_place_collectible` function?
That's more or less it, yes. Plus a few other lines that can be simplified.
> How would users be broken by not expanding this optimization as far as possible?
>
> They would be broken if we promise a maximum-unused-capacity factor as some are asking for in this discussion.
This would be covered by @kevincox's actual proposal for the factor:

> "The capacity will never automatically grow more than 2x (or some other number) the maximum elements that have been part of the `Vec` since the last explicit capacity operation."

Since `.clear()` doesn't change the high-water mark for the element count, it would still be perfectly safe to continue using same-layout `.collect()` with such a guarantee.
Though perhaps a more operational and less problematic factor may be: "While collecting an iterator created from a `Vec<T>` into a `Vec<U>`, if the length (in elements) doesn't grow, then the resulting capacity (in elements) must be no more than N × the original capacity (in elements)," for some N. This allows same-layout `.collect()` to preserve the initial capacity exactly, while also avoiding the creation of massive unused capacity if the element size is decreased.
I suppose that's a possible way to do it but it seems kind of weird to me to distinguish between "it's mostly empty because it discarded a lot of items" and "it's mostly empty because it converted to a smaller type". Because you can end up with anywhere between 0% and 100% occupancy with either approach.
```rust
let v: Vec<[u8; 128]> = vec![[0u8; 128]; 1024];
let v: Vec<u8> = v.into_iter().flatten().collect();
```

Changes to a smaller type but the len/capacity ratio stays the same.
That's why I stipulated "if the length (in elements) doesn't grow"; if it does, then obviously it would be fine to give `.collect()` more leeway. A more explicit solution for that would be to use "N × the original capacity (in elements) + M × the number of additional elements" as the bound. It doesn't necessarily matter that the resulting occupancy might be low, only that it isn't massively lower than the original occupancy.
Well... you can also write the recycle in a different way without an explicit `clear`.
That's why my proposal considers the initial capacity? There could be a million different ways to write the recycle, and none of them would be broken by the proposal as long as they don't increase the element count.
Oh, you're trying to codify the old behavior. But then you basically can't extend allocation recycling to more (smaller) types. My goal was to say "write it this way and we'll make it more powerful over time".

If that's ruled out we might need a `try_reuse_allocation` method or something like that.
> But then you basically can't extend allocation recycling to more (smaller) types. My goal was to say "write it this way and we'll make it more powerful over time".
The whole problem is that "this way" is also the idiomatic way for collecting any iterator into a `Vec`, many existing users of which want to avoid an extremely low-occupancy result, especially when they didn't start with a low-occupancy source.

Another option would be to explicitly bound the output by occupancy, in saying (e.g.) that the output occupancy (as measured with the upper bound of the `.size_hint()`) must be at least half the input occupancy. In particular, to reuse the original allocation, the (upper-bound) total byte length of the resulting elements must be at least half the total byte length of the initial elements.
This allows for same-layout recycling of a cleared `Vec` (since both lengths are 0), same-layout recycling via `filter_map(|_| None)` (since `FilterMap` doesn't change the upper bound), `flat_map` recycling to preserve the total byte length (since it adjusts the upper bound accordingly, if it reports one at all), and general recycling that keeps the length but reduces the element size by up to half. But at the same time, if the input `Vec` has an occupancy of at least 50% (e.g., if it has been built up via `.push()`), then the output `Vec` won't have an (upper-bound) occupancy lower than 25%, which seems like a reasonable figure to me.
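The occupancy rule above could be sketched as a simple predicate over byte lengths. The names and the 50% factor here are illustrative; this is not the actual `in_place_collect` code:

```rust
// Allow reuse only if the upper-bound output byte length is at least
// half of the input byte length (the proposed 50% occupancy bound).
fn may_reuse(in_len: usize, in_elem_size: usize,
             out_upper_bound: usize, out_elem_size: usize) -> bool {
    2 * out_upper_bound * out_elem_size >= in_len * in_elem_size
}
```

Under this predicate a cleared `Vec` (both lengths 0) and a same-size `filter_map(|_| None)` recycle both pass, while collecting large elements down to tiny ones does not.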
> many existing users of which want to avoid an extremely low-occupancy result
I want to contest the "many" here, since most of the time it doesn't matter, but... too many words spent on quibbling.
> Another option would be to explicitly bound the output by occupancy, in saying (e.g.) that the output occupancy (as measured with the upper bound of the `.size_hint()`) must be at least half the input occupancy;
Interesting. Finally a use for the upper bound 😄
I think there are still some scenarios where you'd end up shrinking the vector and then afterwards unnecessarily triggering a regrowth by `push`ing into it... but if we're relying on `realloc` being cheap then the occasional regrowing would also be cheap. So as long as we can avoid unnecessary shrinkage most of the time this might be good enough.
Ok, one further complication: we also want to avoid having a call to `realloc` sitting in the compiled code at all (even behind a non-taken branch) where possible, since this allows some iterators to be reduced to a no-op. After all, that's how it always was before #110353 and still is in most cases. That's why there's a whole bunch of `const` in the optimization, to eliminate this at compile time. I think this can be mostly based on the `EXPAND_BY`/`MERGE_BY` and type size calculation.
> I think there are still some scenarios where you'd end up shrinking the vector and then afterwards unnecessarily triggering a regrowth by `push`ing into it... but if we're relying on `realloc` being cheap then the occasional regrowing would also be cheap. So as long as we can avoid unnecessary shrinkage most of the time this might be good enough.
Could you elaborate on these scenarios? By far, I'd imagine that the typical scenario is that "an element out is worth an element in": just because you decreased the byte size of an element doesn't mean that you'll need multiples of the previous spare capacity in elements. The alternative scenarios would seem fairly niche, and an explicit capacity-preserving API for those scenarios would be far more clear in its intent than the extremely generic `collect()`.

One exception, of course, is using `flat_map()` or `flatten()` to break each element into multiple subelements (e.g., RGB triples into components), which is why I suggested the length-ratio-based adjustment. But in that instance, keeping the full elements' worth of spare capacity still seems more like an optimization than something broadly useful; why split the vector into subelements, if you still have more data to push into it?
> Ok, one further complication: We also want to avoid having a call to realloc sitting in the compiled code at all (even behind a non-taken branch) where possible since this allows some iterators to be reduced to a nop. After all that's how it always was before #110353 and still is in most cases. That's why there's a whole bunch of `const` in the optimization, to eliminate this at compiletime. I think this can be mostly based on the EXPAND_BY/MERGE_BY and type size calculation.
Yeah, my suggestion was meant to allow for recycling to be guaranteed at compile time in most cases where it's permitted. The `EXPAND_BY` (if I understand it correctly) looks like it's equivalent to the upper-bound length as a multiple of the original length, when that can be determined at compile time. So if we know that `2 * EXPAND_BY * size_of::<Out>() >= size_of::<In>()`, then it's guaranteed that reusing the allocation won't violate the minimum occupancy. And then when shrinking the allocation, an output capacity of `in_capacity * EXPAND_BY` elements will preserve the original occupancy.
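If I read the proposal right, the compile-time side could look something like this sketch. `EXPAND_BY` is modeled here as a const parameter, and the function mirrors the inequality above; it is not the actual standard-library code:

```rust
// True when reusing the allocation cannot drop occupancy below half,
// given that each input element yields at most EXPAND_BY output elements.
const fn occupancy_ok<const EXPAND_BY: usize>(in_size: usize, out_size: usize) -> bool {
    2 * EXPAND_BY * out_size >= in_size
}
```

For the earlier `Vec<[u16; 4]>` to `Vec<u8>` example, `occupancy_ok::<1>(8, 1)` is false (2 < 8), so the allocation would be shrunk rather than kept at full size, while a `flatten` of `[u8; 8]` into 8 bytes passes.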
I just wanted to chime in as someone who would prefer the optimization not do any fancy heuristics to resize the allocation. Documenting that `collect()` will attempt to reuse the original allocation if possible seems sufficient, especially if adding a note about unused capacity when changing the type of the `T` in the `Vec` and recommending explicitly shrinking in that case if desired.

It seems much more predictable to me, and easier both to grok and to explain, for the allocation to retain its original size, even if that increases its capacity. The behavior described in the blog post does not strike me as a bug once you understand that `collect()` can reuse the allocated memory.
Most of the time, we try to avoid allocations, so this is an optimization we are excited to see expand in stable.
I've read all the comments and wanted to add my perspective, which I don't think is well covered.

When using `collect()`, it's not immediately apparent when an iterator and its respective collect target can use an existing allocation or not. There are obvious situations in which you think it might be the case, but as a user one always wonders whether some intermediate iterator is going to maintain the allocation. I guess it's possible to document every case: "map will always keep the allocation in such and such a situation", but it feels like the potential for confusion or misdocumentation is extensive (especially when it comes to iterators that aren't backed by any memory at all).

I basically assume that `collect` will not reallocate, but I've never found myself looking to optimise this particular aspect.

The point I'm getting to is that IMO `collect` should do something sensible in all cases, which probably means not maintaining a big allocation if the thing being collected is much smaller. If you get the previous allocation, then that's great, but it's an optimisation, not an expectation. It feels to me therefore that there is probably a really good argument to be made for an extension to the API.
My expectation of iterators in Rust is that they are an abstraction over loops, so a function like this

```rust
fn primes(ints: Vec<u32>) -> Vec<u32> {
    let mut primes = Vec::new();
    for int in ints {
        if is_prime(int) {
            primes.push(int);
        }
    }
    primes
}
```

can be written more cleanly as this

```rust
fn primes(ints: Vec<u32>) -> Vec<u32> {
    ints.into_iter().filter(|&int| is_prime(int)).collect()
}
```
However, due to this optimization these two functions have different outputs for the same input; I can notice the optimization on the output of my function. In my view it then partially fails at being an optimization.
All that being said, I think there is significant leeway in the fact that users generally don't have an expectation for the exact capacity of a `Vec`; it's only when the difference is very big that it becomes problematic. I like the idea of shrinking the capacity based on some threshold.
That entirely depends on how you write the loop! If you write it as a loop over `&mut Vec<u32>` then the capacity-preserving falls out of it naturally.
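For instance, the same filtering written against `&mut Vec<u32>` keeps the allocation by construction (a sketch; `is_prime` is a stand-in matching the earlier example):

```rust
// Naive primality test, only for the sake of a runnable example.
fn is_prime(n: u32) -> bool {
    n >= 2 && (2..n).take_while(|d| d * d <= n).all(|d| n % d != 0)
}

// Filters in place: the Vec's allocation and capacity are untouched.
fn primes_in_place(ints: &mut Vec<u32>) {
    ints.retain(|&int| is_prime(int));
}
```

Here the caller visibly opted into keeping the buffer, rather than the capacity retention being a side effect of `collect`.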
The "zero-cost abstraction" motto is often spelled out as "you couldn't write it better manually". And the hyper-optimized code-golf solutions of course don't contain any extra allocations when they're unnecessary and only manually shrink the buffer in the end if they know that they won't need the extra space anymore.
`collect()` does not know what you'll do with the resulting vector. You might `extend()` it further or `push()` into it, in which case the extra capacity will be useful.
And this is a one-way road: you can discard excess capacity with `shrink_to_fit()`, but you can't undiscard it if `collect()` were to do it automatically.
(This bug report was inspired by this blog post https://blog.polybdenum.com/2024/01/17/identifying-the-collect-vec-memory-leak-footgun.html)
After #110353 was landed, in-place iteration can reuse allocations in many more places. While this is a good thing, in some cases this can cause overly large capacity for the destination vector.
Additionally, in some cases this will cause the destination vector to have a very large capacity even for non-shared allocations. In my opinion, this should never happen.
For an example see this code:
On stable this code works as expected, i.e. two different vectors with reasonably lengths and capacities.
On beta however, `v2` will have a capacity of 262144, even though it does not share an allocation with `v1`. If you remove the `as u8` part, then the allocations will be shared, but the capacity is still overly large.

My suggested fix is:
Meta
I am running NixOS with rustup.