Closed TethysSvensson closed 5 months ago
If you write it as a loop over `&mut Vec<u32>` then the capacity-preserving falls out of it naturally. But the iterator chain consumes a `Vec` and returns a different `Vec`. If `collect()` was acting on a `&mut Vec<>` (or a derivative) then I think the argument would be different.
On the other hand, if a new version of Rust extends this optimization to new code, your memory usage will balloon. This may not be technically breaking, but it is effectively breaking.
If one was being truly paranoid they would need to call `shrink_to_fit` after any instance of `.collect::<Vec<_>>()` where the resulting `Vec` is expected to live for a significant period of time. Otherwise any Rust update can cause very surprising memory usage.
I don't want everyone running out to add `shrink_to_fit` after every `collect` call. So I think it makes sense to place an upper bound on the unused (yes, you may use it, but we don't know at the time) capacity that can result from a `collect`. This ensures that Rust has reasonable behaviour in all situations. Then those who really need high performance can explicitly reuse allocations, and those who really need low memory usage can be sure to call `shrink_to_fit` everywhere. But I think the defaults should behave well for the common case.
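The defensive pattern under discussion, sketched (the numbers and the filter are illustrative, not from the thread):

```rust
fn main() {
    let big: Vec<u32> = (0..1_000_000).collect();
    // A filter that keeps almost nothing: with aggressive allocation reuse,
    // the result could silently retain the huge source capacity.
    let mut small: Vec<u32> = big.into_iter().filter(|&x| x < 10).collect();
    // The defensive shrink_to_fit being discussed, for long-lived results.
    small.shrink_to_fit();
    assert_eq!(small.len(), 10);
}
```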
Documenting that `collect()` will attempt to reuse the original allocation if possible seems sufficient, especially if adding a note about unused capacity when changing the type of the `T` in the `Vec`, and recommending explicitly shrinking in that case if desired.
The thing I'm not a fan of about that is that it's common to collect from an unknown iterator. So just documenting this means that it's implicitly telling people that if they're collecting a passed-in iterator (perhaps after filtering it), they should probably do some conditional shrinking of the capacity afterwards, because otherwise they could accidentally be wasting an arbitrarily-high amount of memory.
It's not at all clear to me that doing that -- and thus encouraging a bunch of PRs to crates that use `collect` -- is a net better outcome.
The behavior described in the blog post does not strike me as a bug once you understand that `collect()` can reuse the allocated memory.
Regardless of whether it's a "bug" it's something that the experience with Java's substring shows is a footgun: https://stackoverflow.com/questions/14161050/java-string-substring-method-potential-memory-leak
When the goal is to use this for re-using buffers across different types, then the programmer controls the source and target `Vec`s, so I think giving those people something specific to make that desire clearer would work fine, and is a better trade-off than having to think about it for every `collect`.
"Never a complexity class more storage than writing a `push` loop" seems like an entirely reasonable expectation from `collect`, to me. (I do agree that we definitely shouldn't give any strict guarantee about it, just like we don't give strict guarantees about `push` capacities.)
EDIT: ...and @kevincox just said basically the same thing as we typed in parallel.
> But the iterator chain consumes a Vec and returns a different Vec.
You can see that as a method that consumes the `Vec` and still internally operates on `&mut Vec<>`, and then does `into_raw_parts`/`from_raw_parts` to do any type changes, if any. In the meanwhile it operates on the raw storage. That's basically how the optimization works.
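A rough user-level sketch of that raw-parts trick, using stable `Vec::from_raw_parts` instead of the unstable `into_raw_parts` (the helper name and the `u32 -> i32` mapping are made up for illustration; this is not the std implementation):

```rust
use std::mem;

/// Consume a Vec<u32>, operate on the raw storage, and hand the same
/// allocation back as a Vec<i32>. Sound only because u32 and i32 have
/// identical size and alignment and every u32 bit pattern is a valid i32.
fn map_in_place(mut v: Vec<u32>) -> Vec<i32> {
    let (ptr, len, cap) = (v.as_mut_ptr(), v.len(), v.capacity());
    mem::forget(v); // the buffer is taken over below, so don't free it here
    // SAFETY: see the layout argument above; len and cap describe this buffer.
    let mut out = unsafe { Vec::from_raw_parts(ptr as *mut i32, len, cap) };
    for x in &mut out {
        *x = -*x; // the per-element transformation, done on the raw storage
    }
    out
}

fn main() {
    let out = map_in_place(vec![1u32, 2, 3]);
    assert_eq!(out, [-1, -2, -3]);
}
```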
That's very low-level code, sure. But it's the best possible loop you could write for things that don't need extra storage.
> they should probably do some conditional shrinking of the capacity afterwards, because otherwise they could accidentally be wasting an arbitrarily-high amount of memory.
That's a non-issue if you drop the vec soon. Or run yet another transformation on this. It's not like the need to shrink is universal. For years nobody noticed with the existing optimization.
One person made a blog post, but didn't consider it enough of an issue to, well, open an issue. Someone else did, even though they weren't personally affected by this!
> "Never a complexity class more storage than writing a `push` loop" seems like an entirely reasonable expectation from `collect`, to me. (I do agree that we definitely shouldn't give any strict guarantee about it, just like we don't give strict guarantees about `push` capacities.)
+1
And if there's a concern that some allocators may not shrink in place -- a final `realloc` + `memcpy` is still a reasonable tradeoff IMO.
Also, you can't really judge lack of complaints when the expanded scope of the reuse optimization hasn't reached stable yet. The people piling in here in alarm all count too! I do think it's worth taking a step back.
> The people piling in here in alarm all count too!
There's also people who like the optimization e.g. https://github.com/rust-lang/rust/issues/120091#issuecomment-1899638455 or this comment on reddit
And I'm not sure a popularity contest is a good way to judge this since the reporting biases who is responding to the issue.
I think there's enough doubt to justify a revert while we reconsider. The capacity shrink is a less drastic compromise, whether we set the threshold at 2x or even more generously at 4x as first suggested.
> "Never a complexity class more storage than writing a `push` loop" seems like an entirely reasonable expectation from `collect`, to me.
Note that this already is not the case on stable. And as mentioned several times in the thread, some people use that to avoid allocations even when the complexity-class difference is... unbounded (which trivially happens when `len() == 0`). Hence the more complex heuristics such as those suggested in https://github.com/rust-lang/rust/issues/120091#issuecomment-1899271788
@cuviper I believe it would be fitting to mark this issue as regression-from-stable-to-nightly (possibly to-beta?) until it is clarified.
I don't think it's a behavior we'd want to sneak in stable without explicit acknowledgement that it is intended, and the label would help ensuring it doesn't end up there by default.
Done -- but do note that a regression label is not a promise that anything will change.
> "Never a complexity class more storage than writing a `push` loop" seems like an entirely reasonable expectation from `collect`, to me. (I do agree that we definitely shouldn't give any strict guarantee about it, just like we don't give strict guarantees about `push` capacities.)
One particular problem with that, which was hit by the post's author, is with a `filter()` that removes most of the elements. The question is: is it fair for us to continue preserving the full capacity unconditionally (since that is the current same-layout behavior in stable Rust), under the assumption that a typical `filter()` will keep a significant fraction of the elements? Such an assumption seems reasonable to me, matching the behavior of `Vec::retain()`. And as @the8472 noted, we want to avoid conditional `shrink_to_fit()`s whenever possible, and try to stick to compile-time information when deciding if the allocation can be recycled. That's why I suggested considering only the upper bound on the number of elements the iterator might return.
I think the options (besides reverting) are:
1. Keep the current behaviour and let users who care call `shrink_to_fit()` to reduce memory footprint.
2. Always shrink if there's an excess above some fixed factor. As noted previously this would prevent some uses that have been informally suggested to users to recycle allocations that currently work on stable. This also pessimizes users who keep `extend`ing/`push`ing into the vec after it has been collected.
3. Shrink to some bound, though a `realloc` always sitting in the in-place code may affect optimizations. On stable it doesn't exist at all and on nightly it's often removed via some `const` conditions.

And separately we could add a `clear_and_try_recycle_allocation(self: Vec<T>) -> Vec<U>` method, so that people have to rely less on complicated optimizations.
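A stable-Rust sketch of what such a recycling method could look like for the global allocator (the function is hypothetical; std provides no such method, and real layout rules for this are subtler than the equal-size-and-alignment check used here):

```rust
use std::mem;

/// Drop all elements of `v`, then try to reuse its allocation for a new
/// element type `U`. Falls back to a fresh empty Vec when the layout differs.
fn clear_and_try_recycle<T, U>(mut v: Vec<T>) -> Vec<U> {
    v.clear(); // drops the elements, keeps the capacity
    if mem::size_of::<T>() == mem::size_of::<U>()
        && mem::align_of::<T>() == mem::align_of::<U>()
    {
        let (ptr, cap) = (v.as_mut_ptr(), v.capacity());
        mem::forget(v); // ownership of the buffer moves to the new Vec
        // SAFETY: len is 0 and the allocated layout is identical for T and U.
        unsafe { Vec::from_raw_parts(ptr as *mut U, 0, cap) }
    } else {
        Vec::new()
    }
}

fn main() {
    let v: Vec<u32> = Vec::with_capacity(64);
    let recycled: Vec<f32> = clear_and_try_recycle(v);
    assert!(recycled.capacity() >= 64); // same buffer, no new allocation
    assert!(recycled.is_empty());
}
```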
I definitely think we should be exploring APIs to explicitly meet the needs of those informal suggestions.
> 2. Always shrink if there's an excess above some fixed factor. As noted previously this would prevent some uses that have been informally suggested to users to recycle allocations that currently work on stable. This also pessimizes users who keep `extend`ing/`push`ing into the vec after it has been collected.
Note that it doesn't have to be `shrink_to_fit` entirely -- it could be an unconditional `shrink_to(X * len)`, keeping some excess if it exists (and a no-op if it's already smaller).
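Concretely, with today's `Vec::shrink_to` (the `2 * len` factor is just the example bound from the comment):

```rust
fn main() {
    let mut v = Vec::with_capacity(1024);
    v.extend(0..10u32);
    // Keep at most a 2x excess; shrink_to is a no-op if the capacity is
    // already at or below the requested amount.
    v.shrink_to(2 * v.len());
    assert!(v.capacity() >= v.len());
    assert!(v.capacity() < 1024); // the large excess was released
}
```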
I was not implying that it would be the `shrink_to_fit` variant. The pessimization may also happen if you do some refilling that happens to use all the capacity that would have otherwise been retained.
It's just not possible for `collect()` to know what uses will happen downstream of it. They might use all of the capacity or none of it. If we add an automatism the user can't opt out of it, because the cost will already have been paid at that point.
> Such an assumption seems reasonable to me, matching the behavior of `Vec::retain()`.
Personally my expectations of `retain` are very different than of `collect`. With `retain` it takes a `&mut self`, so I see it as the same object; I expect that the capacity is unchanged. With `collect` it seems like this creates a new object. The docs don't explicitly say it creates a new object, but imply it with phrases like "create instances" and "Using `collect()` to make a `String`".
For example it is very surprising that calling this function with different types can use hugely different amounts of memory for the return value. Note that the function that creates the vector itself has no actual idea if it will return something with lots of extra capacity.
```rust
fn test(i: impl IntoIterator<Item = u32>) -> Vec<u32> {
    i.into_iter().filter(|_| false).collect()
}

fn main() {
    eprintln!("vec: {}", test(vec![1; 1000]).capacity()); // 1000
    eprintln!("vec: {}", test([1; 1000]).capacity()); // 0
}
```
Personally my mental model of `collect::<Vec<_>>()` looks something like the code below. I expect that this is similar to many users' expectations (or maybe they don't even consider the upfront reserve).
```rust
fn my_collect<T>(i: impl Iterator<Item = T>) -> Vec<T> {
    let mut out = Vec::with_capacity(i.size_hint().0);
    for e in i {
        out.push(e);
    }
    out
}
```
And I appreciate that it is nice to add allocation reuse here, even if it is user-perceptible. But as others have agreed, holding onto "arbitrary" amounts of extra memory is very surprising.
My preferred outcome would be:
1. A guarantee on `Vec` that automatic allocations will use at most some constant factor of the in-use memory.
2. Having `collect` `shrink_to_fit` if that rule would be violated due to allocation reuse.
3. An explicit API for recycling an allocation into a `Vec` with a potentially different type.

I think all of these are quite important changes.
I think one potentially missing thing would be how to handle reliable allocation reuse while changing the type. IIUC this can really only be done if the new type is `<=` the size of the original type. But this would need some more specific API if we wanted to enable it to be explicit.
> It's just not possible for `collect()` to know what uses will happen downstream of it. They might use all of the capacity or none of it. If we add an automatism the user can't opt out of it because the cost will already have been paid at that point.
Right, we can't know -- but that's why I think this optimization should tread closer to as-if behavior. Limiting to `2 * len` is roughly that, but actually `push`'s growth would have a capacity more like `len.next_power_of_two()`. (Unless we consider exact-size `with_capacity` results, then it would be `shrink_to_fit`.)
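The growth that a plain `push` loop produces can be observed directly (the exact capacity steps are an implementation detail, not a guarantee):

```rust
fn main() {
    let mut v = Vec::new();
    let mut steps = vec![v.capacity()];
    for i in 0..1000u32 {
        v.push(i);
        if v.capacity() != *steps.last().unwrap() {
            steps.push(v.capacity()); // record each reallocation
        }
    }
    println!("capacity steps: {:?}", steps);
    // The "as-if a push loop" bound: the final capacity stays within a
    // small constant factor of the length.
    assert!(v.capacity() < 2 * v.len() + 8);
}
```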
I think the "as-if" comparison is a great one! I believe that we should keep the behaviour similar to what is "expected" by the simple model of these combinators. Managing to perform clever optimizations is great when they are all upsides. But when clever optimizations accidentally bite you it really sucks. So I'd rather make sure that the optimizations are improvements in the vast majority of use cases. If we have optimizations that may be helpful or may be hurtful, it is probably better to let the user opt in explicitly, even if the benefit is generally positive. Otherwise seemingly benign changes to the code, like replacing a `for x in i { vec.push(x) }` with a `collect`, will have surprising consequences.
I get that this is definitely a grey area, because it is impossible to define what is expected by users and because `Vec` capacity is not exactly defined. However there are some expectations here that users reasonably depend on (such as that `push` doesn't allocate 1000x more capacity than len), and I think the attention this ticket has got shows that even if this is a rare case it is surprising to many users. So I would default to the unsurprising case, even if it isn't optimal performance in other cases.
On the topic of explicit APIs, I was thinking that a `collect_with_capacity` may be missing from the APIs here.
Many collections today allow hinting at the capacity, and reserving at least that much. That is, if I know I will be further using a `Vec` or `HashMap` I collect into, I'd like to be able to hint the capacity I guess I am going to need, and depending on the use case be conservative or aggressive in the estimate.
Today, this is possible in two steps to ensure the target capacity:

```rust
let mut v = Vec::with_capacity(x);
v.extend(some_iterator);
```
A minor issue is that it is two steps, which leads to the major issue that allocation reuse is not possible. A `collect_with_capacity` API would, however, address both issues.
Further, because the target capacity is known in advance, inlining `collect_with_capacity` allows the compiler to optimize the selection of which collection algorithm to use, as it knows upfront whether it's going to need to extend, shrink, or do nothing.
It does still suffer from one disadvantage: there is no guarantee it will reuse the underlying allocation. This is annoying when reuse is necessary for performance reasons.
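A user-space sketch of the proposed API shape (a hypothetical extension trait; unlike the in-std proposal, this version cannot reuse the source allocation):

```rust
/// Hypothetical extension trait: collect into a Vec, reserving at least
/// `cap` slots up front (or more if the iterator promises more).
trait CollectWithCapacity: Iterator + Sized {
    fn collect_with_capacity(self, cap: usize) -> Vec<Self::Item> {
        let mut v = Vec::with_capacity(cap.max(self.size_hint().0));
        v.extend(self); // single reserve + fill: the two-step pattern, fused
        v
    }
}

impl<I: Iterator> CollectWithCapacity for I {}

fn main() {
    let v = (0u32..10).filter(|x| x % 2 == 0).collect_with_capacity(64);
    assert!(v.capacity() >= 64); // at least the hinted capacity
    assert_eq!(v, [0, 2, 4, 6, 8]);
}
```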
> On the topic of explicit APIs, I was thinking that a `collect_with_capacity` may be missing from the APIs here.
What would that do if the target type doesn't have a concept of capacity? Would this be based on a new trait like `FromIteratorWithCapacity`?
Edit: I guess I'm still trying to work out if the problem is a `Vec` problem, or if it is more general. Can we allow downstream types to make use of the memory, or is that just too far out of scope?
I am not sure if I'm following the whole `collect_with_capacity` concept. To me there are two different variants:
1. Specifying the (minimum) capacity of the target collection (`collect_with_capacity`).
2. Keeping the original allocation (`collect_with_current_allocation`). Calling it capacity in this case could be really confusing if the element size is different between the original `T` and the target `U`.

=> To me it seems there are two APIs missing: one where I can define the (minimum) capacity of the target collection, and one where the original collection is kept without any shrinking attempt.
> One where I can define the (minimum) capacity of the target collection and one where the original collection is kept without any shrinking attempt.
I think the point is that `with_capacity` would use the existing allocation up to capacity. It's still not explicit though. I'd prefer an API that guarantees reuse (or fails). `collect_with_current_allocation` would still expose the current issue.
> I think the point is that with_capacity would use the existing allocation up to capacity
This only works if the target collection is actually smaller than the capacity. With something like `.iter().flatten().collect_with_capacity(1234)` you could end up with a result that is much bigger than the specified capacity, or even bigger than the original allocation. (E.g. where you have a `Vec<Vec<_>>` as input.)
> `collect_with_current_allocation` would still expose the current issue.
Except the current "issue" is not a universal issue: I was proposing `collect_with_current_allocation` because I have use cases where I want this behaviour: keep the current allocation wherever possible, or expand if necessary. (One of the main reasons why we are "here" today is because there was demand for this behaviour in the first place - the change leading to this issue was not done out of sheer boredom, but because people were asking for allocation recycling.)

> I'd prefer an API that guarantees reuse (or fails).

So this would be yet another use case, potentially requiring yet another API ;)
EDIT: fix quoting
> One of the main reasons why we are "here" today is because there was demand for this behaviour in the first place - the change leading to this issue was not done out of sheer boredom, but because people were asking for allocation recycling
Absolutely. It does feel to me that the problem is that there is no generic way for a type to expose its allocated buffer in such a way that a consuming type can take over that allocation. Instead the `collect` case on `Vec` is a `std` special case in which the behaviour diverges from what is possible in normal user code (hence, probably, why the expectations are not necessarily consistent with the behaviour).
> I'd prefer an API that guarantees reuse (or fails).
I doubt that we can expose this as an API anytime soon, or generically. In practice it only works on a few (although very important) types, what exactly works changes over time, relies on some unstable and very unsafe features, and can't be implemented by user crates. So it's not something that really makes sense for `FromIterator`.
This will have to remain in the realm of non-guaranteed optimizations for a while.
A `Vec::clear_and_try_recycle_allocation` would be different because it's far less generic and, at least for `A = Global`, can already be implemented on stable with just a bit of `unsafe`.
It's unclear whether it would affect the thing that motivated the blog post, since that only contains simplified examples, not the original code that triggered this.
It was `(u64, u128)` -> `(u32, u32)`, so requiring matching alignments would have prevented the optimization in my case.
This was discussed in a libs meeting, but no conclusion was reached on whether the range of possible behaviors should be narrowed. We did agree that it should at least be documented (#120355) and that the recent changes warrant a release note (#120004).
Additionally #120116 removes the cases in which the attempted optimization would never have been useful on stable, which also caused the regression that led to the blog post. After the removal there still are some additional cases that were not optimized on previous stable releases. We're going to let those propagate to stable and wait for feedback. The assumption is that only very few crates will need to apply shrinking.
Documentation has been added, the case where the optimization wasn't effective has been removed, and there have been no further reports, so I'll close this for now.
(This bug report was inspired by this blog post https://blog.polybdenum.com/2024/01/17/identifying-the-collect-vec-memory-leak-footgun.html)
After #110353 was landed, in-place iteration can reuse allocations in many more places. While this is a good thing, in some cases this can cause overly large capacity for the destination vector.
Additionally, in some cases this will cause the destination vector to have a very large capacity even for non-shared allocations. In my opinion, this should never happen.
For an example see this code:
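The snippet itself did not survive here; a reconstruction consistent with the numbers quoted below (128 arrays of `[u16; 1024]`, i.e. a 262144-byte allocation, mapped down to a `Vec<u8>`) would be:

```rust
// Hypothetical reconstruction of the missing example.
fn dbg_vec<T>(v: &Vec<T>) {
    println!("vec ptr={:?} len={} cap={}", v.as_ptr(), v.len(), v.capacity());
}

fn main() {
    let v1 = (0u16..128).map(|i| [i; 1024]).collect::<Vec<_>>();
    dbg_vec(&v1);
    // 128 * 2048 bytes of source storage feeding a 128-byte result:
    let v2 = v1.into_iter().map(|x| x[0] as u8).collect::<Vec<_>>();
    dbg_vec(&v2);
}
```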
On stable this code works as expected, i.e. two different vectors with reasonable lengths and capacities.
On beta however, `v2` will have a capacity of 262144, even though it does not share an allocation with `v1`. If you remove the `as u8` part, then the allocations will be shared, but the capacity is still overly large.
My suggested fix is:
Meta
I am running NixOS with rustup.