Open cuviper opened 4 years ago
Preserving a relevant concern from @SimonSapin in https://github.com/rust-lang/rust/pull/72162#issuecomment-634279732:
These methods look very out of place in the
Extend
trait, and theextend_
prefix in their name is apparently unrelated to the meaning of the name of theextend
method.From an API design perspective, this seems to be asking for new
Push
andReserve
traits. This might deserve to be part of a larger overall design for traits for generic collections. The standard library has mostly avoided doing this so far, if factExtend
is the odd one out and arguably should not have been a trait.
Additional concern:
reserve
calls into allocation machinery. extend_one
is not fallible, so it will also call into the allocation machinery in case not enough capacity has been reserved. This can lead to code bloat. Instead it should be something more along the lines of #84649
With arrays implementing the IntoIterator trait, I'm not sure extend_one
is still pulling its weight? It seems the reasoning was that extend_one(v)
is nicer than extend(Some(v))
? To me at least, extend([v])
seems even nicer.
It's not just how nice it looks, but also simpler implementation, like Vec
is just a push
. We call this repeatedly in unzip
's own loop, so it's a benefit to keep that simple instead of adding an inner loop (even const 1).
@cuviper can't this be also solved with specialization? Adding special code for Option<T>
, iter::Once<T>
, and [T;1]
?
Yeah, I suppose specialization could do that too. Note that only [T;1]
is always one item though.
Note that only [T;1] is always one item though.
Good point! [T;1]
should be enough though.
Summarizing some real-time discussion from just now:
extend_one
seems fine to me.
For extend_reserve
, I'd prefer to at least see the default implementation be the one @dtolnay suggested: extend
with a dummy iterator that has a size_hint of the appropriate size. We should also document the expected semantics: should extend_reserve
panic if it can't allocate (like reserve
methods), or is it a hint that should ignore failure?
(I'd still prefer to have it as a free function that can't be overridden, rather than a default method, but with the above documentation and default impl I'd just be -0 on it and wouldn't oppose adding it.)
For
extend_reserve
, I'd prefer to at least see the default implementation be the one @dtolnay suggested:extend
with a dummy iterator that has a size_hint of the appropriate size.
So, returning a hint (size, Some(size))
, but no items? Often the lower bound is what collections use for reservation. I guess it's technically safe to lie about this though. (edit: I see #88761 is already there!)
(I'd still prefer to have it as a free function that can't be overridden, rather than a default method, but with the above documentation and default impl I'd just be -0 on it and wouldn't oppose adding it.)
You mean by using that hint trick in the free function? I could do that.
Actually, if we go that route, we can also let unzip
and Extend for (A, B)
pass the complete size hint from the source iterator, rather than just additional
from the lower bound. That seems compelling.
(Note: hashbrown
will have to remove its extend_reserve
before we can actually kill it.)
So, returning a hint (size, Some(size)), but no items? Often the lower bound is what collections use for reservation. I guess it's technically safe to lie about this though.
That's a good point. Iterator::size_hint
docs say
A buggy iterator may yield less than the lower bound or more than the upper bound of elements. [...]
That said, the implementation should provide a correct estimation, because otherwise it would be a violation of the trait’s protocol.
So it implies that it's kind of buggy behavior to have an iterator that doesn't return its lower bound. At the very least it leaves a bad taste.
It would be nicer if we just passed in a single iterator where a call to next()
spins the outer loop of whatever is driving Extend
(generators were mentioned in the meeting).
Shoulld extend_reserve
have Vec::reserve
or Vec::reserve_exact
semantics?
I see #72162 uses reserve
everywhere, but has this been examined and justified?
I would also like a Push
and TryPush
and TryExtend
trait:
pub trait Push {
type Item;
fn push<'a>(&'a mut self, item: Self::Item) -> &'a Self::Item;
}
pub trait TryPush {
type Item;
fn try_push<'a>(&'a mut self, item: Self::Item) -> Result<&'a Self::Item>;
}
pub trait TryExtend {
type Item;
fn try_extend<Iter: IntoIterator<Item = Self::Item>>(&mut self, iter: Iter) -> Result<()>;
}
@Stargateur:
push
and insert
on eg Vec
don't return a reference. Is this a bad legacy decision? I don't know.push
and try_push
returning an immutable reference despite requiring a mutable reference to the container intentional? Are there containers in the wild that need this constraint?&Self::Item
as a return? What about eg a HashSet
that gets extended with (K, V)
tuples? I imagine you'd want a push on that to return a &mut V
.@qm3ster
Oh It was just a quite and dirty example, my main idea is I don't see why we can't have a trait for Push one item on alloc collection. Maybe split Push
and TryPush
is not need (same for extend). But I think Extend and Push should be two separate trait. My may issue with current design is you need to try_reserve
THEN you can use extend. It's two step when it's should be one step for the user. Since we loose the battle in early Rust to have fallible allocation I would like to have trait where at least are nice to simple to use and not error prone. Having a trait TryPush
and TryExtend
will remove all user the burden to check at every call try_reserve()
.
unwrap()
when push
could just return it. Also this is probably cost free every function should return something expect few thing like drop
. And even if this is not cost free I expect the compiler will optimize it away since it's very simple to optimize a not used reference (same argument is say about user to call vec::last
that is optimize away after a Vec::push
but again better optimize away the not used return value and this remove unwrap in user code). I give something to a function and it give me nothing back ? steal !Push
on HashMap
is a possibility too. We are not force to do it. But it a valid point, maybe have a second associate type like Output
allowing HashMap
to return a (&K, &V)
instead of &(K, V)
that will be not possible for several implementation of such. Problem of this it probably require Gats
to compile.So maybe:
pub trait Push {
type Item;
type Output;
fn push<'a>(&'a mut self, item: Self::Item) -> Self::Output;
}
pub trait TryPush {
type Item;
type Output;
// the error should return the item cause it didn't consume it
fn try_push<'a>(&'a mut self, item: Self::Item) -> Result<Self::Output, ErrorFoo<Self::Item>>;
}
I don't know GaTs
enough to know exactly but I often design thing where compiler tell me GATS as error. So if someone know better to make this trait compilable with or without gats. The idea is here, have traits to push and extend and return the item for push. Also, need to design the error type, should it be fixed or generic.
A GAT API for pushing to HashMap
could possibly look like this:
#![feature(entry_insert, generic_associated_types)]
use std::collections::HashMap;
use std::hash::{BuildHasher, Hash};
pub trait Push {
type Item;
type Output<'a>
where
Self: 'a;
fn push<'a>(&'a mut self, item: Self::Item) -> Self::Output<'a>;
}
impl<K, V, S> Push for HashMap<K, V, S>
where
K: Eq + Hash,
S: BuildHasher,
{
type Item = (K, V);
type Output<'a>
where
Self: 'a,
= &'a mut V;
fn push<'a>(&'a mut self, (key, value): (K, V)) -> &'a mut V {
self.entry(key).insert(value).into_mut()
}
}
I'm writing something that uses size hint similar to iterators and was wondering how it works in std
. I found some surprises related to this. As @qm3ster points out there's a question about use of reserve
vs reserve_exact
but there's also issue of upper bound of Iterator::size_hint()
not being used.
I'm thinking about these scenarios:
iter.collect()
or collection.extend()
, then store the collection for a long time - best to avoid over-allocatingiter.collect()
or collection.extend()
, then push some new elements without iterator (async
and such)Iterator::take()
which can have very high upper bound (typical scenario is DoS protection if the iterator is untrusted input)I'm still unsure about the best solution but if there are people smarter than me interested I'd like to know what they are thinking.
@Kixunil I would like to note that repeatedly calling collection.extend()
with small iterators is common. There's also cases when a collection is deserialized and extended immediately, or deserialized and then extended indefinitely (in the latter case, probably with some removals, so resizing may be amortized regardless).
@qm3ster I believe all such cases can be solved using Iterator::chain
which also happens to be less tedious than manually calculating the right reserve size. If you have a situation that can not be converted I'm curious to learn!
Of course! I am talking about calling it at runtime, in an event/message-driven loop, not statically multiple times in a row in a block of code.
Ah, finally a good case when it's needed!
Perhaps another solution is some kind of fn allocation_hint(&self) -> Option<usize>;
method on iterators. That would fix the "lying about size_hint" problem, and you could get the functionality of extend_reserve with just .extend(itertools::allocation_hint(n))
or something. I've wanted this in the context of fallible iterators before, since for .collect::<Result<Vec<_>, _>>()
the iterator constructed by try_process
has a lower-bound size hint of 0, so the vec always uses an exponential allocation strategy, even though the iterator being collected from might have an exact size_hint.
For the record, I've just ran some simple benchmarks against vec.extend(Some(x))
and vec.extend_one(x)
and the extend_some
variant performed 20% faster on average. The main difference I can see from the codegen is the reserve
call, and the order of the branching
example::extend_one:
push rbx
mov rbx, rdi
mov rsi, qword ptr [rdi + 16]
cmp rsi, qword ptr [rdi + 8]
jne .LBB3_2
mov rdi, rbx
call alloc::raw_vec::RawVec<T,A>::reserve_for_push
mov rsi, qword ptr [rbx + 16]
.LBB3_2:
mov rax, qword ptr [rbx]
mov byte ptr [rax + rsi], 0
inc rsi
mov qword ptr [rbx + 16], rsi
pop rbx
ret
example::extend_some:
push rbx
mov rbx, rdi
mov rsi, qword ptr [rdi + 16]
cmp qword ptr [rdi + 8], rsi
je .LBB4_1
.LBB4_2:
mov rax, qword ptr [rbx]
mov byte ptr [rax + rsi], 0
inc rsi
mov qword ptr [rbx + 16], rsi
pop rbx
ret
.LBB4_1:
mov edx, 1
mov rdi, rbx
call alloc::raw_vec::RawVec<T,A>::reserve::do_reserve_and_handle
mov rsi, qword ptr [rbx + 16]
jmp .LBB4_2
My judgement from reading the raw_vec code is that extend_some
, by using reserve
which has allocation marked as #[cold]
, reorders the branching to be more helpful. Whereas the extend_one
uses reserve_for_push
which is not #[cold]
and therefore suffers.
This is a tracking issue for the trait methods
Extend::extend_one
andextend_reserve
. The feature gate for the issue is#![feature(extend_one)]
.About tracking issues
Tracking issues are used to record the overall progress of implementation. They are also uses as hubs connecting to other relevant issues, e.g., bugs or open design questions. A tracking issue is however not meant for large scale discussion, questions, or bug reports about a feature. Instead, open a dedicated issue for the specific matter and add the relevant feature gate label.
Steps
Unresolved Questions
Extend
is in the prelude, so the explicitextend_
prefix avoids ambiguity.Extend
the best place for just methods?Iterator::unzip
, which only hasDefault + Extend
, so there's not really much choice.Implementation history