Open a-lafrance opened 2 years ago
The name try_collect()
may become a problem if one day we would want a collect()
-like method with fallible allocations (like e.g. Vec::try_reserve()
).
Maybe I'm blind and I can't find it, but you can already to something like this:
fn main() {
let a = vec![Some(3), Some(4)];
let b = vec![Some(3), None];
let c = vec![Result::<_, ()>::Ok(3), Ok(4)];
let d = vec![Ok(3), Err(())];
println!("{:?}", a.into_iter().collect::<Option<Vec<_>>>());
println!("{:?}", b.into_iter().collect::<Option<Vec<_>>>());
println!("{:?}", c.into_iter().collect::<Result<Vec<_>, _>>());
println!("{:?}", d.into_iter().collect::<Result<Vec<_>, _>>());
}
What is the difference to try_collect
?
IIRC the point was to ease the discovery of the method.
The name
try_collect()
may become a problem if one day we would want acollect()
-like method with fallible allocations (like e.g.Vec::try_reserve()
).
Right, the name of this function made me think it was about collecting regular (non-Try
) values into things like arrays or fallibly allocated Vec
s.
What is the difference to
try_collect
?
@hellow554 the main differences are:
try_collect()
helper is meant to make the technique more discoverable by "promoting" it into its own method.Try
but not FromIterator
types: This is the main functional difference between try_collect()
and collect()
. collect()
requires the Try
type involved to also implement FromIterator
, which is fine for simple cases like Option
and Result
, but doesn't allow other "Try
-not-FromIterator
" types like ControlFlow
to be fallibly collected. In contrast, try_collect()
only requires that the output type being collected into implements FromIterator
, so I can try_collect()
an iterator yielding ControlFlow<_, i32>
into a ControlFlow<_, Vec<i32>>
, which there's no way to do with collect()
.collect()
takes ownership of the iterator, so you can't continue to use it after a failure occurs when collecting. try_collect()
mutably borrows the iterator, so even after a failure occurs, you can continue to perform iteration (e.g. you can call try_collect()
again).Try
type when using try_collect()
like you do with collect()
. Concretely, if I have an iterator yielding Result<i32, SomeComplicatedError>
, I can call try_collect::<Vec<_>>()
to perform fallible collection, but I have to call collect::<Result<Vec<_>, SomeComplicatedError>()
to do the same thing.Perhaps I wasn't clear enough about the difference between try_collect()
and collect()
in my original description of the feature. If it would make things more clear, I'm happy to modify the feature description up above to get the points I just mentioned across.
The name
try_collect()
may become a problem if one day we would want acollect()
-like method with fallible allocations (like e.g.Vec::try_reserve()
).
Yeah, I see how that might become a conflict; I guess there's two distinct ways that collecting can fail. If anyone has suggestions for method names that are clearer, I'd love to hear them.
On the point about collect
consuming the iterator, I think you could already do .by_ref().collect()
. This might be useful to prevent mistakenly using the iterator again, the by_ref
makes sure the programmer actually wanted to continue using the iterator. So would it be a benefit for try_collect
to also consume the iterator and then mention using by_ref
in the docs if you actually want to resume using it?
@a-lafrance thanks for that explanation. That helps me understanding your intent. And yes, IMHO that should be in the original post as a description :)
@TrolledWoods That was my first thought too, why don't you just use by_ref
. Then I looked at the other three functions and they are each taking &mut self
.
I mean... it is what is is, but self
would have been better IMHO :) But now let's stick to that. I doesn't really matter, does it?
@hellow554 If there is an argument for them to take self
by reference, fine. But we should not repeat past mistakes in new APIs 😃
Yeah, given that the by_ref()
technique exists I can definitely see the desire to take self
rather than &mut self
, so I'm definitely willing to consider that change. I guess since the PR is already landing we can leave it as-is for now and revisit it later, so I'll add it as an unresolved question for now.
Should it take self rather than &mut self, to prevent users from accidentally continuing to use the iterator after a try_collect() failure? Note that you can still continue to use the iterator if you use by_ref() first, so it's not necessarily a functionality change.
Note that by_ref
is a huge performance footgun, since it prevents using internal iteration, as it uses the &mut I where I: Iterator
impl that, because it works for dyn Iterator
, can't use any of the generics. The benchmarks in core
can be used to see the extra cost: https://github.com/rust-lang/rust/blob/f838a425e3134d036a7d9632935111a569ac7446/library/core/benches/iter.rs#L150-L158
It's absolutely by design that anything which can early-exit takes &mut self
. That's why all
and any
take &mut self
, for example, while count
and max
and such don't. And because all
takes &mut self
, that means try_fold
needs to as well in order to meet its primary purpose (#45595) of being the one internal iteration method in terms of which every single Iterator
method can be defined.
There's nothing wrong with resuming iteration after getting an error from try_collect
. And sure, you can do that wrong and try to resume iteration after getting a success from try_collect
on a non-fused iterator, but that's exactly the same mistake as resuming iteration after getting None
from .next()
, so that seems fine for the same reasons.
Oh that's an interesting point @scottmcm. I was wondering why some methods took self
while others took &mut self
so this clears it up for me. In that case I'm personally in favor of continuing to take &mut self
rather than self
.
Note that
by_ref
is a huge performance footgun, since it prevents using internal iteration, as it uses the&mut I where I: Iterator
impl that, because it works fordyn Iterator
, can't use any of the generics.
Can this situation be improved using specialization? I.e. specialize impl<I: Sized> Iterator for &mut I
to use internal iteration where possible? If not, I think this definitely should be mentioned on the docs, since it is not at all obvious.
Thinking about it more there's also the argument that &mut self signifies that it's intended to sometimes resume the value, whereas if it's self by value, you should expect weird behaviour when resuming, so it may be a better idea, yeah. Surprising that by_ref is a performance footgun though, that feels like something that should be fixed asap
I probably shouldn't have used a word as strong as "footgun" for this. If you're just iterating over, say, a slice, then it makes no difference at all.
It's only in certain cases -- Chain
being the usual example -- where it matters. And even then only when the body of the loop is tiny. As soon as the body is doing anything substantial, any difference immediately disappears in the noise.
So it's very important in core
, where we make sure to avoid pessimizing code. But outside of that it's often unimportant, and the knowledge lives in things like that blog post I linked. I also don't want people to overindex on this and start telling anyone using a for
loop or by_ref
that they're "doing it wrong" because of a note in the docs either.
I don't know whether specialization in currently in a place where it could fix it.
Just a thought: I was looking back at Rust by Example recently and I noticed that this section mentions fallible collection with Iterator::collect
. Would it be worth updating to include try_collect
upon stabilization? Just so that people are aware of the new technique that we've introduced.
offtopic: This is the first unstable feature I have ever encountered that has any business still being unstable (i.e. has had discussion in the last 5 years). Bravo.
I'd like to share a benefit from try_collect
vs collect::<Result<?, ?>>
. try_collect
allows me to hide the concrete type of the Result::E
type, which is convenient when we want to split an error from a more extensive scope to a smaller one. If we use try_collect
, we don't need to change every code that uses which Result
, and ?
will do the implicit cast from the low-level error to the high-level one.
An example https://github.com/singularity-data/risingwave/pull/3081
Maybe I'm blind and I can't find it, but you can already to something like this:
fn main() { let a = vec![Some(3), Some(4)]; let b = vec![Some(3), None]; let c = vec![Result::<_, ()>::Ok(3), Ok(4)]; let d = vec![Ok(3), Err(())]; println!("{:?}", a.into_iter().collect::<Option<Vec<_>>>()); println!("{:?}", b.into_iter().collect::<Option<Vec<_>>>()); println!("{:?}", c.into_iter().collect::<Result<Vec<_>, _>>()); println!("{:?}", d.into_iter().collect::<Result<Vec<_>, _>>()); }
What is the difference to
try_collect
?
First of all, thank you for this, I needed that snippet. Also, if this ever gets merged, it'd be nice to have a clippy lint so I don't have to remember to remove where I've copied and pasted that snippet all over my codebase. :3
The name
try_collect()
may become a problem if one day we would want acollect()
-like method with fallible allocations (like e.g.Vec::try_reserve()
).Right, the name of this function made me think it was about collecting regular (non-
Try
) values into things like arrays or fallibly allocatedVec
s.
I think the iterator methods prefixed with try_
can all be (and should continue to be) taken to be mean "this iterator method short-circuits with the first error encountered". The try_
in Vec::try_reserve
is like the try_
in Rc::try_new
and array::try_from
— it simply means "this thing might not work out". Since the proposed method is/would be defined on iterators and not more general Rust objects/types, I think try_collect
is the correct name for it. (One could say that the iterator-specific meaning of the try_
prefix "shadows" the language-wide meaning.)
If collecting can fail due to a TryReserveError
— which AFAIK is the only way collecting into a FromIterator
could fail — then I think the appropriate name for that method would be something like collect_try_reserve
. And if more general errors need to be handled then I think a TryFromIterator
trait would be appropriate.
If collecting can fail due to a TryReserveError — which AFAIK is the only way collecting into a FromIterator could fail — then I think the appropriate name for that method would be something like collect_try_reserve. And if more general errors need to be handled then I think a TryFromIterator trait would be appropriate.
It's not specific to reservation of Vec
s at all, there are many data structures that would benefit from a fallible FromIterator
and such a try_collect()
method: arrays (as previously mentioned) are an obvious example, but another one would be a Grid
type that can be constructed from an iterator of iterators (the rows of the grid), and which would fail if one of the rows has a different length from the others.
I don't like the name try_collect
for this; it doesn't fit the rest of the standard library API; i.e. it doesn't use a TryFromIterator
trait, but rather a FromIterator
over items that implement Try
, which is a different idea.
Based on its name, I would've expected try_collect
to behave similarly to how try_into()
can be used where it only succeeds where the source can be converted to the target; i.e. I'd expect to be able to try_collect()
into a [T; 10]
, e.g., where it would only succeed if the iterator had exactly 10 items.
This seems like an unnecessary generalization of the idea of being able to collect()
to Result
/Option
, which already exists today and similar implementations could be added to any new type that implements Try
in the future without "stealing" the try_collect
name for a different purpose than (at least I think) seems the most natural -- i.e. fallible collection, not collecting fallible items.
It might seem a more clunky name, but this seems like it should be named collect_try
rather than try_collect
, if it should exist at all.
(NOT A CONTRIBUTION)
It is my strongly held position that this API should be removed from the standard library. Its usefulness is far far less than the cognitive load it brings and therefore it does not carry its weight.
The biggest issue with this API is that its signature is completely unreasonable. Return <<Self::Item as Try>::Residual as Residua<B>l>::TryType
? Multiple associated type projections to perform type changing like this is not a pattern that should be accepted into std without extremely compelling motivation for the simple reason that is totally inscrutable to most users.
And the motivation to add this API to std is not compelling at all. From the API docs and this thread I see:
Try
but not FromIterator
can be collected into (the overwhelmingly most common Try
types, Result and Option, do implement FromIterator
).Result
with try_collect
whereas you do with collect
.Remember that collect is just a convenient sugar over a for loop. For that final stated advantage, I would strongly prefer just writing a for loop that doesn't short circuit on the error case.
When contrasting these benefits with the signature of this function, I do not have any doubt that this is not a motivation that justifies a signature like this. When you consider other problems brought up in this thread - like that it takes a name that could plausibly useful for a fallibly allocating collect API, which would presumably have a simpler signature and a more compelling motivation - I am honestly shocked that this API has been added to std.
The biggest issue with this API is that its signature is completely unreasonable. Return
<<Self::Item as Try>::Residual as Residual<B>>::TryType
? Multiple associated type projections to perform type changing like this is not a pattern that should be accepted into std without extremely compelling motivation for the simple reason that is totally inscrutable to most users.
The Try
trait is still unstable, it may be that the final design won't be so inscrutable. I would suspend judgment until Try
is stabilized.
Another potential issue is that a hypothetical faillibly-allocating collection function might want to be called try_collect
.
What use cases are served by Iterator::try_collect
which aren't already covered by Result::from_iter
?
The try_collect method allows you to collect results from an iterator, where each item in the iterator is a Result. If an error occurs during the iteration, try_collect will return the error immediately. Which is useful in situations where you have a collection of results and you want to handle all the errors together. By using it you can avoid having to manually handle errors in the loop that processes the results.
What use cases are served by
Iterator::try_collect
which aren't already covered byResult::from_iter
?
@WhyNotHugo I think the docs on itertools's try_collect
summarize it quite well:
.try_collect()
is more convenient way of writing.collect::<Result<_, _>>()
It's just a convenience alias :)
If Iterator::collect()
were able to infer that we want to collect into a Result
and allow the use of ?
short-circuiting immediately after, there'd be no need try_collect
i think.
But, for example, this doesn't compile:
fn process_data(data: &[&str]) -> anyhow::Result<()> {
let numbers: Vec<u64> = data.iter().map(|s| s.parse()).collect()?;
println!("processing {numbers:?}");
Ok(())
}
(Imagine the data
parameter being anything iterable, and s.parse()
being any fallible operation.)
We just want to collect numbers into a vector, and propagate any parsing errors up.
It doesn't work even if we change the return type of the function to explicitly be Result<(), std::num::ParseIntError>
, such that the ?
doesn't need to do any type conversions.
Some alternatives that do compile are:
Specifying that we want to collect into a Result
via a turbofish type param:
let numbers: Vec<u64> = data.iter().map(|s| s.parse()).collect::<Result<_, _>>()?;
Which is kind of noisy. Why do we need to say we're collecting into a Result
if we're immediately propagating it up via the ?
that follows that .collect()
?
Or also including the container type on the turbofish type parameter:
let numbers = data.iter().map(|s| s.parse()).collect::<Result<Vec<u64>, _>>()?;
Which is as noisy as the previous option, but with the added drawback that in order to know the type of numbers
you have to chase it all the way to the end of that expression.
Using Result::from_iter()
instead of collect()
:
let numbers: Vec<u64> = Result::from_iter(data.iter().map(|s| s.parse()))?;
Which is kind of unidiomatic (even the docs for Result::from_iter()
use .collect()
instead)
Separating the collection and the Result
propagation into two different lines:
let parse_results: Result<_, _> = data.iter().map(|s| s.parse()).collect();
let numbers: Vec<u64> = parse_results?;
This one is not too bad TBH. Though i'd much rather prefer a readable one-liner :)
But none of these options are as readable as the first snippet IMO. And that's the appeal of try_collect()
.
Using itertools' try_collect()
instead of collect()
on the first snippet compiles just fine:
let numbers: Vec<u64> = data.iter().map(|s| s.parse()).try_collect()?;
BTW, i understand there's a naming issue with try_collect
, which makes it look like a method with error handling for failed allocations, and i agree.
If we want to bikeshed on possible method names, may i suggest either:
collect_ok()
(which parallels itertools' extremely-useful fold_ok()
), orcollect_results()
(which ironically parallels itertools' fold_options()
)I personally prefer the latter. It just looks vanilla and obvious :)
While the name is being bikeshed, here's my two cents:
I would prefer some other name than try_collect
. When I searched for "rust try_collect" I was looking for something to turn an Iterator
directly into a [T; N]
without having to use the TryInto
impl on Vec
.
I think this intuition is because of how there is the infallible Into/From
, and the fallible TryInto/TryFrom
. Thus there is the infallible FromIterator/Iterator::collect
and the fallible TryFromIterator/Iterator::try_collect
. Of course, TryFromIterator
doesn't exist, and it would be fairly niche, but there has been some very mild interest previously expressed in having TryFromIterator
https://github.com/rust-lang/rfcs/issues/1839
As for the alternative name, I don't have a strong preference. Either of epidemian's suggestions are fine.
Since try_collect
use the Try
trait it's not necessary a Result
for this reason I'm opposed to _result
or _ok
Which is why I suggested the name collect_try
in the first place way back when
Feature gate:
#![feature(iterator_try_collect)]
This is a tracking issue for adding the
try_collect()
method to theIterator
trait, as originally discussed here.Iterator::try_collect()
is a fallible variation ofIterator::collect()
analogous to similar methods forreduce
andfold
, among others, that provides a simpler, generic way to collect iterators ofTry
elements intoTry
-wrapped types.Difference from
Iterator::collect
In terms of functionality, the main difference between
try_collect()
andcollect()
is thattry_collect()
allows you to fallibly collect iterators yielding types that implementTry
but notFromIterator
, whichcollect()
can't do. Concretely this means you cantry_collect()
iterators yieldingControlFlow<_, i32>
intoControlFlow<_, Vec<i32>>
, which you can't do withcollect()
.It's also a
&mut self
method instead ofcollect
'sself
, so that you can resume iteration after it early exits on an error. (Like howtry_fold
is&mut self
whilefold
is justself
.)Another benefit of
try_collect()
is discoverability. Because the discoverability of the "collect-into-Result
" appears to be lower than desired, "promoting" the technique into its own method seems like a good way to increase its discoverability and reach more users who would find it useful. (One of many examples of people just not being able to find theOption: FromIterator
themselves: https://users.rust-lang.org/t/try-collect-as-iter-consuming-operation/20479?u=scottmcm.)Finally,
try_collect()
presents a small ergonomics benefit in terms of type hints when collecting, namely that you only have to hint at the output type you're collecting into, not the wholeTry
type. For example, that means you can collect an iterator yieldingResult<i32, SomeComplicatedError>
withtry_collect::<Vec<_>>()
, as opposed to having to specify the wholeResult
withcollect::<Result<Vec<_>, _>>()
.Public API
Steps / History
Unresolved Questions
self
rather than&mut self
, to prevent users from accidentally continuing to use the iterator after atry_collect()
failure? Note that you can still continue to use the iterator if you useby_ref()
first, so it's not necessarily a functionality change.try_collect()
conflict too much with the idea of collecting that's fallible in allocation? (i.e. collecting withVec::try_reserve
or similar)