Tracking Issue for `Iterator::try_collect`

a-lafrance commented 2 years ago

Feature gate: #![feature(iterator_try_collect)]

This is a tracking issue for adding the try_collect() method to the Iterator trait, as originally discussed here. Iterator::try_collect() is a fallible variation of Iterator::collect() analogous to similar methods for reduce and fold, among others, that provides a simpler, generic way to collect iterators of Try elements into Try-wrapped types.

Difference from `Iterator::collect`

In terms of functionality, the main difference between try_collect() and collect() is that try_collect() allows you to fallibly collect iterators yielding types that implement Try but not FromIterator, which collect() can't do. Concretely this means you can try_collect() iterators yielding ControlFlow<_, i32> into ControlFlow<_, Vec<i32>>, which you can't do with collect().

It's also a &mut self method instead of collect's self, so that you can resume iteration after it early exits on an error. (Like how try_fold is &mut self while fold is just self.)

Another benefit of try_collect() is discoverability. Because the discoverability of the "collect-into-Result" appears to be lower than desired, "promoting" the technique into its own method seems like a good way to increase its discoverability and reach more users who would find it useful. (One of many examples of people just not being able to find the Option: FromIterator themselves: https://users.rust-lang.org/t/try-collect-as-iter-consuming-operation/20479?u=scottmcm.)

Finally, try_collect() presents a small ergonomics benefit in terms of type hints when collecting, namely that you only have to hint at the output type you're collecting into, not the whole Try type. For example, that means you can collect an iterator yielding Result<i32, SomeComplicatedError> with try_collect::<Vec<_>>(), as opposed to having to specify the whole Result with collect::<Result<Vec<_>, _>>().

Public API

trait Iterator {
    type Item;

    fn try_collect<B>(&mut self) -> ChangeOutputType<Self::Item, B>
    where
        Self: Sized,
        <Self as Iterator>::Item: Try,
        <<Self as Iterator>::Item as Try>::Residual: Residual<B>,
        B: FromIterator<<Self::Item as Try>::Output>;
}

Steps / History

[x] Implementation: #94041
[ ] Final comment period (FCP)
[ ] Stabilization PR

Unresolved Questions

[ ] Should it have a more complicated signature to be able to return the partial results too? (@CAD97 https://internals.rust-lang.org/t/idea-fallible-iterator-mapping-with-try-map/15715/6?u=scottmcm )
[ ] Should it take self rather than &mut self, to prevent users from accidentally continuing to use the iterator after a try_collect() failure? Note that you can still continue to use the iterator if you use by_ref() first, so it's not necessarily a functionality change.
[ ] Does the name try_collect() conflict too much with the idea of collecting that's fallible in allocation? (i.e. collecting with Vec::try_reserve or similar)

ChayimFriedman2 commented 2 years ago

The name try_collect() may become a problem if one day we would want a collect()-like method with fallible allocations (like e.g. Vec::try_reserve()).

hellow554 commented 2 years ago

Maybe I'm blind and I can't find it, but you can already to something like this:

fn main() {
    let a = vec![Some(3), Some(4)];
    let b = vec![Some(3), None];
    let c = vec![Result::<_, ()>::Ok(3), Ok(4)];
    let d = vec![Ok(3), Err(())];

    println!("{:?}", a.into_iter().collect::<Option<Vec<_>>>());
    println!("{:?}", b.into_iter().collect::<Option<Vec<_>>>());
    println!("{:?}", c.into_iter().collect::<Result<Vec<_>, _>>());
    println!("{:?}", d.into_iter().collect::<Result<Vec<_>, _>>());
}

What is the difference to try_collect?

ChayimFriedman2 commented 2 years ago

IIRC the point was to ease the discovery of the method.

Xiretza commented 2 years ago

The name try_collect() may become a problem if one day we would want a collect()-like method with fallible allocations (like e.g. Vec::try_reserve()).

Right, the name of this function made me think it was about collecting regular (non-Try) values into things like arrays or fallibly allocated Vecs.

a-lafrance commented 2 years ago

What is the difference to try_collect?

@hellow554 the main differences are:

Discoverability (as @ChayimFriedman2 pointed out): It seems like the discoverability of this technique is lower than ideal, so adding a try_collect() helper is meant to make the technique more discoverable by "promoting" it into its own method.
Try but not FromIterator types: This is the main functional difference between try_collect() and collect(). collect() requires the Try type involved to also implement FromIterator, which is fine for simple cases like Option and Result, but doesn't allow other "Try-not-FromIterator" types like ControlFlow to be fallibly collected. In contrast, try_collect() only requires that the output type being collected into implements FromIterator, so I can try_collect() an iterator yielding ControlFlow<_, i32> into a ControlFlow<_, Vec<i32>>, which there's no way to do with collect().
Resuming after failure: collect() takes ownership of the iterator, so you can't continue to use it after a failure occurs when collecting. try_collect() mutably borrows the iterator, so even after a failure occurs, you can continue to perform iteration (e.g. you can call try_collect() again).
Type hints: This is a relatively minor benefit, but you don't have to hint the whole complicated Try type when using try_collect() like you do with collect(). Concretely, if I have an iterator yielding Result<i32, SomeComplicatedError>, I can call try_collect::<Vec<_>>() to perform fallible collection, but I have to call collect::<Result<Vec<_>, SomeComplicatedError>() to do the same thing.

Perhaps I wasn't clear enough about the difference between try_collect() and collect() in my original description of the feature. If it would make things more clear, I'm happy to modify the feature description up above to get the points I just mentioned across.

The name try_collect() may become a problem if one day we would want a collect()-like method with fallible allocations (like e.g. Vec::try_reserve()).

Yeah, I see how that might become a conflict; I guess there's two distinct ways that collecting can fail. If anyone has suggestions for method names that are clearer, I'd love to hear them.

TrolledWoods commented 2 years ago

On the point about collect consuming the iterator, I think you could already do .by_ref().collect(). This might be useful to prevent mistakenly using the iterator again, the by_ref makes sure the programmer actually wanted to continue using the iterator. So would it be a benefit for try_collect to also consume the iterator and then mention using by_ref in the docs if you actually want to resume using it?

hellow554 commented 2 years ago

@a-lafrance thanks for that explanation. That helps me understanding your intent. And yes, IMHO that should be in the original post as a description :)

@TrolledWoods That was my first thought too, why don't you just use by_ref. Then I looked at the other three functions and they are each taking &mut self. I mean... it is what is is, but self would have been better IMHO :) But now let's stick to that. I doesn't really matter, does it?

ChayimFriedman2 commented 2 years ago

@hellow554 If there is an argument for them to take self by reference, fine. But we should not repeat past mistakes in new APIs 😃

a-lafrance commented 2 years ago

Yeah, given that the by_ref() technique exists I can definitely see the desire to take self rather than &mut self, so I'm definitely willing to consider that change. I guess since the PR is already landing we can leave it as-is for now and revisit it later, so I'll add it as an unresolved question for now.

scottmcm commented 2 years ago

Should it take self rather than &mut self, to prevent users from accidentally continuing to use the iterator after a try_collect() failure? Note that you can still continue to use the iterator if you use by_ref() first, so it's not necessarily a functionality change.

Note that by_ref is a huge performance footgun, since it prevents using internal iteration, as it uses the &mut I where I: Iterator impl that, because it works for dyn Iterator, can't use any of the generics. The benchmarks in core can be used to see the extra cost: https://github.com/rust-lang/rust/blob/f838a425e3134d036a7d9632935111a569ac7446/library/core/benches/iter.rs#L150-L158

It's absolutely by design that anything which can early-exit takes &mut self. That's why all and any take &mut self, for example, while count and max and such don't. And because all takes &mut self, that means try_fold needs to as well in order to meet its primary purpose (#45595) of being the one internal iteration method in terms of which every single Iterator method can be defined.

There's nothing wrong with resuming iteration after getting an error from try_collect. And sure, you can do that wrong and try to resume iteration after getting a success from try_collect on a non-fused iterator, but that's exactly the same mistake as resuming iteration after getting None from .next(), so that seems fine for the same reasons.

a-lafrance commented 2 years ago

Oh that's an interesting point @scottmcm. I was wondering why some methods took self while others took &mut self so this clears it up for me. In that case I'm personally in favor of continuing to take &mut self rather than self.

ChayimFriedman2 commented 2 years ago

Note that by_ref is a huge performance footgun, since it prevents using internal iteration, as it uses the &mut I where I: Iterator impl that, because it works for dyn Iterator, can't use any of the generics.

Can this situation be improved using specialization? I.e. specialize impl<I: Sized> Iterator for &mut I to use internal iteration where possible? If not, I think this definitely should be mentioned on the docs, since it is not at all obvious.

TrolledWoods commented 2 years ago

Thinking about it more there's also the argument that &mut self signifies that it's intended to sometimes resume the value, whereas if it's self by value, you should expect weird behaviour when resuming, so it may be a better idea, yeah. Surprising that by_ref is a performance footgun though, that feels like something that should be fixed asap

scottmcm commented 2 years ago

I probably shouldn't have used a word as strong as "footgun" for this. If you're just iterating over, say, a slice, then it makes no difference at all.

It's only in certain cases -- Chain being the usual example -- where it matters. And even then only when the body of the loop is tiny. As soon as the body is doing anything substantial, any difference immediately disappears in the noise.

So it's very important in core, where we make sure to avoid pessimizing code. But outside of that it's often unimportant, and the knowledge lives in things like that blog post I linked. I also don't want people to overindex on this and start telling anyone using a for loop or by_ref that they're "doing it wrong" because of a note in the docs either.

I don't know whether specialization in currently in a place where it could fix it.

a-lafrance commented 2 years ago

Just a thought: I was looking back at Rust by Example recently and I noticed that this section mentions fallible collection with Iterator::collect. Would it be worth updating to include try_collect upon stabilization? Just so that people are aware of the new technique that we've introduced.

LoganDark commented 2 years ago

offtopic: This is the first unstable feature I have ever encountered that has any business still being unstable (i.e. has had discussion in the last 5 years). Bravo.

TennyZhuang commented 2 years ago

I'd like to share a benefit from try_collect vs collect::<Result<?, ?>>. try_collect allows me to hide the concrete type of the Result::E type, which is convenient when we want to split an error from a more extensive scope to a smaller one. If we use try_collect, we don't need to change every code that uses which Result, and ? will do the implicit cast from the low-level error to the high-level one.

An example https://github.com/singularity-data/risingwave/pull/3081

cryptoquick commented 2 years ago

Maybe I'm blind and I can't find it, but you can already to something like this:

fn main() {
    let a = vec![Some(3), Some(4)];
    let b = vec![Some(3), None];
    let c = vec![Result::<_, ()>::Ok(3), Ok(4)];
    let d = vec![Ok(3), Err(())];

    println!("{:?}", a.into_iter().collect::<Option<Vec<_>>>());
    println!("{:?}", b.into_iter().collect::<Option<Vec<_>>>());
    println!("{:?}", c.into_iter().collect::<Result<Vec<_>, _>>());
    println!("{:?}", d.into_iter().collect::<Result<Vec<_>, _>>());
}

What is the difference to try_collect?

First of all, thank you for this, I needed that snippet. Also, if this ever gets merged, it'd be nice to have a clippy lint so I don't have to remember to remove where I've copied and pasted that snippet all over my codebase. :3

rben01 commented 2 years ago

The name try_collect() may become a problem if one day we would want a collect()-like method with fallible allocations (like e.g. Vec::try_reserve()).

Right, the name of this function made me think it was about collecting regular (non-Try) values into things like arrays or fallibly allocated Vecs.

I think the iterator methods prefixed with try_ can all be (and should continue to be) taken to be mean "this iterator method short-circuits with the first error encountered". The try_ in Vec::try_reserve is like the try_ in Rc::try_new and array::try_from — it simply means "this thing might not work out". Since the proposed method is/would be defined on iterators and not more general Rust objects/types, I think try_collect is the correct name for it. (One could say that the iterator-specific meaning of the try_ prefix "shadows" the language-wide meaning.)

If collecting can fail due to a TryReserveError — which AFAIK is the only way collecting into a FromIterator could fail — then I think the appropriate name for that method would be something like collect_try_reserve. And if more general errors need to be handled then I think a TryFromIterator trait would be appropriate.

Xiretza commented 1 year ago

If collecting can fail due to a TryReserveError — which AFAIK is the only way collecting into a FromIterator could fail — then I think the appropriate name for that method would be something like collect_try_reserve. And if more general errors need to be handled then I think a TryFromIterator trait would be appropriate.

It's not specific to reservation of Vecs at all, there are many data structures that would benefit from a fallible FromIterator and such a try_collect() method: arrays (as previously mentioned) are an obvious example, but another one would be a Grid type that can be constructed from an iterator of iterators (the rows of the grid), and which would fail if one of the rows has a different length from the others.

ilyvion commented 1 year ago

I don't like the name try_collect for this; it doesn't fit the rest of the standard library API; i.e. it doesn't use a TryFromIterator trait, but rather a FromIterator over items that implement Try, which is a different idea.

Based on its name, I would've expected try_collect to behave similarly to how try_into() can be used where it only succeeds where the source can be converted to the target; i.e. I'd expect to be able to try_collect() into a [T; 10], e.g., where it would only succeed if the iterator had exactly 10 items.

This seems like an unnecessary generalization of the idea of being able to collect() to Result/Option, which already exists today and similar implementations could be added to any new type that implements Try in the future without "stealing" the try_collect name for a different purpose than (at least I think) seems the most natural -- i.e. fallible collection, not collecting fallible items.

It might seem a more clunky name, but this seems like it should be named collect_try rather than try_collect, if it should exist at all.

withoutboats commented 1 year ago

(NOT A CONTRIBUTION)

It is my strongly held position that this API should be removed from the standard library. Its usefulness is far far less than the cognitive load it brings and therefore it does not carry its weight.

The biggest issue with this API is that its signature is completely unreasonable. Return <<Self::Item as Try>::Residual as Residua<B>l>::TryType? Multiple associated type projections to perform type changing like this is not a pattern that should be accepted into std without extremely compelling motivation for the simple reason that is totally inscrutable to most users.

And the motivation to add this API to std is not compelling at all. From the API docs and this thread I see:

Types that implement Try but not FromIterator can be collected into (the overwhelmingly most common Try types, Result and Option, do implement FromIterator).
You don't have to specify Result with try_collect whereas you do with collect.
Mutably borrowing the iterator, instead of taking ownership of it, so you can continue iterating after.

Remember that collect is just a convenient sugar over a for loop. For that final stated advantage, I would strongly prefer just writing a for loop that doesn't short circuit on the error case.

When contrasting these benefits with the signature of this function, I do not have any doubt that this is not a motivation that justifies a signature like this. When you consider other problems brought up in this thread - like that it takes a name that could plausibly useful for a fallibly allocating collect API, which would presumably have a simpler signature and a more compelling motivation - I am honestly shocked that this API has been added to std.

Jules-Bertholet commented 1 year ago

The biggest issue with this API is that its signature is completely unreasonable. Return <<Self::Item as Try>::Residual as Residual<B>>::TryType? Multiple associated type projections to perform type changing like this is not a pattern that should be accepted into std without extremely compelling motivation for the simple reason that is totally inscrutable to most users.

The Try trait is still unstable, it may be that the final design won't be so inscrutable. I would suspend judgment until Try is stabilized.

Another potential issue is that a hypothetical faillibly-allocating collection function might want to be called try_collect.

WhyNotHugo commented 1 year ago

What use cases are served by Iterator::try_collect which aren't already covered by Result::from_iter?

obaraelijah commented 3 months ago

The try_collect method allows you to collect results from an iterator, where each item in the iterator is a Result. If an error occurs during the iteration, try_collect will return the error immediately. Which is useful in situations where you have a collection of results and you want to handle all the errors together. By using it you can avoid having to manually handle errors in the loop that processes the results.

epidemian commented 2 months ago

What use cases are served by Iterator::try_collect which aren't already covered by Result::from_iter?

@WhyNotHugo I think the docs on itertools's try_collect summarize it quite well:

.try_collect() is more convenient way of writing .collect::<Result<_, _>>()

It's just a convenience alias :)

If Iterator::collect() were able to infer that we want to collect into a Result and allow the use of ? short-circuiting immediately after, there'd be no need try_collect i think.

But, for example, this doesn't compile:

fn process_data(data: &[&str]) -> anyhow::Result<()> {
    let numbers: Vec<u64> = data.iter().map(|s| s.parse()).collect()?;
    println!("processing {numbers:?}");
    Ok(())
}

(Imagine the data parameter being anything iterable, and s.parse() being any fallible operation.)

We just want to collect numbers into a vector, and propagate any parsing errors up.

It doesn't work even if we change the return type of the function to explicitly be Result<(), std::num::ParseIntError>, such that the ? doesn't need to do any type conversions.

Some alternatives that do compile are:

Specifying that we want to collect into a Result via a turbofish type param:
```
let numbers: Vec<u64> = data.iter().map(|s| s.parse()).collect::<Result<_, _>>()?;
```
Which is kind of noisy. Why do we need to say we're collecting into a Result if we're immediately propagating it up via the ? that follows that .collect()?
Or also including the container type on the turbofish type parameter:
```
let numbers = data.iter().map(|s| s.parse()).collect::<Result<Vec<u64>, _>>()?;
```
Which is as noisy as the previous option, but with the added drawback that in order to know the type of numbers you have to chase it all the way to the end of that expression.
Using Result::from_iter() instead of collect():
```
let numbers: Vec<u64> = Result::from_iter(data.iter().map(|s| s.parse()))?;
```
Which is kind of unidiomatic (even the docs for Result::from_iter() use .collect() instead)
Separating the collection and the Result propagation into two different lines:
```
let parse_results: Result<_, _> = data.iter().map(|s| s.parse()).collect();
let numbers: Vec<u64> = parse_results?;
```
This one is not too bad TBH. Though i'd much rather prefer a readable one-liner :)

But none of these options are as readable as the first snippet IMO. And that's the appeal of try_collect().

Using itertools' try_collect() instead of collect() on the first snippet compiles just fine:

let numbers: Vec<u64> = data.iter().map(|s| s.parse()).try_collect()?;

epidemian commented 2 months ago

BTW, i understand there's a naming issue with try_collect, which makes it look like a method with error handling for failed allocations, and i agree.

If we want to bikeshed on possible method names, may i suggest either:

collect_ok() (which parallels itertools' extremely-useful fold_ok()), or
collect_results() (which ironically parallels itertools' fold_options())

I personally prefer the latter. It just looks vanilla and obvious :)

MeGaGiGaGon commented 1 month ago

While the name is being bikeshed, here's my two cents:

I would prefer some other name than try_collect. When I searched for "rust try_collect" I was looking for something to turn an Iterator directly into a [T; N] without having to use the TryInto impl on Vec.

I think this intuition is because of how there is the infallible Into/From, and the fallible TryInto/TryFrom. Thus there is the infallible FromIterator/Iterator::collect and the fallible TryFromIterator/Iterator::try_collect. Of course, TryFromIterator doesn't exist, and it would be fairly niche, but there has been some very mild interest previously expressed in having TryFromIterator https://github.com/rust-lang/rfcs/issues/1839

As for the alternative name, I don't have a strong preference. Either of epidemian's suggestions are fine.

Stargateur commented 1 month ago

Since try_collect use the Try trait it's not necessary a Result for this reason I'm opposed to _result or _ok

ilyvion commented 1 month ago

Which is why I suggested the name collect_try in the first place way back when

rust-lang / rust