rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org
Other
97.74k stars 12.64k forks source link

Tracking issue for Iterator::flatten #48213

Closed Centril closed 6 years ago

Centril commented 6 years ago

Tracking issue for #48115, Iterator::flatten and FlatMap.

leonardo-m commented 6 years ago

A flatten could be useful, but I think there are more important/useful functions/iterators to add before a flatten.

clarfonthey commented 6 years ago

@leonardo-m Clearly not important enough for someone to write a PR/RFC before this one, though. ;)

clarfonthey commented 6 years ago

Also I can't list the number of times I've done .flat_map(|x| x), so, I'm glad this was implemented.

One thing that I think should be clarified is that this only flattens one level, i.e. it will not transform [[[1, 2], [3, 4]], [[5, 6], [7, 8]]] into [1, 2, 3, 4, 5, 6, 7, 8] but [[1, 2], [3, 4], [5, 6], [7, 8]].

Centril commented 6 years ago

@leonardo-m I don't think it has to be an either-or proposition - if there are more useful additions, and there certainly are some from Itertools, they should be made irrespective of this one. Someone just has to write the PRs for those =)

Regarding .flatten() specifically, I believe it gives us easier-to-express specialization opportunities than .flat_map() does such as, flattening a vec::IntoIter<Vec<_>> with a single allocation. Also, it was mentioned explicitly that neither of .flat_map(|x| x) nor .flat_map(identity) were rather nice and that .flatten() was clearer at https://github.com/rust-lang/rfcs/pull/2306#issuecomment-361391370, https://github.com/rust-lang/rfcs/pull/2306#issuecomment-361389638, and https://github.com/rust-lang/rfcs/pull/2306#pullrequestreview-90382389.

@clarcharr hmm, it does say in the docs that:

and you want to remove one level of indirection.

Perhaps that's not clear enough? Can we fix this during stabilization perhaps if it is not?

clarfonthey commented 6 years ago

@Centril I mentioned that having not actually read the docs yet. ;)

I honestly think that including an example would be clearer; there are tools for languages such as JavaScript, for example, which will completely flatten an array regardless of how many levels it has. Although such a thing would be very difficult and/or impossible to do with Rust's type system, I think it'd be best for it to be clear that this is not a "deep" flatten, just a shallow one.

Centril commented 6 years ago

@clarcharr Updated the docs with your example.

clarfonthey commented 6 years ago

Awesome! Thanks

yrashk commented 6 years ago

Not sure, how important this angle is, but just in case -- it looks like this introduces a failure for nightly builds on code that relies on itertools' version of flatten:

https://it.sit-it.org/issue/fe5e68e5-22a1-4bc3-8ebf-36586460ba27 https://travis-ci.org/sit-it/sit/jobs/346132639#L1129

It's a trivial fix for the code that uses it (the patch is in the first link) but I thought it is worth mentioning this aspect.

eira-fransham commented 6 years ago

I don't think arbitrary-level flattening is impossible with Rust's type system, it's just that it would rely on specialisation and could produce confusing results.

For example, [[Some(1), None], [None, None]].deep_flatten() would produce [1], obvious to a Rust veteran but maybe unintuitive for a newbie, plus implementing Iterator for a type would massively change the meaning of the expression.

Also, this would cause an infinite loop in the compiler when resolving the output type of deep_flatten:

struct Foo;

impl Iterator for Foo {
    type Item = Foo;

    fn next(&mut self) -> Foo {
        unimplemented!()
    }
}

JS is dynamically typed and so deep flattening is useful since you don't know the nesting level until runtime. With Rust you know ahead of time precisely how many levels deep an iterator will be nested.

SoniEx2 commented 6 years ago

Perhaps mention that this is more useful with scan instead of map? https://bitbucket.org/SoniEx2/wcc-rs/src/e07c866224a84accda05fa7fd4d956a185daabf7/src/lib.rs?at=master&fileviewer=file-view-default#lib.rs-147:157

Centril commented 6 years ago

@SoniEx2 I think the documentation has a sufficient number of examples, more would be too many.

leonardo-m commented 6 years ago

@leonardo-m I don't think it has to be an either-or proposition - if there are more useful additions, and there certainly are some from Itertools, they should be made irrespective of this one. Someone just has to write the PRs for those =)

I don't agree. So far in my whole codebase I have seen zero places where to use a flatten(), on the other hand from the standard library I miss several iterable things already present in itertools. The API of the std library should be kept small. Adding functions is not free, the more things you add, the more time you need to use the API and find the right function. In my opinion the whole business of adding iterators to the Rust std library is currently broken. It should be based on real data. What are the most used iterators of the itertools crate? Add those to the std library.

SoniEx2 commented 6 years ago

I have written scan+flat_map(|x|x) at least 3 times so far. And I don't even use Rust that often yet!

The more I use rust, the more I run into scan+flatten.

See also: https://github.com/rust-lang/rfcs/issues/1978

It's most useful for string parsing, and partial collects.

Centril commented 6 years ago

@SimonSapin This has been in nightly for a few months now; could we stabilize?

kennytm commented 6 years ago

We may want another (check-only) crater run before stabilizing due to conflict with Itertools::flatten after that.

clarfonthey commented 6 years ago

Perhaps itertools could do what serde does now to detect new Rust versions and have a build script which automatically aliases the builtin flatten on Rust versions which support it? That way it could be released as a semver-compatible change, but the breakages would go away.

SimonSapin commented 6 years ago

@Centril It seems that the next step would be a stabilization PR so that we can do that Crater run.

reuvenpo commented 6 years ago

Is there a real argument against generalizing flatten over the level of nesting somehow? It can be very useful for purposes where one has a deeply nested structure.

Perhaps provide a full_flatten adapter as well as a shallow flatten?

Maybe call the current adapter shallow_flatten instead?

SimonSapin commented 6 years ago

@reuvenpo How would that work exactly? As far as I know the type system cannot express this sort of type-level recursion (and boundary condition).

reuvenpo commented 6 years ago

To be honest, I'm still fairly new to rust and found my way here looking for this sort of feature, so I don't know for certain whether this sort of thing is possible.

But at least theoretically, there exists a type system not much different from the subset of the rust type system that I'm familiar with, which would be able to statically generate concrete functions that flatten a nested structure, based on its type.

As you said, that would require type-level recursion with a stop condition.

I was hoping that someone with more in-depth knowledge of the current type system than myself would come across this thread and determine if it's possible or not.

Centril commented 6 years ago

In either case, whether it is possible or not, I don't think a nested flatten operation should be called .flatten() since it would lead to surprises. Given that .map corresponds to functor mapping and that .flat_map corresponds to monadic bind, then .flatten should correspond to monadic join.

reuvenpo commented 6 years ago

I've been thinking about this issue for a while, and i came up with something, but sadly i don't currently have the time to try a full implementation, so I'll write down a sketch of it, maybe it makes some sense (If it does I'll invest a few hours into learning how to properly contribute over the weekend).

Basically, have a trait Nested with a method deep_flatten<I: Iterator>(self) -> I (or similar, i hope you get the idea) which returns the final iterator. This trait can be implemented for all iterators to just return self. That is, flattening a "flat" iterator does nothing. For any iterator of iterators (exact type signature missing for lack of author's experience), as a specialization, this trait can be implemented to return an iterator that collects its items from calling deep_flatten on its sub-iterators. If i understand the type system correctly, this should be equivalent to the "type-level recursion with a stop condition" discussed earlier.

Centril commented 6 years ago

I did some experimentation: https://play.rust-lang.org/?gist=1bbdca273415c488e2f058d39fe995b2&version=nightly&mode=debug

SimonSapin commented 6 years ago

@reuvenpo That sounds maybe plausible with specialization of a trait that has an associated type (for the returned iterator). But the design of the trait specialization feature itself in the language is still evolving, so it might be a long time before "deep flatten" is possible at all.

In the mean time, the existing Iterator::flatten can be stabilized very soon. I feel that the potential of a similar-but-different feature to maybe be added later is not enough to block stabilization now with the shorter name.

reuvenpo commented 6 years ago

Would it be appropriate to open a tracking issue for Iterator::unnest (name taken from @Centril 's earlier gist) to discuss possible implementations of this idea?

Or should an RFC be made first?

eira-fransham commented 6 years ago

Generally it's considered bad practice to have the behaviour rely on specialisation (rather than specialisation being an optimisation) because it means that Box::new(some_val) as Box<Trait> can act differently to just some_val depending on the circumstance.

In JavaScript the unnest method would be useful since you don't know until runtime the level of nesting (and different elements can nest different depths). Essentially, unnest is a tree-flattening operation in a dynamically-typed language. In a strongly-, strictly-typed language like Rust there is very little use for a method like unnest because the tree depth is known at compile-time. I don't mean to say this in a rude way, but I can't think of a place where this would be useful. Of course, if you can find an example of where this would be useful then I would like to see it.

reuvenpo commented 6 years ago

@Vurich, i'll start by addressing the second part of your message, and then elaborate on it, hopefully answering the first part too.

It's true that in dynamically typed languages this is useful because both the author and the language may not know the depth of nesting for a given object, but it doesn't necessarily mean that in a statically typed language the author would like to care about the level of nesting of a given object, in some contexts.

My example would be anything resembling a tree structure with unbounded depth (For example a directory structure). It may be desirable to have a function generalizes over Nested<String> (A trait) which provides a method unnest returning an iterator (presumably of type Unnest) over all the leaf-nodes in that tree directly(i.e. the names of plain files) instead of manually unnesting the structure.

In this context, say you had a function that received a structure whose concrete type was something like Vec<Vec<Vec<Vec<i32>>>>(but by some generalization it wasn't necessarily known how many layers of vectors there were), but the function would just like to iterate over the i32s inside. That's impossible to do right now (AFAIK). What if instead this function wanted to iterate over Vec<i32>s, or Vec<Vec<i32>>s, etc?

The type information is all there, it's just impossible to use currently, for these purposes.

It would be best if you couldn't call thing.unnest() on a concrete object, but rather had to specify exactly to which level of unnesting you wanted to reach(e.g. thing.unnest::<String>()), but if you were in a generic function where thing was already bound by a Nested<SomeThing>, then you could just unnest() it.

Under this system, Box::new(some_val) as Box<Trait> wouldn't act that differently differently than some_val. If Trait was Nested<Vec<i32>> but some_val was Vec<Vec<Vec<i32>>> then you couldn't call unnest::<Vec<Vec<i32>>>() on the trait-object anymore, but you could still call unnest::<Vec<i32>>() and unnest::<i32>() on it, plus you can now call unnest() on it, which would be equivalent to calling unnest::<Vec<i32>>(). This behavior would make sense in this context, restricting the object's behavior according to the trait bound, and extending it's behavior slightly by the same bound.

Of course, this is all pretty difficult, if at all possible, to implement today. I certainly don't know the "right" way to do this, if it was possible, I just think this interface makes sense and is doable, even if it still requires some features that aren't yet stabilized, or available at all.

P.S. Obviously for cases where the concrete type is known, one can just call .flatten() the necessary number of times to reach the desired type.

EDIT 1: changed Unnestable to Nested

SoniEx2 commented 6 years ago

"unnestable" is weird for a trait name. it sounds like you can't put it inside stuff.

reuvenpo commented 6 years ago

The name is just a proposal too. It can be called anything else that makes sense.

(Denestable? Nested? Unnesting? I like Nested better now that i think of it.)

eira-fransham commented 6 years ago

The design where you specify the iterator type still fails for this:

struct Foo;

impl Iterator for Foo {
    type Item = Foo;

    fn next(&mut self) -> Foo {
        Foo
    }
}
reuvenpo commented 6 years ago

I'm not sure i understand. Do you mean iterators of a type that always yield iterators of the same type?

(Is that even a useful pattern?)

SoniEx2 commented 6 years ago

It would (should) do the least amount of work to return the desired result.

You want an Iterator\? Easy, one operation. You want an Foo? Zero operations. You want Iterator\<Item=Iterator\>? Also mostly easy. Etc.

reuvenpo commented 6 years ago

I don't see where i would use this pattern (since it prevents Foo from being an iterator over anything else, and this is a weird recursive way of doing what std::iter::repeat(...) does) but i agree that in this case a Foo can be considered as a Nested<Foo>, Nested<Iterator<Item=Foo>>, etc, but trying to .unnest() any of these constructs would result in an infinite recursion when trying to yield the first item, because of the recursive definition of Iterator for Foo

But again... What is the use case? Would in that use case it be useful to treat a Foo as a Nested<Foo>?

Centril commented 6 years ago

Can we move discussion on flattening nested stuff to internals.rust-lang.org?

reuvenpo commented 6 years ago

(I'll have to open an account, but) Sure