Closed Centril closed 6 years ago
A flatten could be useful, but I think there are more important/useful functions/iterators to add before a flatten.
@leonardo-m Clearly not important enough for someone to write a PR/RFC before this one, though. ;)
Also I can't list the number of times I've done .flat_map(|x| x)
, so, I'm glad this was implemented.
One thing that I think should be clarified is that this only flattens one level, i.e. it will not transform [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
into [1, 2, 3, 4, 5, 6, 7, 8]
but [[1, 2], [3, 4], [5, 6], [7, 8]]
.
@leonardo-m I don't think it has to be an either-or proposition - if there are more useful additions, and there certainly are some from Itertools, they should be made irrespective of this one. Someone just has to write the PRs for those =)
Regarding .flatten()
specifically, I believe it gives us easier-to-express specialization opportunities than .flat_map()
does such as, flattening a vec::IntoIter<Vec<_>>
with a single allocation.
Also, it was mentioned explicitly that neither of .flat_map(|x| x)
nor .flat_map(identity)
were rather nice and that .flatten()
was clearer at https://github.com/rust-lang/rfcs/pull/2306#issuecomment-361391370, https://github.com/rust-lang/rfcs/pull/2306#issuecomment-361389638, and https://github.com/rust-lang/rfcs/pull/2306#pullrequestreview-90382389.
@clarcharr hmm, it does say in the docs that:
and you want to remove one level of indirection.
Perhaps that's not clear enough? Can we fix this during stabilization perhaps if it is not?
@Centril I mentioned that having not actually read the docs yet. ;)
I honestly think that including an example would be clearer; there are tools for languages such as JavaScript, for example, which will completely flatten an array regardless of how many levels it has. Although such a thing would be very difficult and/or impossible to do with Rust's type system, I think it'd be best for it to be clear that this is not a "deep" flatten, just a shallow one.
@clarcharr Updated the docs with your example.
Awesome! Thanks
Not sure, how important this angle is, but just in case -- it looks like this introduces a failure for nightly builds on code that relies on itertools' version of flatten:
https://it.sit-it.org/issue/fe5e68e5-22a1-4bc3-8ebf-36586460ba27 https://travis-ci.org/sit-it/sit/jobs/346132639#L1129
It's a trivial fix for the code that uses it (the patch is in the first link) but I thought it is worth mentioning this aspect.
I don't think arbitrary-level flattening is impossible with Rust's type system, it's just that it would rely on specialisation and could produce confusing results.
For example, [[Some(1), None], [None, None]].deep_flatten()
would produce [1]
, obvious to a Rust veteran but maybe unintuitive for a newbie, plus implementing Iterator
for a type would massively change the meaning of the expression.
Also, this would cause an infinite loop in the compiler when resolving the output type of deep_flatten
:
struct Foo;
impl Iterator for Foo {
type Item = Foo;
fn next(&mut self) -> Foo {
unimplemented!()
}
}
JS is dynamically typed and so deep flattening is useful since you don't know the nesting level until runtime. With Rust you know ahead of time precisely how many levels deep an iterator will be nested.
Perhaps mention that this is more useful with scan instead of map? https://bitbucket.org/SoniEx2/wcc-rs/src/e07c866224a84accda05fa7fd4d956a185daabf7/src/lib.rs?at=master&fileviewer=file-view-default#lib.rs-147:157
@SoniEx2 I think the documentation has a sufficient number of examples, more would be too many.
@leonardo-m I don't think it has to be an either-or proposition - if there are more useful additions, and there certainly are some from Itertools, they should be made irrespective of this one. Someone just has to write the PRs for those =)
I don't agree. So far in my whole codebase I have seen zero places where to use a flatten(), on the other hand from the standard library I miss several iterable things already present in itertools. The API of the std library should be kept small. Adding functions is not free, the more things you add, the more time you need to use the API and find the right function. In my opinion the whole business of adding iterators to the Rust std library is currently broken. It should be based on real data. What are the most used iterators of the itertools crate? Add those to the std library.
I have written scan+flat_map(|x|x) at least 3 times so far. And I don't even use Rust that often yet!
The more I use rust, the more I run into scan+flatten.
See also: https://github.com/rust-lang/rfcs/issues/1978
It's most useful for string parsing, and partial collects.
@SimonSapin This has been in nightly for a few months now; could we stabilize?
We may want another (check-only) crater run before stabilizing due to conflict with Itertools::flatten
after that.
Perhaps itertools
could do what serde
does now to detect new Rust versions and have a build script which automatically aliases the builtin flatten
on Rust versions which support it? That way it could be released as a semver-compatible change, but the breakages would go away.
@Centril It seems that the next step would be a stabilization PR so that we can do that Crater run.
Is there a real argument against generalizing flatten over the level of nesting somehow? It can be very useful for purposes where one has a deeply nested structure.
Perhaps provide a full_flatten
adapter as well as a shallow flatten
?
Maybe call the current adapter shallow_flatten
instead?
@reuvenpo How would that work exactly? As far as I know the type system cannot express this sort of type-level recursion (and boundary condition).
To be honest, I'm still fairly new to rust and found my way here looking for this sort of feature, so I don't know for certain whether this sort of thing is possible.
But at least theoretically, there exists a type system not much different from the subset of the rust type system that I'm familiar with, which would be able to statically generate concrete functions that flatten a nested structure, based on its type.
As you said, that would require type-level recursion with a stop condition.
I was hoping that someone with more in-depth knowledge of the current type system than myself would come across this thread and determine if it's possible or not.
In either case, whether it is possible or not, I don't think a nested flatten operation should be called .flatten()
since it would lead to surprises. Given that .map
corresponds to functor mapping and that .flat_map
corresponds to monadic bind, then .flatten
should correspond to monadic join.
I've been thinking about this issue for a while, and i came up with something, but sadly i don't currently have the time to try a full implementation, so I'll write down a sketch of it, maybe it makes some sense (If it does I'll invest a few hours into learning how to properly contribute over the weekend).
Basically, have a trait Nested
with a method deep_flatten<I: Iterator>(self) -> I
(or similar, i hope you get the idea) which returns the final iterator.
This trait can be implemented for all iterators to just return self. That is, flattening a "flat" iterator does nothing.
For any iterator of iterators (exact type signature missing for lack of author's experience), as a specialization, this trait can be implemented to return an iterator that collects its items from calling deep_flatten
on its sub-iterators.
If i understand the type system correctly, this should be equivalent to the "type-level recursion with a stop condition" discussed earlier.
I did some experimentation: https://play.rust-lang.org/?gist=1bbdca273415c488e2f058d39fe995b2&version=nightly&mode=debug
@reuvenpo That sounds maybe plausible with specialization of a trait that has an associated type (for the returned iterator). But the design of the trait specialization feature itself in the language is still evolving, so it might be a long time before "deep flatten" is possible at all.
In the mean time, the existing Iterator::flatten
can be stabilized very soon. I feel that the potential of a similar-but-different feature to maybe be added later is not enough to block stabilization now with the shorter name.
Would it be appropriate to open a tracking issue for Iterator::unnest
(name taken from @Centril 's earlier gist) to discuss possible implementations of this idea?
Or should an RFC be made first?
Generally it's considered bad practice to have the behaviour rely on specialisation (rather than specialisation being an optimisation) because it means that Box::new(some_val) as Box<Trait>
can act differently to just some_val
depending on the circumstance.
In JavaScript the unnest
method would be useful since you don't know until runtime the level of nesting (and different elements can nest different depths). Essentially, unnest
is a tree-flattening operation in a dynamically-typed language. In a strongly-, strictly-typed language like Rust there is very little use for a method like unnest
because the tree depth is known at compile-time. I don't mean to say this in a rude way, but I can't think of a place where this would be useful. Of course, if you can find an example of where this would be useful then I would like to see it.
@Vurich, i'll start by addressing the second part of your message, and then elaborate on it, hopefully answering the first part too.
It's true that in dynamically typed languages this is useful because both the author and the language may not know the depth of nesting for a given object, but it doesn't necessarily mean that in a statically typed language the author would like to care about the level of nesting of a given object, in some contexts.
My example would be anything resembling a tree structure with unbounded depth (For example a directory structure). It may be desirable to have a function generalizes over Nested<String>
(A trait) which provides a method unnest
returning an iterator (presumably of type Unnest
) over all the leaf-nodes in that tree directly(i.e. the names of plain files) instead of manually unnesting the structure.
In this context, say you had a function that received a structure whose concrete type was something like Vec<Vec<Vec<Vec<i32>>>>
(but by some generalization it wasn't necessarily known how many layers of vectors there were), but the function would just like to iterate over the i32
s inside. That's impossible to do right now (AFAIK). What if instead this function wanted to iterate over Vec<i32>
s, or Vec<Vec<i32>>
s, etc?
The type information is all there, it's just impossible to use currently, for these purposes.
It would be best if you couldn't call thing.unnest()
on a concrete object, but rather had to specify exactly to which level of unnesting you wanted to reach(e.g. thing.unnest::<String>()
), but if you were in a generic function where thing
was already bound by a Nested<SomeThing>
, then you could just unnest()
it.
Under this system, Box::new(some_val) as Box<Trait>
wouldn't act that differently differently than some_val
. If Trait
was Nested<Vec<i32>>
but some_val
was Vec<Vec<Vec<i32>>>
then you couldn't call unnest::<Vec<Vec<i32>>>()
on the trait-object anymore, but you could still call unnest::<Vec<i32>>()
and unnest::<i32>()
on it, plus you can now call unnest()
on it, which would be equivalent to calling unnest::<Vec<i32>>()
. This behavior would make sense in this context, restricting the object's behavior according to the trait bound, and extending it's behavior slightly by the same bound.
Of course, this is all pretty difficult, if at all possible, to implement today. I certainly don't know the "right" way to do this, if it was possible, I just think this interface makes sense and is doable, even if it still requires some features that aren't yet stabilized, or available at all.
P.S. Obviously for cases where the concrete type is known, one can just call .flatten()
the necessary number of times to reach the desired type.
EDIT 1: changed Unnestable
to Nested
"unnestable" is weird for a trait name. it sounds like you can't put it inside stuff.
The name is just a proposal too. It can be called anything else that makes sense.
(Denestable
? Nested
? Unnesting
? I like Nested
better now that i think of it.)
The design where you specify the iterator type still fails for this:
struct Foo;
impl Iterator for Foo {
type Item = Foo;
fn next(&mut self) -> Foo {
Foo
}
}
I'm not sure i understand. Do you mean iterators of a type that always yield iterators of the same type?
(Is that even a useful pattern?)
It would (should) do the least amount of work to return the desired result.
You want an Iterator\
I don't see where i would use this pattern (since it prevents Foo from being an iterator over anything else, and this is a weird recursive way of doing what std::iter::repeat(...)
does) but i agree that in this case a Foo
can be considered as a Nested<Foo>
, Nested<Iterator<Item=Foo>>
, etc, but trying to .unnest()
any of these constructs would result in an infinite recursion when trying to yield the first item, because of the recursive definition of Iterator for Foo
But again... What is the use case? Would in that use case it be useful to treat a Foo
as a Nested<Foo>
?
Can we move discussion on flattening nested stuff to internals.rust-lang.org?
(I'll have to open an account, but) Sure
Tracking issue for #48115,
Iterator::flatten
andFlatMap
.