rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org
Other
96.85k stars 12.51k forks source link

Tracking Issue for `Iterator::intersperse` #79524

Open camelid opened 3 years ago

camelid commented 3 years ago

This is a tracking issue for the Iterator::intersperse and intersperse_with methods. It is based on itertools::intersperse and is the iterator equivalent of Vec::join.

The feature gate for the issue is #![feature(iter_intersperse)].

Public API

// core::iter

pub trait Iterator {
    fn intersperse(self, separator: Self::Item) -> Intersperse<Self>
    where
        Self: Sized,
        Self::Item: Clone;

    fn intersperse_with<G>(self, separator: G) -> IntersperseWith<Self, G>
    where
        Self: Sized,
        G: FnMut() -> Self::Item;
}

#[derive(Debug, Clone)]
pub struct Intersperse<I: Iterator> where I::Item: Clone {}

impl<I> Iterator for Intersperse<I> where I: Iterator, I::Item: Clone {
    type Item = I::Item;
}

pub struct IntersperseWith<I, G> where I: Iterator {}

impl<I, G> Debug for IntersperseWith<I, G> where I: Iterator + Debug, I::Item: Debug, G: Debug {}

impl<I, G> Clone for IntersperseWith<I, G> where I: Iterator + Clone, I::Item: Clone, G: Clone {}

impl<I, G> Iterator for IntersperseWith<I, G> where I: Iterator, G: FnMut() -> I::Item {
    type Item = I::Item;
}

Steps / History

Unresolved Questions

None.

m-ou-se commented 3 years ago

80567 is adding intersperse_with to this feature.

cyqsimon commented 3 years ago

I would like to suggest a simpler, alternative name for this method - weave. At the time of comment, the word weave is not yet used in Rust std.

Intersperse is quite verbose (especially when compared to Vec::join), but more importantly rarely used in common English. weave is much more common and concise in comparison.

ibraheemdev commented 3 years ago

Scala and haskell both use intersperse. Scalaz calls it intercalate.

tmplt commented 3 years ago

I would like to suggest a simpler, alternative name for this method - weave.

weave is not on the list of terms I would search for when looking for the expected behavior. I'd even consider it vanity.

Intersperse is quite verbose (especially when compared to Vec::join), but more importantly rarely used in common English. weave is much more common and concise in comparison.

A verbose function name describes its effect better. More so if the term is uncommon; there is then less ambiguity. intersperse does exactly what is says: intersperse elements with a common separator.

arnabanimesh commented 3 years ago

why not use join and join_with for iterators too?

fosskers commented 3 years ago

Scala and haskell both use intersperse. Scalaz calls it intercalate...

intercalate is slightly different.

intersperse :: a -> [a] -> [a] 

intercalate :: [a] -> [[a]] -> [a] 
redgoldlace commented 3 years ago

why not use join and join_with for iterators too?

I feel like join implies the creation of something monolithic from a sequence and a separator, i.e

let items = vec!["foo", "bar", "blah"];
let result = items.join(" ");

will create the string foo bar blah, and I don't have any issue with that.

But in my mind intersperse better communicates the idea of placing separators in between a sequence, without creating something monolithic using the results of that.

I suppose I mean that if I intersperse an iterator of &strs with " ", it makes sense to me that I get an iterator back. But if I have an iterator of &strs, and the method is named join, I feel like it's a little less clear. Without looking further into it, it might seem like that I'd get a concatenated String back, or similar. For other types it may be less unclear but I'm still unsure on whether calling the method join is a good fit.

arnabanimesh commented 3 years ago

I feel like join implies the creation of something monolithic from a sequence and a separator

That's true, but we are using vec::join or slice::join and that respects the pattern matching and it is not monolithic. Then vec::join should be changed too. If Rust already acknowledges that join be non monolithic, then I suggest we stick with the same ideology.

But if I have an iterator of &strs, and the method is named join, I feel like it's a little less clear. Without looking further into it, it might seem like that I'd get a concatenated String back, or similar.

I'm always collecting to String after intersperse. 😅

My personal preference aside, vec::join creates a vec if the initial data type is not &str. So it is just intersperse which happens to create a String in a specific scenario.

Personally, if it were not as verbose I would have gone for intersperse. If join was not used in vec or slice, I would not root for it. But weave doesn't sound good IMO.

My main point is that let's not have two different terms do the same thing based on datatype.

pickfire commented 3 years ago

Personally, if it were not as verbose I would have gone for intersperse. If join was not used in vec or slice, I would not root for it. But weave doesn't sound good IMO.

I did join for Iterator in the past https://github.com/rust-lang/rust/pull/75738 which accepts the current Join that does both but it wasn't accepted because it requires allocation or Iterator is in std so it couldn't be implemented, splitting it as two functions works but may not be that discoverable.

In a sample code in helix,

        let res = sel
            .ranges
            .into_iter()
            .map(|range| format!("{}/{}", range.anchor, range.head))
            .collect::<Vec<String>>()
            .join(",");

How to join String or &String easily with &str and such? I think cases more than that we need some intervention which the API will not be ergonomic. I think maybe it only works well with &'static str.

In the current example, I believe it's too basic and limited, requires &'static str and copied.

let hello = ["Hello", "World", "!"].iter().copied().intersperse(" ").collect::<String>();

I think we need to think more about the ergonomics of the API before stabilization.

m-ou-se commented 3 years ago

@rfcbot merge

rfcbot commented 3 years ago

Team member @m-ou-se has proposed to merge this. The next step is review by the rest of the tagged team members:

No concerns currently listed.

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

rfcbot commented 3 years ago

:bell: This is now entering its final comment period, as per the review above. :bell:

SoniEx2 commented 3 years ago

could call it "sep" for separator?

rfcbot commented 3 years ago

The final comment period, with a disposition to merge, as per the review above, is now complete.

As the automated representative of the governance process, I would like to thank the author for their work and everyone else who contributed.

The RFC will be merged soon.

m-ou-se commented 3 years ago

Re-opening this, as the FusedIterator question is still open.

inquisitivecrystal commented 2 years ago

Is there any reason why we wouldn't want impl<I: FusedIterator> FusedIterator for Intersperse<I>? If no one has anything, I may just open a PR for it. It would be pretty trivial.

inquisitivecrystal commented 2 years ago

I just saw that the stabilization was reverted, per #88967 and #89638.

By the way, could someone update the top post to reflect the stabilization revert?

camelid commented 2 years ago

@inquisitivecrystal I have now added a note about the revert.

bluss commented 2 years ago

I think (just guessing!) it would be realistic to use a build script in itertools, coordinate with stabilizations, and cfg-out methods that collide depending on the compiling rustc version. I mostly wonder if that's even a good enough solution - are there too many things breaking that use a lockfile or won't otherwise pick up a newer version of itertools anyway?

jswrenn commented 2 years ago

I mostly wonder if that's even a good enough solution - are there too many things breaking that use a lockfile or won't otherwise pick up a newer version of itertools anyway?

Only about 38% of crates.io downloads for itertools are for the latest release; the rest are for older versions. Given this, the added maintenance burden of checking the version of rustc might be mostly wasted effort. I also worry about the possibility that there might be subtle breaking changes between how an adapter works in Itertools vs how it works in the version ultimately stabilized in libcore. We'd have to keep a very close eye out for such differences to uphold semver. I'm not confident we'd get it totally right.

An alternative proposed on Zulip: Itertools could prefix every method with it_. I dislike the annoyance-of-upgrading this poses and its aesthetics, but it would mitigate this issue permanently.

The breakage caused by Iterator::intersperse spawned a lot of discussion about language-level solutions to this problem, some of which is relevant to other widely-used crates employing the extension trait pattern (e.g., moving methods from FutureExt to Future). I'd like to avoid making a change like prefixing every Itertools method with it_ if this is just going to be mostly fixed at a language-level anyways.

arnabanimesh commented 2 years ago

Or just use a different name like sperse, intersparse, or something.

jswrenn commented 2 years ago

Or just use a different name like sperse, intersparse, or something.

I really cringe at the thought of playing any part in Rust having a #smooshgate every time someone wants to upstream an itertools adapter into libcore.

arnabanimesh commented 2 years ago

Another option is to integrate itertools in its entirety. Whether someone uses itertools or not becomes irrelevant.

Nytelife26 commented 2 years ago

Or just use a different name like sperse, intersparse, or something.

That isn't viable whatsoever, as sperse simply means to disperse (and is a highly uncommon archaic form to the extent where I wasn't even aware it was a real thing) and intersparse is not standard English.

Another option is to integrate itertools in its entirety. Whether someone uses itertools or not becomes irrelevant.

I'm more in favour of this, depending on the popularity of specific adaptors. In any case, we need a concrete solution for stabilizing things without needing to worry about external crates. My only hope is that Rust's ecosystem doesn't become anything like JavaScript's, where nothing gets done because anything useful is considered breaking.

arnabanimesh commented 2 years ago

@Nytelife26 check the RFC where I mentioned this issue. I have detailed a possible approach. You can contribute there. It has to be done in a systematic manner aka the Rust way. One permanent solution for all similar problems.

cdstanford commented 1 year ago

As a newcomer to this, I'm still pretty confused: Why do we not offer a join function on iterators i.join(",") which is equivalent to i.intersperse(",").collect() ?

This would be a huge ergonomic improvement. The fact that .join() is only a function on slices, and not iterators, is non-idiomatic as the logic of the join function works on an iterator and doesn't require it to be a slice.

pickfire commented 1 year ago

https://github.com/rust-lang/rust/pull/75738 I have created a pull request for that in the past. That is due to join requires allocation, but currently the iterator trait is on core, so we can't add it alongside the other iterators.

Stargateur commented 1 year ago

I would rather say, there is no need to have a shortcut, when iterator trait have already a LOT of method. Learning to combine iterator method is the way to go. i.intersperse(",").collect() is not longer than any other code I know that collect the iterator. We can't add hundred of method on Iterator trait for every use case.

This would be a huge ergonomic improvement. The fact that .join() is only a function on slices, and not iterators, is non-idiomatic as the logic of the join function works on an iterator and doesn't require it to be a slice.

That a bold statement, I don't see how that not idiomatic to NOT have a join method on iterator. Use the very used and versatile collect IS the idiomatic way IMO.

My opinion on the breaking thing is that AFAIK, it was accepted and considered a minor change to add item, this is a 1 years unresolved question that didn't move, I don't think there is any solution, either we accept it or we give up. I don't see why we wait. Seem we wait for https://github.com/rust-lang/rust/issues/89151

cdstanford commented 1 year ago

@pickfire thanks for the answer, but .collect() also allocates, right?

Jules-Bertholet commented 1 year ago

collect() calls a FromIterator impl, which is provided by the collection being collected into. So there is no allocation code in the definition of the Iterator trait

cdstanford commented 1 year ago

Right, so how is join different? Here's the definition for collect:

fn collect<B: FromIterator<Self::Item>>(self) -> B
where
    Self: Sized,
{
    FromIterator::from_iter(self)
}

And here's join:

fn join<B: FromIterator<Self::Item>>(self, separator: Self::Item) -> B
where
    Self: Sized,
    Self::Item: Clone,
{
    FromIterator::from_iter(self.intersperse(separator))
}

This requires stable intersperse, but even without it, join should require no allocation within the Iterator trait itself.

jswrenn commented 1 year ago

Inclusions to the standard library must meet very high bars for ergonomics and performance. Meeting those standards for Iterator::join is going to be tricky.

In many situations, the implementation you propose above is going to be less efficient than writing:

iter.collect::<Vec<_>>.join(sep)

...since that usually does not require cloning the separator. This alternative is also more flexible, as it does not require that the separator type and the item type are the same.

If it were as simple adding that alternative to Iterator:

fn join<S>(self, sep: S) -> <[T] as Join<Separator>>::Output
where
    Self: Sized,
    [Self::Item]: Join<S>,
{
    iter.collect::<Vec<_>>.join(sep)
}

...we might do so! Unfortunately, Iterator is defined in core, and Join is defined in std. We can't simply move Iterator to std (since that would break no_std applications) and we can't simply move Join to core (due to coherence issues).

cdstanford commented 1 year ago

I see what you're saying, but the most common case is Separator = &str, which only clones the &str pointer (not the string itself). I'm not aware of the use cases where iter.collect::<Vec<_>>.join(sep) is more efficient -- on the contrary, I think it's most common that this is less efficient, due to allocation, and users are doing it anyway as it's the easiest way to join a &str in the standard library without using itertools. Would love to see some data though. Thanks for the answer.

Fishrock123 commented 1 year ago

I keep running into places where I would like intersperse rather than .collect::<Vec<_>>().join(). I never run into reasons to use other itertools helpers, by in large.

Which is to say I hope this can be stabilized soon. (I think intersperse is a logically correct thing to call this, since that's exactly what it does.)

Is my understanding correct that this is waiting on https://github.com/rust-lang/rust/issues/89151 which was an accepted RFC but never had implementation work yet?

BogdanDidukh2003 commented 1 year ago

I expected this code to compile

let intersperse = s.chars()
    .enumerate()
    .map(|(index, value)| {
        let lowercase = iter::once(value.to_ascii_uppercase());
        let uppercase = iter::repeat(value.to_ascii_lowercase()).take(index);
        lowercase.chain(uppercase)
    }).intersperse(iter::once('-'))

Instead, I need to create such boilerplate because I need to have exactly the same input type

let IteratorOfCharIterators= s.chars()
    .enumerate()
    .map(|(index, value)| {
        let lowercase = iter::once(value.to_ascii_uppercase());
        let uppercase = iter::repeat(value.to_ascii_lowercase()).take(index);
        lowercase.chain(uppercase)
    }).intersperse(iter::once('-').chain(iter::repeat('0').take(0)));

Or what should I do in similar situations?

c410-f3r commented 9 months ago

What is the status of this feature?

Fishrock123 commented 9 months ago

@c410-f3r as I mentioned above it seems to be blocked on requiring https://github.com/rust-lang/rust/issues/89151

c410-f3r commented 9 months ago

Thank you @Fishrock123

The plan we arrived upon is to first revert the stabilization and then plan on stabilizing the feature in a future release, @Mark-Simulacrum suggested 1.65.

Leaving aside the above initial plan described in https://github.com/rust-lang/rust/issues/88967#issuecomment-938024847 and taking into consideration that https://github.com/rust-lang/rust/issues/89151 appears to be stalled, this feature will likely take some years to be stabilized, if ever.

Fishrock123 commented 9 months ago

Yes, it looks like someone interested would need to take up working on https://github.com/rust-lang/rust/issues/89151

kornelski commented 6 months ago

This has been open for years. The nice solution of changing trait resolution had no progress for years either, and may be blocked by other refactorings too.

Could we reconsider renaming the method?

How about separate_by or insert_between? I think these are simpler names that accurately explain this function.

Or lace, interlace, weave, interweave, intertwine.

lukaslueg commented 6 months ago

My personal bikeshed-report: I'd suggest interweave. Unlike most alternatives, the verb addresses an action done to the iterator (as opposed to an action done to the elements, e.g. seperated_by), and doesn't sound unnecessarily mathematical (e.g. interpolate) or skeuomorphic (e.g. sprinkle, interlace, amalgamate, commingle)

jplatte commented 6 months ago

I think separate(d)_by is a really good name. Unlike intersperse, interlace and interweave it's actually obvious that it takes a separator, not a second iterator like Itertools::interleave.

camelid commented 6 months ago

We could also consider calling it join, like the similar method Vec::join. It's straightforward and matches names in other languages (like Python). The only downside could be confusion with Vec::join, but there are already a couple of existing methods that have the same name between Iterator and Vec (e.g., first and last).

lukaslueg commented 6 months ago

We could also consider calling it join, like the similar method Vec::join. It's straightforward and matches names in other languages (like Python). The only downside could be confusion with Vec::join, but there are already a couple of existing methods that have the same name between Iterator and Vec (e.g., first and last).

.join() is a misnomer on Vec imho, as the term alludes to an action done to the elements ("the elements are joined using the argument"), as opposed to the Vec ("the Vec is joined with a second Vec; which join() doesn't actually do).

If someone can make a decision, I'm willing to change the Stone Of Shame around my neck for the Stone Of Triumph and submit a PR.

camelid commented 5 months ago

I agree that Vec::join has a confusing name... however, I'm unclear if that means you agree that Iterator::join is a good name. It's obviously too late to change Vec::join, so if Iterator::join would match it and be no worse a name, it seems like a potentially good option. Do you mind clarifying?

kornelski commented 5 months ago

It can't be join, because that's already taken by a slightly different operation that also collects. Vec::join("") returns a single joined String. So does Itertools::join.

Both join functions call their argument a separator. Therefore I think Iter::separate(_by) makes the most sense for the iterator operation that happens under the hood of join.

camelid commented 5 months ago

I see, that makes sense. It does seem like Iter::separate or separate_by are best then. Perhaps we do intersperse -> separate and intersperse_with -> separate_with?

senekor commented 5 months ago

My personal favorite would be separate_with and nothing else. The version accepting a closure is more powerful and those couple characters won't hurt anybody. I look at it like this:

.separate_with(|| ",")
.separate(",")

The only purpose of having a separate version seems to me to save a couple (8) characters. Not worth it to clutter the standard library documentation in my opinion. But it doesn't matter that much either. Having both is fine too.

lukaslueg commented 5 months ago

@camelid I was arguing for not using .join(), my bad.

fn separate() -> Separate and fn separate_with() -> SeparateWith it is, then?

workingjubilee commented 5 months ago

I think fn insert_between works best, particularly if we only have the closure form, but I don't mind separate_with. I agree that we only really need one form.