rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org
Other
96.72k stars 12.5k forks source link

Tracking issue for Vec::extract_if and LinkedList::extract_if #43244

Open Gankra opened 7 years ago

Gankra commented 7 years ago

Feature gate: #![feature(extract_if)] (previously drain_filter)

This is a tracking issue for Vec::extract_if and LinkedList::extract_if, which can be used for random deletes using iterators.

Public API

pub mod alloc {
    pub mod vec {
        impl<T, A: Allocator> Vec<T, A> {
            pub fn extract_if<F>(&mut self, filter: F) -> ExtractIf<'_, T, F, A>
            where
                F: FnMut(&mut T) -> bool,
            {
            }
        }

        #[derive(Debug)]
        pub struct ExtractIf<'a, T, F, #[unstable(feature = "allocator_api", issue = "32838")] A: Allocator = Global>
        where
            F: FnMut(&mut T) -> bool, {}

        impl<T, F, A: Allocator> Iterator for ExtractIf<'_, T, F, A>
        where
            F: FnMut(&mut T) -> bool,
        {
            type Item = T;
            fn next(&mut self) -> Option<T> {}
            fn size_hint(&self) -> (usize, Option<usize>) {}
        }

        impl<T, F, A: Allocator> Drop for ExtractIf<'_, T, F, A>
        where
            F: FnMut(&mut T) -> bool,
        {
            fn drop(&mut self) {}
        }
    }

    pub mod collections {
        pub mod linked_list {
            impl<T> LinkedList<T> {
                pub fn extract_if<F>(&mut self, filter: F) -> ExtractIf<'_, T, F>
                where
                    F: FnMut(&mut T) -> bool,
                {
                }
            }

            pub struct ExtractIf<'a, T: 'a, F: 'a>
            where
                F: FnMut(&mut T) -> bool, {}

            impl<T, F> Iterator for ExtractIf<'_, T, F>
            where
                F: FnMut(&mut T) -> bool,
            {
                type Item = T;
                fn next(&mut self) -> Option<T> {}
                fn size_hint(&self) -> (usize, Option<usize>) {}
            }

            impl<T, F> Drop for ExtractIf<'_, T, F>
            where
                F: FnMut(&mut T) -> bool,
            {
                fn drop(&mut self) {}
            }

            impl<T: fmt::Debug, F> fmt::Debug for ExtractIf<'_, T, F>
            where
                F: FnMut(&mut T) -> bool,
            {
                fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {}
            }
        }
    }
}

Steps / History

Unresolved Questions

See https://github.com/rust-lang/rust/issues/43244#issuecomment-641638196 for a more detailed summary of open issues.

RalfJung commented 7 years ago

~~Related issues: https://github.com/rust-lang/rust/issues/25477 https://github.com/rust-lang/rust/pull/34265~~

bluss commented 7 years ago

Maybe this doesn't need to include the kitchen sink, but it could have a range parameter, so that it's like a superset of drain. Any drawbacks to that? I guess adding bounds checking for the range is a drawback, it's another thing that can panic. But drain_filter(.., f) can not.

rustonaut commented 7 years ago

Is there any chance this will stabilize in some form in the not to far future?

rustonaut commented 7 years ago

If the compiler is clever enough to eliminate the bounds checks in the drain_filter(.., f) case I would opt for doing this.

( And I'm pretty sure you can implement it in a way which makes the compiler clever eneugh, in the worst case you could have a "in function specialization", basically something like if Type::of::<R>() == Type::of::<RangeFull>() { dont;do;type;checks; return } )

jonhoo commented 6 years ago

I know this is bikeshedding to some extent, but what was the reasoning behind naming this drain_filter rather than drain_where? To me, the former implies that the whole Vec will be drained, but that we also run a filter over the results (when I first saw it, I thought: "how is this not just .drain(..).filter()?"). The former on the other hand indicates that we only drain elements where some condition holds.

rustonaut commented 6 years ago

No idea, but drain_where sounds much better and is much more intuitive. Is there still a chance to change it?

bluss commented 6 years ago

.remove_if has been a prior suggestion too

rustonaut commented 6 years ago

I think drain_where does explains it the best. Like drain it returns values, but it does not drain/remove all values but just such where a given condition is true.

remove_if sounds a lot like a conditional version of remove which just removes a single item by index if a condition is true e.g. letters.remove_if(3, |n| n < 10); removes the letter at index 3 if it's < 10.

drain_filter on the other hand is slightly ambiguous, does it drain then filter in a more optimized way (like filter_map) or does if drain so that a iterator is returned comparble to the iterator filter would return, and if so shouldn't it be called filtered_drain as the filter get logically used before...

Gankra commented 6 years ago

There is no precedent for using _where or _if anywhere in the standard library.

jonhoo commented 6 years ago

@Gankro is there a precedent for using _filter anywhere? I also don't know that that is that a reason for not using the less ambiguous terminology? Other places in the standard library already use a variety of suffixes such as _until and _while.

crlf0710 commented 6 years ago

The "said equivalent" code in the comment is not correct... you have to minus one from i at the "your code here" site, or bad things happens.

thegranddesign commented 6 years ago

IMO it's not filter that's the issue. Having just searched for this (and being a newbie), drain seems to be fairly non-standard compared to other languages.

Again, just from a newbie perspective, the things I would search for if trying to find something to do what this issue proposes would be delete (as in delete_if), remove, filter or reject.

I actually searched for filter, saw drain_filter and kept searching without reading because drain didn't seem to represent the simple thing that I wanted to do.

It seems like a simple function named filter or reject would be much more intuitive.

thegranddesign commented 6 years ago

On a separate note, I don't feel as though this should mutate the vector it's called on. It prevents chaining. In an ideal scenario one would want to be able to do something like:

        vec![
            "",
            "something",
            a_variable,
            function_call(),
            "etc",
        ]
            .reject(|i| { i.is_empty() })
            .join("/")

With the current implementation, what it would be joining on would be the rejected values.

I'd like to see both an accept and a reject. Neither of which mutate the original value.

rpjohnst commented 6 years ago

You can already do the chaining thing with filter alone. The entire point of drain_filter is to mutate the vector.

thegranddesign commented 6 years ago

@rpjohnst so I searched here, am I missing filter somewhere?

rpjohnst commented 6 years ago

Yes, it's a member of Iterator, not Vec.

Gankra commented 6 years ago

Drain is novel terminology because it represented a fourth kind of ownership in Rust that only applies to containers, while also generally being a meaningless distinction in almost any other language (in the absence of move semantics, there is no need to combine iteration and removal into a single ""atomic"" operation).

Although drain_filter moves the drain terminology into a space that other languages would care about (since avoiding backshifts is relevant in all languages).

polarathene commented 6 years ago

I came across drain_filter in docs as a google result for rust consume vec. I know that due to immutability by default in rust, filter doesn't consume the data, just couldn't recall how to approach it so I could manage memory better.

drain_where is nice, but as long as the user is aware of what drain and filter do, I think it's clear that the method drains the data based on a predicate filter.

jonhoo commented 6 years ago

I still feel as though drain_filter implies that it drains (i.e., empties) and then filters. drain_where on the other hand sounds like it drains the elements where the given condition holds (which is what the proposed function does).

tmccombs commented 6 years ago

Shouldn't linked_list::DrainFilter implement Drop as well, to remove any remaining elements that match the predicate?

Gankra commented 6 years ago

Yes

Emerentius commented 6 years ago

Why exactly does dropping the iterator cause it to run through to the end? I think that's surprising behaviour for an iterator and it could also be, if desired, done explicitly. The inverse of taking only as many elements out as you need is impossible on the other hand because mem::forgeting the iterator runs into leak amplification.

Boscop commented 6 years ago

I've been using this function a lot and I always have to remember to return true for the entries I want to drain, which feels counter-intuitive compared to retain()/retain_mut(). On an intuitive logical level, it would make more sense to return true for entries I want to keep, does anyone else feel this way? (Especially considering that retain() already works this way) Why not do that, and rename drain_filter() to retain_iter() or retain_drain() (or drain_retain())? Then it would also mirror retain() more closely!

rustonaut commented 6 years ago

This is why I proposed to rename it to drain_where as it then is clear that:

  1. It is a form of drain so we use drain in the name

  2. By using where in combination with drain it is clear that the elements where the predicate is true get drained, i.e removed and returned

  3. It is more in sync with other names in std, normally if you have a function consisting of two predicates you can emulate it (roughly) by using functions representing each of the predicates in a "and then" fashion, e.g. filter_map can be emulated (roughly) as filter an then map. The current name indicates it is drain and then filter, but it isn't even close to it as it does not do a complete drain at all.

On Sun, Feb 25, 2018, 17:04 Boscop notifications@github.com wrote:

I've been using this function a lot and I always have to remember to return true for the entries I want to drain, which feels counter-intuitive compared to retain_mut(). On a primal level, it would make more sense to return true for entries I want to keep, does anyone else feel this way? Why not do that, and rename drain_filter() to retain_filter()?

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/rust-lang/rust/issues/43244#issuecomment-368320990, or mute the thread https://github.com/notifications/unsubscribe-auth/AHR0kfwaNvz8YBwCE4BxDkeHgGxLvcWxks5tYYRxgaJpZM4OY1me .

Boscop commented 6 years ago

But with drain_where() the closure would still have to return true for elements that should be removed, which is the opposite of retain() which makes it inconsistent.. Maybe retain_where? But I think you're right that it makes sense to have "drain" in the name, so I think drain_retain() makes the most sense: It's like drain() but retaining the elements where the closure returns true.

rustonaut commented 6 years ago

While it does not change, that you have to return true it makes clear that you have to do so.

I personally would use either:

a. drain_where b. retain_where c. retain

I would not use drain_retain. Drain and retain speak about the same kind of process but from opposite perspectives, drain speaks about what you remove (and r return) retain speaks about what you keep (in the way it is already used in std).

Currently retaim is already implemented in std with the major difference that it is discarding elements while drain is returning them, which makes retain (sadly) unsuited to be use in the name (except if you want to call it retain_and_return or similar).

Another point which speaks for drain is ease of migration, i.e. migrating to drain_where is as easy as running a word based search and replace on the code, while changing it to retain would need a additional negation of all used predicates/filter functions.

On Sun, Feb 25, 2018, 18:01 Boscop notifications@github.com wrote:

But with drain_where() the closure would still have to return true for elements that should be removed, which is the opposite of retain() which makes it inconsistent.. Maybe retain_where? But I think you're right that it makes sense to have "drain" in the name, so I think drain_retain() makes the most sense: It's like drain() but retaining the elements where the closure returns true.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/rust-lang/rust/issues/43244#issuecomment-368325374, or mute the thread https://github.com/notifications/unsubscribe-auth/AHR0kfG4oZHxGfpOSK8DjXW3_2O1Eo3Rks5tYZHxgaJpZM4OY1me .

Boscop commented 6 years ago

But how often would you migrate from drain() to drain_filter()? In all cases until now, I migrated from retain() to drain_filter() because there is no retain_mut() in std and I need to mutate the element! So then I had to invert the closure return value.. I think drain_retain() makes sense because the drain() method drains unconditionally all elements in the range, whereas drain_retain() retains the elements where the closure returns true, it combines the effects of the drain() and retain() methods.

rustonaut commented 6 years ago

Sorry I mean to migrate from drain_filter to drain_where.

That you have a solution using retain and then need to use drain_filter is a aspects I have not yet considered.

On Sun, Feb 25, 2018, 19:12 Boscop notifications@github.com wrote:

But why would you migrate from drain() to drain_filter()? In all cases until now, I migrated from retain() to drain_filter() because there is no retain_mut() in std and I need to mutate the element! I think drain_retain() makes sense because the drain() method drains unconditionally all elements in the range, whereas drain_retain() retains the elements where the closure returns true.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/rust-lang/rust/issues/43244#issuecomment-368330896, or mute the thread https://github.com/notifications/unsubscribe-auth/AHR0kSayIk_fbp5M0RsZW5pYs3hDICQIks5tYaJ0gaJpZM4OY1me .

Boscop commented 6 years ago

Ah yes, but I think the "price" of inverting the closures in current code that uses drain_filter() is worth it, to get a consistent and intuitive API in std and then in stable. It's only a small fixed cost (and eased by the fact that it would go along with a renaming of the function, so the compiler error could tell the user that the closure has to be inverted, so it wouldn't silently introduce a bug), compared to the cost of standardizing drain_filter() and then people always having to invert the closure when migrating from retain() to drain_filter().. (on top of the mental cost of remembering to do that, and the costs of making it harder to find it in the docs, coming from retain() and searching for "something like retain() but passing &mut to its closure, which is why I think it makes sense that the new name of this function has "retain" in the name, so that people find it when searching in the docs).

Some anecdotal data: In my code, I always only needed the retain_mut() aspect of drain_filter() (often they were using retain() before), I never had use cases where I needed to process the drained values. I think this will also the most common use case for others in the future (since retain() doesn't pass &mut to its closure so that drain_filter() has to cover that use case, too, and it's a more common use case than needing to process the drained values).

rustonaut commented 6 years ago

The reason why I'm agains drain_retain is because of the way names are currently used in std wrt. collections:

  1. you have function names using predicates which have producing/consuming concepts associated with them (wrt. rust, iterations). For example drain, collect, fold, all, take, ...
  2. this predicates have sometimes modifiers e.g. *_where, *_while
  3. you have function names using predicates which have modifying properties (map, filter, skip, ...)
    • here it's vague if it is element or iteration modifying (map vs. filter/skip)
  4. function names chaining multiple predicates using modifying properties e.g. filter_map
    • having a concept of roughly apply modifier_1 and then apply modifier_2, just that it's faster or more flexible doing this in one step

You sometimes might have:

  1. function names combining producing/consuming predicates with modifying ones (e.g. drain_filter)
    • but most times it is better/less confusing to combine them with modifiers (e.g. drain_where)

You normally do not have:

  1. two of the produceing/consuming predicates combined into one name, i.e. we do not have thinks like take_collect as it is easily confusing

drain_retain does kinda make sense but falls in the last category, while you probably can guess what it does it basically says remove and return all elements "somehow specified" and then keep all elements "somehow specified" discarding other elements.


One the other hand I don't know why there should not be a retain_mut maybe opening a quick RFC introducing retain_mut as a efficient way to combine modify + retain I have a hunch it might be faster stabilized then this function. Until then you could consider writing a extension trait providing you own retain_mut using iter_mut + a bool-array (or bitarray, or...) to keep track of which elements have to be reomved. Or providing your own drain_retain which internally uses drain_filer/drain_where but wraps the predicate into a not |ele| !predicate(ele).

Boscop commented 6 years ago

@dathinab

  1. We are talking about a method on collections here, not on Iterator. map, filter, filter_map, skip, take_while etc are all methods on Iterator. Btw, which methods do you mean that use *_where? So we have to compare the naming scheme to methods that already exist on collections, e.g. retain(), drain(). There is no confusion with Iterator methods which transform one iterator into another iterator.
  2. AFAIK the consensus was that retain_mut() would not be added to std because drain_filter() will already be added and people were advised to use that. Which brings us back to the use case of migrating from retain() to drain_filter() being very common, so it should have a similar name and API (closure returning true means keep the entry)..
rustonaut commented 6 years ago

I haven't been aware that retain_mut was already discussed.

We are talking about general naming schemes in std mainly wrt. to collections, which does include Iterators as they are one of the main access methods for collections in rust, especially when it's about modifying more than one entry.

The reason why drain_retain is confusing is that it's not clear if it is drain or retain based, if it is drain based, returning true would remove the value, if is retain based it would keep it. Using _where makes it at last clear what is expected of the passed in function.

On Mon, Feb 26, 2018, 00:25 Boscop notifications@github.com wrote:

@dathinab https://github.com/dathinab

  1. We are talking about a method on collections here, not on Iterator. map, filter, filter_map, skip, take_while etc are all methods on Iterator. Btw, which methods do you mean that use *_where? So we have to compare the naming scheme to methods that already exist on collections, e.g. retain(), drain(). There is no confusion with Iterator methods which transform the iterator into another iterator.
  2. AFAIK the consensus was that retain_mut() would not be added to std because drain_filter() will already be added and people were advised to use that. Which brings us back to the use case of migrating from retain() to drain_filter() being very common, so it should have a similar name and API (closure returning true means keep the entry)..

— You are receiving this because you were mentioned.

Reply to this email directly, view it on GitHub https://github.com/rust-lang/rust/issues/43244#issuecomment-368355110, or mute the thread https://github.com/notifications/unsubscribe-auth/AHR0kfkRAZ5OtLFZ-SciAmjHDEXdgp-0ks5tYevegaJpZM4OY1me .

RalfJung commented 6 years ago

I've been using this function a lot and I always have to remember to return true for the entries I want to drain, which feels counter-intuitive compared to retain()/retain_mut().

FWIW, I think retain is the counter-intuitive name here. I usually find myself wanting to delete certain elements from a vector, and with retain I always have to invert that logic.

Boscop commented 6 years ago

But retain() is already in stable, so we have to live with it.. And so my intuition got used to that..

rustonaut commented 6 years ago

@Boscop: and so is drain which is the inverse of retain but also returns the removed elements and the usage of suffixes like _until,_while for making functions available which are just a slightly modified version of a existing functionality.

I mean if I would describe drain it would be something like:

remove and return all elements specified "in some way", keep all other elements where "in some way" is "by slicing" for all sliceable collection types and "all" for the rest.

The description for the function discussed here is the same except that "in some way" is "where a given predicate returns true".

One the other hand the description I would give retain is: only retain (i.e. keep) elements where a given predicate returns true, discard the rest

(yes, retain could have been used in a way where it does not discard the rest, sadly it wasn't)


I do think that it would have been really nice if retain would have passed &mut T to the predicate and maybe returned the removed values. Because I think retain is a more intuitive name base.

But independent of this I also think that both drain_filter/drain_retain are suboptimal as they do not make it clear if the predicate has to return true/false to keep/drain a entry. (drain indicates true does remove it as it speaks about removing elements while filter and retrain speaks about which elements to keep, at last in rust)


In the end it's not that important which of the names is used, it would be just nice if it gets stabilized.

Doing a poll and/or letting someone from the language team decide might be the best way to move thinks forward?

tmccombs commented 6 years ago

I think something like drain_where, drain_if, or drain_when, is much clearer than drain_filter.

Boscop commented 6 years ago

@tmccombs Out of those 3, I think drain_where makes the most sense. (Because if implies either do the whole thing (in this case draining) or not and when is temporal.) Compared to drain_filter the closure return value is the same with drain_where (true to remove an element) but that fact is made clearer/explicit by the name, so it eliminates the risk of accidentally interpreting the meaning of the closure return value wrongly.

SimonSapin commented 6 years ago

I think it’s more than time to stabilize. Summary of this thread:

@Gankro, what do you think?

SimonSapin commented 6 years ago

The libs team discussed this and the consensus was to not stabilize more drain-like methods at the moment. (The existing drain_filter method can stay in Nightly as unstable.) https://github.com/rust-lang/rfcs/pull/2369 proposes adding another drain-like iterator that doesn’t do anything when dropped (as opposed to consuming the iterator to its end).

We’d like to see experimentation to attempt to generalize into a smaller API surface various combinations of draining:

Possibilities might include "overloading" a method by making it generic, or a builder pattern.

One constraint is that the drain method is stable. It can possibly be generalized, but only in backward-compatible ways.

shepmaster commented 6 years ago

We’d like to see experimentation to attempt to generalize into a smaller API surface various combinations of draining:

How and where does the team foresee this type of experimentation happening?

SimonSapin commented 6 years ago

How: come up with and propose a concrete API design, possibly with a proof-of-concept implementation (which can be done out of tree through at least Vec::as_mut_ptr and Vec::set_len). Where doesn’t matter too much. Could be a new RFC or a thread in https://internals.rust-lang.org/c/libs, and link it from here.

Emerentius commented 6 years ago

I've been playing around with this for a bit. I'll open a thread on internals in the next days.

Boscop commented 6 years ago

I think a general API that works like this makes sense:

    v.drain(a..b).where(pred)

So it's a builder-style API: If .where(pred) is not appended, it will drain the whole range unconditionally. This covers the capabilities of the current .drain(a..b) method as well as .drain_filter(pred).

If the name drain can't be used because it's already in use, it should be a similar name like drain_iter.

The where method shouldn't be named *_filter to avoid confusion with filtering the resulting iterator, especially when where and filter are used in combination like this:

    v.drain(..).where(pred1).filter(pred2)

Here, it will use pred1 to decide what will be drained (and passed on in the iterator) and pred2 is used to filter the resulting iterator. Any elements that pred1 returns true for but pred2 returns false for will still get drained from v but won't get yielded by this combined iterator.

What do you think about this kind of builder-style API approach?

Boscop commented 6 years ago

For a second I forgot that where can't be used as function name because it's already a keyword :/

And drain is already stabilized so the name can't be used either..

Then I think the second best overall option is to keep the current drain and rename drain_filter to drain_where, to avoid the confusion with .drain(..).filter().

(As jonhoo said above: )

what was the reasoning behind naming this drain_filter rather than drain_where? To me, the former implies that the whole Vec will be drained, but that we also run a filter over the results (when I first saw it, I thought: "how is this not just .drain(..).filter()?"). The former on the other hand indicates that we only drain elements where some condition holds.

Emerentius commented 6 years ago

I've opened a thread on internals. The TLDR is that I think that non-selfexhaustion is a bigger can of worms than expected in the general case and that we should stabilize drain_filter sooner rather than later with a RangeBounds parameter. Unless someone has a good idea for solving the issues outlined there.

Edit: I've uploaded my experimental code: drain experiments There are also drain and clearing benches and some tests but don't expect clean code.

Popog commented 6 years ago

Totally missed out on this thread. I've had an old impl that I've fixed up a bit and copy pasted to reflect a few of the options described in this thread. The one nice thing about the impl that I think will be non-controversial is that it implements DoubleEndedIterator. View it here.

Boscop commented 6 years ago

@Emerentius but then we should at least rename drain_filter to drain_where, to indicate that the closure has to return true to remove the element!

Emerentius commented 6 years ago

@Boscop Both imply the same 'polarity' of true => yield. I personally don't care whether it's called drain_filter or drain_where.

@Popog Can you summarize the differences and pros & cons? Ideally over at the internals thread. I think DoubleEndedIterator functionality could be added backwards compatibly with zero or low overhead (but I haven't tested that).

askeksa commented 6 years ago

How about drain_or_retain? It's a grammatically meaningful action, and it signals that it does one or the other.

Boscop commented 6 years ago

@askeksa But that doesn't make it clear whether returning true from the closure means "drain" or "retain". I think with a name like drain_where, it's very clear that returning true drains it, and it should be clear to everyone that the elements that aren't drained are retained.