rust-lang / libs-team

The home of the library team
Apache License 2.0
115 stars 18 forks source link

`swap_retain` on `Vec` #275

Closed vringar closed 10 months ago

vringar commented 11 months ago

Proposal

Problem statement

Efficiently removing elements of a vec without preserving order is currently not supported by the standard library.

single element multi element
Preserves ordering remove retain
Does not preserve ordering swap_remove Does not exist

Motivating examples or use cases

While modeling a dense graph using Vec<Vec<usize>> (so called adjacency list) where each Vec<usize> represents adjacent nodes, during the removal of nodes, I needed a fast way to remove adjancency list entries (typically very few compared to the total size of the graph) without preserving order in the adjacency list.

Solution sketch

A docs level description of the feature could look like this (modified from Vec::retain) :

Retains only the elements specified by the predicate without preserving the original order.

In other words, remove all elements e for which f(&e) returns false. This method operates in place, visiting each element exactly once in an unspecified order and does not preserves the order of the retained elements.

Examples

let mut vec = vec![1, 2, 3, 4];
vec.swap_retain(|&x| x % 2 == 0);
assert!(vec.contains(2));
assert!(vec.contains(4));

Because the elements are visited exactly once but not in the original order, external state may be of limited use when deciding which elements to keep.

As for the method name, the two proposals that came up on Zulip were swap_retain and retain_unstable. The former creates a nice symmetry in the table up top, but promises more than we might be willing to guarantee. retain_unstable uses the word unstable in the way that sorting algorithms use it, but unstable has a semantic overload in Rust which might make it undesirable. @pitaj has pointed out that the insertion methods in this space (iterator vs single element, in place vs at the end) are named insert, splice, push, extend, so there is existing precedent for not having a consistent naming scheme.

Alternatives

This playground link contains two implementations of a potential swap_retain method. (I originally named them swap_retain_mut due to retain having two different methods, but @cuviper pointed out that this is a historical accident. This ACP only proposes a single method.)

The second implementation was done by @quaternic and already performs a lot better than my previous attempts, but it still might move a single element multiple times, so there is still some performance to be gained.

I'm unaware of a crate the extends the standard containers with additional methods, but if there is such a crate it might make sense to move the implementation there.

Links and related work

I created this Zulip stream before the ACP

What happens now?

This issue is part of the libs-api team API change proposal process. Once this issue is filed the libs-api team will review open proposals as capability becomes available. Current response times do not have a clear estimate, but may be up to several months.

Possible responses

The libs team may respond in various different ways. First, the team will consider the problem (this doesn't require any concrete solution or alternatives to have been proposed):

Second, if there's a concrete solution:

scottmcm commented 11 months ago

Fixed zulip URL: https://rust-lang.zulipchat.com/#narrow/stream/327149-t-libs-api.2Fapi-changes/topic/.E2.9C.94.20swap_retain.20for.20Vec/near/394377681

vringar commented 11 months ago

Fixed in original post

CryZe commented 11 months ago

How does this interact with extract_if (do we need swap_extract_if)?

m-ou-se commented 10 months ago

Motivating examples or use cases

While modeling a dense graph using Vec<Vec<usize>> (so called adjacency list) where each Vec<usize> represents adjacent nodes, during the removal of nodes, I needed a fast way to remove adjancency list entries (typically very few compared to the total size of the graph) without preserving order in the adjacency list.

Would calling swap_remove in a loop not work for your use case? Would swap_retain be more efficient than calling swap_remove in a loop?

cuviper commented 10 months ago

In the zulip thread, we came around to an algorithm similar to Iterator::partition_in_place that should minimize data movement -- and in this case doesn't need to swap at all, just move retained items at most once. There's nothing about that which couldn't be written in external code, but it's probably not an "obvious" algorithm, which gives it some merit for a standard library implementation IMO.

Amanieu commented 10 months ago

We discussed this in the libs-api meeting. We're generally happy to add this: although it's possible to implement this manually with a loop, that's 7 lines of code you now don't need to write by hand. There was a preference for the name retain_unstable which clearly indicates that the order of elements is not preserved ("swapping" is an implementation detail).