Closed Philippe-Cholet closed 1 year ago
I saw that PR some weeks or months ago but forgot about it, I only remember the crate name and that I thought it seemed quite complex. My approach with trait VecItems
is simple but quite efficient and flexible enough to make an alternative iterator for any iterator generating vectors like I did here for all 5 iterators.
@phimuemue After diving in a bit, I agree that the problem lies with "Iterator" itself and that a "lending iterator" would be a better choice but I share the opinion that it should not be a maintenance burden to have an (evolving) additional dependency and that it would be preferable to wait it reaches the "core" if it is even a possibility (I don't know).
I like my workaround but I guess I should close this issue?
@Philippe-Cholet I don't think you should close it.
@phimuemue
This sounds similar to https://github.com/rust-lang/rust/pull/94667 and https://github.com/rust-lang/rust/pull/82413. This PR makes sense since it appears to be how this kind of thing is being handled in std
. I still think the lending solution is better but it doesn't look like that will happen any time soon.
@Easyoakland Thanks for the link, I did not know about map_windows
.
Still, I am unsure that going this way is the best solution:
As it stands, MapSlice
populates the whole vector, even if many of the elements would stay in-place.
Possible (maybe more efficient) alternatives:
indices
.We may be doing duplicate work in the first place: Imagine that someone wants to generate permutations
of [0, 1, 2, 3, 4, 5]
. The implementation internally generates indices
and the resulting Item
-- even though the indices
themselves would be sufficient in this particular case.
For maximum performance, there might be more efficient algorithms (such as Heap's for permutations (which is in permutohedron
)).
To sum up: The proposed solution improves performance, but does not offer maximum convenience (due to complicating the API), and it is unclear if the performance could be even better (my guess is "it could be better with reasonable amount of work, depending on your use case").
As such, it "falls in the middle" (neither most convenient nor most performant). If that's impossible (or hard) to offer both convenience and performance, then the API should imho not sacrifice at both ends. Thus, I am reluctant to merge this, and would resort to using other, more efficient, algorithms if needed.
@Philippe-Cholet Do you happen to have a comparison for your permutations_map
vs. your patched permutohedron::Heap_mapped
? If Heap_mapped
can be significantly faster than permutations_map
, then I'd interpret this as a sign that having and maintaining permutations_map
is questionable.
@phimuemue I agree it's in the middle as you say it. I think it remains convenient enough (and the old methods remain there unchanged) for a nice speed up but it definitely is not perfect (clone + populate whole vector + clear...), mainly because I did not have to change the internals too much. This workaround was generic enough to modify all 5 five iterators quite easily. Without it, I guess I would have to discard the lazy buffer for something else, and do different things for each iterator.
I did not compare permutations_map
with "permutohedron::Heap_mapped
" but the permutations are generated in different orders. Heap's algo is designed to be the fastest while permutations[_map]
create permutations in the natural order. Compare them did not seem fair. But it might tell us something.
... Well it does tell us there is real room for improvement for permutations_map
:
Time | Permutations of (0..10).collect_vec() |
Comment |
---|---|---|
235ms | permutohedron::Heap::new(&mut vec).map(\|_\| 0).last() |
|
324ms | permutations(vec.into_iter(), 10).map(\|_\| 0).last() |
|
8.08ms | heap::permutations_map(&mut vec, \|_\| 0).last() |
permutohedron::Heap wrapped by me |
87.6ms | permutations_map(vec.into_iter(), 10, \|_\| 0).last() |
MapSlice way |
The first two lines tell us how much Heap's algo is faster than getting the natural order. Both allocates each item to a vector, while the last two do not.
I guess we could have expected permutations_map
to be roughly 8 times faster. I did not thought it would be that bad.
Well I'm closing this. And I'll reopen only if I try and really fasten it.
EDIT: It was a nice exercise and I learned that it has to be really fast if it's not the most convenient, no compromise.
@Philippe-Cholet Wow, thanks a lot for you effort.
I agree with all your conclusions.
For permutations (and 4 others), we could discard my Itertools::permutations_map<R,F>
and add a method map_slices<R,F>
that would convert Permutations<I>
to PermutationsMap<I,F>
to do the same job as before.
It's just a nicer API in my opinion, exact same performance (not great yet but that might eventually change).
Name it map
would be a nasty breaking change because it would override Iterator::map
and has a quite different signature.
Problem: slow because of repetitive heap-allocation
The 5 iterators below generate
Vec<...>
items.It can be slow, because of the repetitive allocations, even if the user does not need to permanently store each permutation.
I think that in most cases, the user uses each vector to produce something else and then each vector is discarded. In that case, because those iterators generate (up to) many many items (product of the lengths, factorial-related, powers of two), it creates many temporary vectors which can slow down code execution.
Proposed solution: map slices
An alternative iterator could own one single (internal/private) vector and only give a reference to it to a function implementing
FnMut(&[...]) -> R
. Those iterators would then generateR
items instead of vectors.Usage examples
it.combinations(k)
andit.combinations_map(k, ToOwned::to_owned)
would produce the same elements, but the first should be used!it.combinations(k).map(closure)
andit.combinations_map(k, closure)
could produce the same elements if it does not require a vector. For what I tested, it's then roughly 4 to 5 times faster to go through all combinations, unless the iterator has very few elements (then.map
should be used).In
it.permutations(5).filter_map(|perm: Vec<_>| some_option)
, ifsome_option
does not need perm to be an owned vector then it could becomeit.permutations_map(5, |perm: &[_]| some_option).flatten()
where there is no heap-allocation for each permutation.Implementation (commits here)
New module
vec_items
(behinduse_alloc
feature) with:pub trait VecItems<T> { type Output; fn new_item<I: Iterator<Item = T>>(&mut self, iter: I) -> Self::Output; }
pub struct CollectToVec;
implementingVecItems
withVec<T>
output and justiter.collect()
(so a new vector each time, as it's currently done).pub struct MapSlice<F, T> { func: F, vec: Vec<T> }
implementingVecItems
withR
output whereF: FnMut(&[T]) -> R
and the internal vector is cleared but not dropped after use (and its length will not change often if ever).E.g. for
combinations_map
:Itertools
trait:#[cfg(feature = "use_alloc")] fn combinations_map<R, F: FnMut(&[Self::Item]) -> R>(self, k: usize, f: F) -> CombinationsMap<Self, F> where ...
src/combinations.rs
:Combinations<I>
toCombinationsBase<I, F>
with the new fieldmanager: F
F: VecItems<I::Item>
and item typeF::Output
pub type Combinations<I> = CombinationsBase<I, CollectToVec>;
pub type CombinationsMap<I, F> = CombinationsBase<I, MapSlice<F, <I as Iterator>::Item>>;
fn combinations_map<I, F>(iter: I, k: usize, f: F) -> CombinationsMap<I, F>
combinations
.Very similar for the 4 others.
powerset[_map]
is a bit simpler because it just usescombinations[_map]
internally ; and it was simpler formulti_cartesian_product
than anticipated.Closing thoughts
Avoid cloning in
MapSlice
case? I think(?) it might be possible to only have references:MapSlice { func: F, vec: Vec<&T> }
. ThenF: FnMut(&[&T]) -> R
and no cloning. But I struggle with the lifetimes (or more probably some hard barrier). But is it worth it? Probably not if it's just integers, but otherwise?The documentation I did not wrote yet would suggest to use the non
_map
variants if the user needs owned vectors and not slices (or if there is very little elements expected).PS: I got the idea while replacing
(0..nb).permutations(nb)
(after usingflamegraph
andheaptrack
) by a_map
version of mine ofpermutohedron::Heap
to avoid any (internal) heap-allocation while keeping a nice iterator structure. And for8! == 40320
times, that was also 5 times faster.