rust-itertools / itertools

Extra iterator adaptors, iterator methods, free functions, and macros.
https://docs.rs/itertools/
Apache License 2.0
2.64k stars 299 forks source link

More generic maps (in `GroupingMap`) #901

Open Philippe-Cholet opened 4 months ago

Philippe-Cholet commented 4 months ago

At the moment, this is only a proof of concept, but it would close #588 (and eventually solve some issues, see https://github.com/rust-itertools/itertools/labels/generic-container).

The last point is where I struggled the most but that's pretty much the only real possibility because there is so many possibilities of initializations: BTreeMap::new(), HashMap::new(), HashMap::default() for some hashers, and with_capacity shenanigans. And I think it's the most general solution (I can even imagine one single HashMap being used multiple times to avoid reallocations).

This could extend to other areas than just GroupingMap, and a Set trait could be wrote to generalize BTreeSet and HashSet but it's for another PR.

For now, I would like some feedback.

CI currently fails because of temporary unused imports.

Philippe-Cholet commented 4 months ago

Currently, we can do it.into_grouping_map().product(); but we can't have a generic map.

let my_map = HashMap::new();
let my_map = HashMap::with_capacity(100);
let my_map = FxHashMap::default(); // rustc_hash
let mut my_map = FxHashMap::default(); my_map.reserve(100);
let my_map = BTreeMap::new();

But where to give the map? (without breaking change)

1. Initial idea: suffix all GroupingMap methods with _in. But it results in many weird names like minmax_by_key_in.

it.into_grouping_map().product_in(my_map);
//                            --- ------

For each GroupingMap method like product, old it.into_grouping_map().product(); would delegate to new it.into_grouping_map().product_in(HashMap::new());.

2. Suffix Itertools methods. The trait would be a bit more crowded. But only two redirections.

it.into_grouping_map_in(my_map).product();
//                  --- ------

Old it.into_grouping_map(); would delegate to new it.into_grouping_map_in(HashMap::new());. Similar for into_grouping_map_by -> into_grouping_map_by_in but that's it.

That would be similar to other possible extensions like Itertools::counts_in and Itertools::all_unique_in.

3. A bit nested. But no _in suffix.

it.into_grouping_map().with_map(my_map).product();
//                    -----------------

For each GroupingMap method like product, old it.into_grouping_map().product(); would delegate to new it.into_grouping_map().with_map(HashMap::new()).product();.

phimuemue commented 4 months ago

Hi @Philippe-Cholet, two thoughts:

Philippe-Cholet commented 4 months ago

@phimuemue I see your performance point about Vec<(K, V)>. It can always be added later (in the next PR, or any release) as we would not break anything by implementing Map for a new type.

Well, silly me...!! I don't know how it could work in a central place (cases 2 and 3 in my last comment). We would create a new struct GroupingMapIn that GroupingMap would alias, right? pub type GroupingMap<I> = GroupingMapIn<I, HashMap<_, _>>;... Simply blocked! The first one would be <I as Iterator>::Item.0 (which I can't write) and the second is still unknown since it's the method using it that will determine it... Then only remain my initial idea (case 1) method_name_in<..., M: Map<Key = ..., Value = ...>>(..., map: M) -> M.

codecov[bot] commented 4 months ago

Codecov Report

Attention: Patch coverage is 95.34368% with 21 lines in your changes are missing coverage. Please review.

Project coverage is 94.43%. Comparing base (6814180) to head (5a71e63). Report is 40 commits behind head on master.

Files Patch % Lines
src/generic_containers.rs 54.54% 20 Missing :warning:
src/grouping_map.rs 99.75% 1 Missing :warning:
Additional details and impacted files ```diff @@ Coverage Diff @@ ## master #901 +/- ## ========================================== + Coverage 94.38% 94.43% +0.04% ========================================== Files 48 49 +1 Lines 6665 7187 +522 ========================================== + Hits 6291 6787 +496 - Misses 374 400 +26 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

Philippe-Cholet commented 4 months ago

@SkiFire13 I see you are the author of GroupingMap. What do you think of such extension?

Philippe-Cholet commented 4 months ago

@SkiFire13

The duplication of the methods is a bit unfortunate though.

I totally agree, I'm gonna investigate your "add two generics" idea.

Could this be extended to into_group_map[_by] too?

Yes, in the next PR. I intend to extend all HashMap usage in the crate (and HashSet later), most https://github.com/rust-itertools/itertools/labels/generic-container issues should be solved with this Map.

Because I want to make all this work first, I'm gonna delay two of your ideas:

SkiFire13 commented 4 months ago

Because I want to make all this work first, I'm gonna delay two of your ideas:

* Optimize `remove+insert` in `GroupingMap::aggregate_in`.

* `impl<M: Map> Map for &mut M`.

Understandable. As long as Map is private this can be changed later, so I'm fine with it.