Open Philippe-Cholet opened 8 months ago
Currently, we can do it.into_grouping_map().product();
but we can't have a generic map.
let my_map = HashMap::new();
let my_map = HashMap::with_capacity(100);
let my_map = FxHashMap::default(); // rustc_hash
let mut my_map = FxHashMap::default(); my_map.reserve(100);
let my_map = BTreeMap::new();
1. Initial idea: suffix all GroupingMap
methods with _in
. But it results in many weird names like minmax_by_key_in
.
it.into_grouping_map().product_in(my_map);
// --- ------
For each GroupingMap
method like product
, old it.into_grouping_map().product();
would delegate to new it.into_grouping_map().product_in(HashMap::new());
.
2. Suffix Itertools
methods. The trait would be a bit more crowded. But only two redirections.
it.into_grouping_map_in(my_map).product();
// --- ------
Old it.into_grouping_map();
would delegate to new it.into_grouping_map_in(HashMap::new());
.
Similar for into_grouping_map_by
-> into_grouping_map_by_in
but that's it.
That would be similar to other possible extensions like Itertools::counts_in
and Itertools::all_unique_in
.
3. A bit nested. But no _in
suffix.
it.into_grouping_map().with_map(my_map).product();
// -----------------
For each GroupingMap
method like product
, old it.into_grouping_map().product();
would delegate to new it.into_grouping_map().with_map(HashMap::new()).product();
.
Hi @Philippe-Cholet, two thoughts:
into_grouping_map
play nicely with type inference? (Currently, GroupingMap
returns maps with different value types, e.g. minmax
returns MinMaxResult<V>
s, but max
returns V
s.) If so, I would go with it (offers a central place to supply a map and generalizes well to counts
et al).Vec<(K, V)>
can be more efficient than {BTree|Hash}Map
. Thus, I think we should ensure that trait Map
could support this type.@phimuemue
I see your performance point about Vec<(K, V)>
. It can always be added later (in the next PR, or any release) as we would not break anything by implementing Map
for a new type.
Well, silly me...!! I don't know how it could work in a central place (cases 2 and 3 in my last comment).
We would create a new struct GroupingMapIn
that GroupingMap
would alias, right?
pub type GroupingMap<I> = GroupingMapIn<I, HashMap<_, _>>;
... Simply blocked! The first one would be <I as Iterator>::Item.0
(which I can't write) and the second is still unknown since it's the method using it that will determine it...
Then only remain my initial idea (case 1) method_name_in<..., M: Map<Key = ..., Value = ...>>(..., map: M) -> M
.
Attention: Patch coverage is 95.34368%
with 21 lines
in your changes are missing coverage. Please review.
Project coverage is 94.43%. Comparing base (
6814180
) to head (5a71e63
). Report is 40 commits behind head on master.
Files | Patch % | Lines |
---|---|---|
src/generic_containers.rs | 54.54% | 20 Missing :warning: |
src/grouping_map.rs | 99.75% | 1 Missing :warning: |
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
@SkiFire13 I see you are the author of GroupingMap
. What do you think of such extension?
@SkiFire13
The duplication of the methods is a bit unfortunate though.
I totally agree, I'm gonna investigate your "add two generics" idea.
Could this be extended to
into_group_map[_by]
too?
Yes, in the next PR. I intend to extend all HashMap
usage in the crate (and HashSet
later), most https://github.com/rust-itertools/itertools/labels/generic-container issues should be solved with this Map
.
Because I want to make all this work first, I'm gonna delay two of your ideas:
remove+insert
in GroupingMap::aggregate_in
.impl<M: Map> Map for &mut M
.Because I want to make all this work first, I'm gonna delay two of your ideas:
* Optimize `remove+insert` in `GroupingMap::aggregate_in`. * `impl<M: Map> Map for &mut M`.
Understandable. As long as Map
is private this can be changed later, so I'm fine with it.
At the moment, this is only a proof of concept, but it would close #588 (and eventually solve some issues, see https://github.com/rust-itertools/itertools/labels/generic-container).
BTreeMap
andHashMap
(with any hasher). Maybe too much methods, or not enough. I will adjust it. I don't think we should generalize to more than those two maps but I guess it's possible.use_std
touse_alloc
where possible (becauseBTreeMap
only needsuse_alloc
whileHashMap
needsuse_std
) and relaxed someEq + Hash
bounds.aggregate_in
toaggregate
which takes one more argument: any empty map (or one we would.clear()
).The last point is where I struggled the most but that's pretty much the only real possibility because there is so many possibilities of initializations:
BTreeMap::new()
,HashMap::new()
,HashMap::default()
for some hashers, andwith_capacity
shenanigans. And I think it's the most general solution (I can even imagine one singleHashMap
being used multiple times to avoid reallocations).This could extend to other areas than just
GroupingMap
, and aSet
trait could be wrote to generalizeBTreeSet
andHashSet
but it's for another PR.For now, I would like some feedback.
CI currently fails because of temporary unused imports.