Open BalmungSan opened 3 years ago
Another operation a bit similar would be:
trait IterableOps[A, CC[_], C] {
def groupByTo[MC[_, _], K](factory: Factory[(K, CC[A]), MC[K, CC[A]]])(f: A => K): MC[K, CC[A]]
}
The motivation would be to be able to put the results in a TreeMap
without computing an intermediate Map
, for instance:
users.groupByTo(TreeMap)(_.age)
@julienrf oh that is also a good idea! I think both could be done in the same operation and leaving the normal Map as a default, something like:
def groupByTo[CC[_], MC[_, _], K](factory: Factory[A, CC[A]], mapFactory: MapFactory[MC] = Map)(key: A => K): MC[K, CC[A]]
And I think it is absolutely worth it. I may have a couple of errors in the signature but as soon as #23 is merged I will give it a try to this one (even if it isn't formally accepted by then, unless it was formally rejected).
I do think we should try to add as few ops that allow the most things, even if you have to specify a few more parameters in some cases (e.g. identity
)
Ok, I started to prototype this and I have a small API problem.
I defined the method like this:
def groupMapTo[K, B, C1, MC](key: A => K)(f: A => B)
(colFactory: Factory[B, C1])
(mapFactory: Factory[(K, C1), MC] = immutable.Map): MC = {
Which works great if you use it like this:
data.groupMapTo(_.country)(_.user)(colFactory = SortedSet)(mapFactory = SortedMap
However, this has two "problems"
Map
then I need to add an empty parenthesis at the end ()
, which looks quite ugly.key
and mapping
functions; especially because the mapping function can be quite big.I originally tried to define the function like this:
def groupMapTo[K, B, C1, MC](colFactory: Factory[B, C1], mapFactory: Factory[(K, C1), MC] = immutable.Map)(key: A => K)(f: A => B): MC = {
However, the problem here is type inference. (Scala 3 allows to at least have both factories in the same last group, mitigating problem 1)
Does anyone have any ideas?
What do you all think of an API like this:
trait GroupMap[K, V, CC[_], MC[_, _]] {
def collectValuesAs[CC1[_]](factory: Factory[V, CC1[V]]): GroupMap[K, V, CC1, MC]
def collectResultsAs[MC1[_, _]](factory: Factory[(K, V), MC1[K, V]]): GroupMap[K, V, CC, MC1]
def result: MC[K, CC[V]]
def reduce(reduce: (V, V)): MC[K, V] // Optional.
}
def groupMapTo[K, V](key: A => K)(f: A => V): NextIterableOnceOpsExtensions.GroupMap[K, V, CC, Map] = ???
So it can be used like this:
data
.groupMapTo(_.country)(_.user)
.collectValuesAs(SortedSet)
.collectResultsAs(SortedMap)
.result
This way in the future all the current methods like groupBy
or groupMapReduce
can be implemented in terms of this.
This also opens the possibility of new methods like groupReduce
and getting the results as a SortedMap.
This looks promising!
Similar to #8 the only reason why Iterators (or IterableOnce in general) do not have
groupBy
&groupMap
is because, you can not have aMap[K, Iterator[V]]
without consuming the Iterator as such the values of the Map would be empty.But, since
groupBy
&groupMap
already consume the Iterator I would say nobody would expect the values to be a lazy collection, I would even say that most people would be ok with either List or Seq. However, I would guess that the best would be just giving users the possibility to choose which collection to use to collect the results; this even opens the possibility to a somewhat common requirement of grouping unique values by using a Set.So the proposal is to add the following two methods to IterableOnceOps so they are not only available on Iterator but for all collections.