Add groupByTo and groupMapTo to IterableOnceOps

BalmungSan commented 3 years ago

Similar to #8 the only reason why Iterators (or IterableOnce in general) do not have groupBy & groupMap is because, you can not have a Map[K, Iterator[V]] without consuming the Iterator as such the values of the Map would be empty.

But, since groupBy & groupMap already consume the Iterator I would say nobody would expect the values to be a lazy collection, I would even say that most people would be ok with either List or Seq. However, I would guess that the best would be just giving users the possibility to choose which collection to use to collect the results; this even opens the possibility to a somewhat common requirement of grouping unique values by using a Set.

So the proposal is to add the following two methods to IterableOnceOps so they are not only available on Iterator but for all collections.

def groupByTo[C, K](factory: Factory[A, C])(key: A => K): Map[K, C]
def groupMapTo[C, K, B](factory: Factory[B, C])(key: A => K)(f: A => B): Map[K, C]

julienrf commented 3 years ago

Another operation a bit similar would be:

trait IterableOps[A, CC[_], C] {
  def groupByTo[MC[_, _], K](factory: Factory[(K, CC[A]), MC[K, CC[A]]])(f: A => K): MC[K, CC[A]]
}

The motivation would be to be able to put the results in a TreeMap without computing an intermediate Map, for instance:

users.groupByTo(TreeMap)(_.age)

BalmungSan commented 3 years ago

@julienrf oh that is also a good idea! I think both could be done in the same operation and leaving the normal Map as a default, something like:

def groupByTo[CC[_], MC[_, _], K](factory: Factory[A, CC[A]], mapFactory: MapFactory[MC] = Map)(key: A => K): MC[K, CC[A]]

And I think it is absolutely worth it. I may have a couple of errors in the signature but as soon as #23 is merged I will give it a try to this one (even if it isn't formally accepted by then, unless it was formally rejected).

NthPortal commented 3 years ago

I do think we should try to add as few ops that allow the most things, even if you have to specify a few more parameters in some cases (e.g. identity)

BalmungSan commented 3 years ago

Ok, I started to prototype this and I have a small API problem.

I defined the method like this:

def groupMapTo[K, B, C1, MC](key: A => K)(f: A => B)
                            (colFactory: Factory[B, C1])
                            (mapFactory: Factory[(K, C1), MC] = immutable.Map): MC = {

Which works great if you use it like this:

data.groupMapTo(_.country)(_.user)(colFactory = SortedSet)(mapFactory = SortedMap

However, this has two "problems"

The most important one, If I just want a normal Map then I need to add an empty parenthesis at the end (), which looks quite ugly.
I believe it feels more natural to put the factories first and then the key and mapping functions; especially because the mapping function can be quite big.

I originally tried to define the function like this:

def groupMapTo[K, B, C1, MC](colFactory: Factory[B, C1], mapFactory: Factory[(K, C1), MC] = immutable.Map)(key: A => K)(f: A => B): MC = {

However, the problem here is type inference. (Scala 3 allows to at least have both factories in the same last group, mitigating problem 1)

Does anyone have any ideas?

BalmungSan commented 3 years ago

What do you all think of an API like this:

trait GroupMap[K, V, CC[_], MC[_, _]] {
  def collectValuesAs[CC1[_]](factory: Factory[V, CC1[V]]): GroupMap[K, V, CC1, MC]
  def collectResultsAs[MC1[_, _]](factory: Factory[(K, V), MC1[K, V]]): GroupMap[K, V, CC, MC1]

  def result: MC[K, CC[V]]
  def reduce(reduce: (V, V)): MC[K, V] // Optional.
}

def groupMapTo[K, V](key: A => K)(f: A => V): NextIterableOnceOpsExtensions.GroupMap[K, V, CC, Map] = ???

So it can be used like this:

data
  .groupMapTo(_.country)(_.user)
  .collectValuesAs(SortedSet)
  .collectResultsAs(SortedMap)
  .result

This way in the future all the current methods like groupBy or groupMapReduce can be implemented in terms of this. This also opens the possibility of new methods like groupReduce and getting the results as a SortedMap.

julienrf commented 3 years ago

This looks promising!

scala / scala-library-next

Add groupByTo and groupMapTo to IterableOnceOps #51