dasch opened this issue 7 years ago.
I feel we're already well covered by the existing `Partition` transform.
We can compose the various use cases above using the Scala standard library. For example, matching the first in a list of predicates would be:
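```scala
import org.apache.beam.sdk.transforms.Partition
import org.apache.beam.sdk.transforms.Partition.PartitionFn
import org.apache.beam.sdk.values.PCollection
val predicates: List[A => Boolean] = ...
val collection: PCollection[A] = ...
// Send each element to the index of the first predicate it satisfies;
// indexWhere yields -1 when nothing matches, which Partition rejects at runtime.
val partitions = collection.apply(Partition.of[A](predicates.size, new PartitionFn[A] {
  override def partitionFor(value: A, numPartitions: Int): Int = predicates.indexWhere(p => p(value))
}))
```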
However, we could easily add some syntactic sugar to `CollectionOps` so that callers don't need an inner class.
I definitely feel that tuples are the right approach here.
Inspired by a similar method in Kafka Streams, this method allows splitting a PCollection into several collections based on a list of predicates. The first predicate that returns true for an element determines which collection the element is sent to. A runtime error is thrown (:sad-face:) if no predicates match, so it's a good idea to have a catch-all at the end, if necessary.
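A minimal sketch of these semantics, assuming an in-memory `Seq` stands in for a `PCollection` so that it runs without Beam. `BranchSketch`, `branch` and `branch2` are hypothetical names, not an existing API. The fixed-arity `branch2` returns a tuple, so the number of outputs is part of the type.

```scala
object BranchSketch {
  // Route each element to the output of the first predicate it satisfies.
  // As in the proposal, an element that matches no predicate is an error.
  def branch[A](xs: Seq[A])(predicates: (A => Boolean)*): Seq[Seq[A]] = {
    val routed = xs.map { x =>
      val i = predicates.indexWhere(p => p(x))
      if (i < 0) throw new IllegalArgumentException(s"no predicate matched element: $x")
      i -> x
    }
    // One output per predicate, in predicate order; empty outputs are kept.
    predicates.indices.map(i => routed.filter(_._1 == i).map(_._2))
  }

  // Fixed-arity variant: returning a tuple makes the number of outputs part of the type.
  def branch2[A](xs: Seq[A])(p1: A => Boolean, p2: A => Boolean): (Seq[A], Seq[A]) = {
    val outs = branch(xs)(p1, p2)
    (outs(0), outs(1))
  }
}
```

For example, `BranchSketch.branch(Seq(1, 5, 20, 3))(_ < 4, _ < 10, _ => true)` yields `Seq(Seq(1, 3), Seq(5), Seq(20))`; the trailing `_ => true` is the catch-all.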
There's also another variation, `branchMap`, that uses partial functions rather than predicates: the first partial function that is defined for an element is called with that element, and the returned value is placed in the respective PCollection. We could also have a version that accepts n functions of the form `A => Option[B]`; the first one that evaluates to `Some(x)` for an element would have `x` written to its respective collection.
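Both variations can be sketched the same way, again assuming an in-memory `Seq` in place of a `PCollection`. `BranchMapSketch`, `branchMap` and `branchOption` are hypothetical names for illustration. Treating an element that no function handles as an error mirrors the predicate-based method and is an assumption, since neither variation's behaviour in that case is specified.

```scala
object BranchMapSketch {
  // branchMap: the first partial function defined at an element is applied to it,
  // and the result goes to that function's output.
  def branchMap[A, B](xs: Seq[A])(fs: PartialFunction[A, B]*): Seq[Seq[B]] = {
    val routed = xs.map { x =>
      val i = fs.indexWhere(_.isDefinedAt(x))
      if (i < 0) throw new IllegalArgumentException(s"no partial function defined at: $x")
      i -> fs(i)(x)
    }
    fs.indices.map(i => routed.filter(_._1 == i).map(_._2))
  }

  // Option-based variant: the first function returning Some(y) sends y to its output.
  def branchOption[A, B](xs: Seq[A])(fs: (A => Option[B])*): Seq[Seq[B]] = {
    val routed = xs.map { x =>
      // The iterator is lazy, so functions after the first Some are never called.
      val hit = fs.iterator.zipWithIndex
        .map { case (f, i) => f(x).map(y => i -> y) }
        .collectFirst { case Some(pair) => pair }
      hit.getOrElse(throw new IllegalArgumentException(s"no function matched: $x"))
    }
    fs.indices.map(i => routed.filter(_._1 == i).map(_._2))
  }
}
```

Calling `isDefinedAt` and then `apply` evaluates a partial function's guards twice; `lift` or `applyOrElse` would avoid that.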