twitter / scalding

A Scala API for Cascading
http://twitter.com/scalding
Apache License 2.0

Add Iterators.groupSequential method. #1867

Closed by non 5 years ago

non commented 5 years ago

Given an iterator of key/value pairs, this method returns an iterator that lazily groups consecutive items with the same key into a sub-iterator. In terms of types, the method does the following:

Iterator[(K, V)] => Iterator[(K, Iterator[V])]

Unlike other methods, which require loading much or all of the initial iterator into memory, this method maintains only one item's worth of state (to "look ahead" and see what is coming next).
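
To make the look-ahead idea concrete, here is a minimal sketch of how such a method could be written. It is not the code from this PR: the object name `GroupSequentialSketch` and the choice to build on `BufferedIterator` are assumptions made for illustration. It also assumes each sub-iterator is only valid until the outer iterator is advanced, since any unread values are skipped at that point.

```scala
object GroupSequentialSketch {
  // Hypothetical stand-in for the method described above, not the PR's implementation.
  def groupSequential[K, V](it: Iterator[(K, V)]): Iterator[(K, Iterator[V])] =
    new Iterator[(K, Iterator[V])] {
      // BufferedIterator.head is the single item of look-ahead state.
      private val in = it.buffered
      // The most recently returned group; starts out empty.
      private var current: Iterator[V] = Iterator.empty

      def hasNext: Boolean = {
        // Skip anything the caller left unread in the previous group so the
        // look-ahead sits on the first item of the next run of keys.
        while (current.hasNext) current.next()
        in.hasNext
      }

      def next(): (K, Iterator[V]) = {
        if (!hasNext) throw new NoSuchElementException("next on empty iterator")
        val key = in.head._1
        current = new Iterator[V] {
          // The group continues only while the upcoming item has the same key.
          def hasNext: Boolean = in.hasNext && in.head._1 == key
          def next(): V =
            if (hasNext) in.next()._2
            else throw new NoSuchElementException("group exhausted")
        }
        (key, current)
      }
    }
}
```

With this sketch, `Iterator("a" -> 1, "a" -> 2, "b" -> 3, "a" -> 4)` yields three groups: `a` with 1 and 2, `b` with 3, and then `a` again with 4. Only adjacent equal keys are merged, which is why the input normally needs to be sorted or otherwise clustered by key first.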

The most obvious use of this method is for grouping data that has already been sorted. The expectation is that this will be useful for moving between Scalding and Spark.

non commented 5 years ago

cc @johnynek