twitter / scalding

A Scala API for Cascading
http://twitter.com/scalding
Apache License 2.0

Add support for cogroups in beam-backend #1945

Closed: nownikhil closed 3 years ago

nownikhil commented 3 years ago

This change adds support for HashCoGroup and CoGroupedPipe in the Beam backend. To evaluate HashCoGroup, we create a ParDo transformation on the larger pipe, with the smaller pipe passed as a side input. To evaluate CoGroupedPipe, we use the MultiJoinFunction to produce the final output iterator.
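For readers unfamiliar with the two strategies, here is a rough sketch of the idea using plain Scala collections. It is not the code from this PR and does not use the Beam or Scalding APIs: `CoGroupSketch`, `hashJoin`, `coGroup`, and `joinFn` are hypothetical names, the in-memory `Map` only stands in for a Beam side input, and `joinFn` only stands in for the role MultiJoinFunction plays.

```scala
object CoGroupSketch {
  // Hash join (assumption: the small side fits in memory).
  // The small side is turned into a lookup table, analogous to a side input,
  // and the large side is streamed through it one record at a time.
  def hashJoin[K, V, W](
      large: Iterator[(K, V)],
      small: Iterable[(K, W)]
  ): Iterator[(K, (V, W))] = {
    val table: Map[K, List[W]] =
      small.groupBy(_._1).map { case (k, kws) => k -> kws.map(_._2).toList }
    large.flatMap { case (k, v) =>
      table.getOrElse(k, Nil).iterator.map(w => (k, (v, w)))
    }
  }

  // Cogroup: both sides are grouped by key, then a caller-supplied join
  // function (hypothetical stand-in for MultiJoinFunction) turns the
  // per-key values into the output iterator for that key.
  def coGroup[K, V, W, R](
      left: Iterable[(K, V)],
      right: Iterable[(K, W)]
  )(joinFn: (K, Iterator[V], Iterable[W]) => Iterator[R]): Map[K, List[R]] = {
    val l = left.groupBy(_._1)
    val r = right.groupBy(_._1)
    (l.keySet ++ r.keySet).iterator.map { k =>
      val vs = l.getOrElse(k, Iterable.empty[(K, V)]).iterator.map(_._2)
      val ws = r.getOrElse(k, Iterable.empty[(K, W)]).map(_._2)
      k -> joinFn(k, vs, ws).toList
    }.toMap
  }
}
```

In this sketch, the hash join never groups or shuffles the large side, which is the reason for treating the smaller pipe as the side input. The cogroup path groups both sides first, so the join semantics (inner, outer, and so on) live entirely in the function passed in.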

Also as part of this change we are doing a minor refactor for code > 100 lines.

TESTS: Added unit tests for both HashCoGroup and CoGroupedPipe.

CLAassistant commented 3 years ago

CLA assistant check
All committers have signed the CLA.

johnynek commented 3 years ago

small note:

Rebasing every time makes reviewers have to look through the entire commit again since we can't see what changed.

We use squash merge at the end, so there is no benefit to rebasing in terms of the final commit history. Would you be willing to just add commits so we can more easily review?

nownikhil commented 3 years ago

Yeah that makes sense, my bad. Will add a new commit next time.