Closed tlazaro closed 2 years ago
We haven't tackled toIterableExecution and forceToDiskExecution yet. We are still trying to get a successful run of one specific, fairly large job.
Improved CoGroup so that TupleToKv no longer shows up in the top-level UI and instead lives inside the outer CoGroup PTransform.
Added caching to BeamOp following @johnynek's advice. New implementations should be built on top of the cache by default, which prevents mistakes.
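To illustrate the "cache by default" idea, here is a minimal, dependency-free sketch. It is not the actual BeamOp code: `FakePipeline`, `CachedOp`, `runNoCache` and `CountingSource` are hypothetical names invented for this example, and the real signatures may differ. The point is only that the public `run` is `final` and memoized, so a new implementation cannot bypass the cache by accident.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a Beam Pipeline; only its identity matters here.
final class FakePipeline

// Hypothetical, simplified op. This is NOT the real BeamOp signature.
abstract class CachedOp[A] {
  // One cached result per pipeline instance (default equals/hashCode, i.e. identity).
  // Not thread-safe; assumed to be used from a single planning thread.
  private val cache = mutable.Map.empty[FakePipeline, A]

  // Subclasses implement only the uncached build step.
  protected def runNoCache(pipeline: FakePipeline): A

  // `final`, so every implementation goes through the cache.
  final def run(pipeline: FakePipeline): A =
    cache.getOrElseUpdate(pipeline, runNoCache(pipeline))
}

// Example implementation that counts how often it is actually built.
final class CountingSource(label: String) extends CachedOp[String] {
  var builds = 0
  protected def runNoCache(pipeline: FakePipeline): String = {
    builds += 1
    s"$label#$builds"
  }
}
```

Calling `run` twice with the same pipeline instance builds the op once; a different pipeline instance triggers a fresh build.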
Added very shallow tests for the caching. Our goal should be to test the structure of the Pipeline in code rather than visually in the Dataflow UI. The Pipeline DAG is a bit tedious to work with; the way forward would be to use the visitor pattern it provides and perform mutations to build a better DAG instance on which we could perform better analysis.
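As a rough sketch of what a structural check could look like, the helper below walks a Pipeline with Beam's visitor and collects the names of the top-level transforms, i.e. what the top level of the Dataflow UI would show. It assumes the Beam Java SDK is on the classpath and uses `Pipeline.traverseTopologically`, `PipelineVisitor.Defaults` and `TransformHierarchy.Node`. The `PipelineStructure` object and `topLevelNames` are hypothetical names that are not part of this PR.

```scala
import org.apache.beam.sdk.Pipeline
import org.apache.beam.sdk.Pipeline.PipelineVisitor
import org.apache.beam.sdk.Pipeline.PipelineVisitor.CompositeBehavior
import org.apache.beam.sdk.runners.TransformHierarchy
import scala.collection.mutable.ListBuffer

// Hypothetical helper, not part of scalding or Beam.
object PipelineStructure {
  /** Full names of the transforms applied directly to the pipeline root. */
  def topLevelNames(pipeline: Pipeline): List[String] = {
    val names = ListBuffer.empty[String]
    pipeline.traverseTopologically(new PipelineVisitor.Defaults {
      override def enterCompositeTransform(node: TransformHierarchy.Node): CompositeBehavior =
        if (node.isRootNode) {
          // Descend into the root so that its direct children are visited.
          CompositeBehavior.ENTER_TRANSFORM
        } else {
          // Record a top-level composite, but do not look inside it.
          names += node.getFullName
          CompositeBehavior.DO_NOT_ENTER_TRANSFORM
        }

      // Only reached for primitives that are not nested in a skipped composite.
      override def visitPrimitiveTransform(node: TransformHierarchy.Node): Unit =
        names += node.getFullName
    })
    names.toList
  }
}
```

A test could then assert, for example, that no name in `PipelineStructure.topLevelNames(pipeline)` contains TupleToKv, instead of checking the graph by eye.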
We can't afford to do proper structural testing now, though I'm sure this will come back as a problem later.