twitter / scalding

A Scala API for Cascading
http://twitter.com/scalding
Apache License 2.0
3.5k stars 706 forks source link

Add a regression test (currently failing on cascading 3) #1780

Closed johnynek closed 6 years ago

johnynek commented 6 years ago

I want to make sure any time we find a graph that fails on cascading 3 we add a test for it in develop. There is some chance it could be a graph that also fails on cascading 2.

piyushnarang commented 6 years ago

👍 Ruban, Cyrille and I added a few of these based on specific Cascading3 failures. I can try and consolidate them all here in a followup PR too.

johnynek commented 6 years ago

Interestingly, this actually passes on the cascading3 branch with the standard scalding optimization rules turned on.

So, I assume it is the DeferMerge rule + existing other checks that make it work. Perhaps we have to run the DeferMerge for our current tricks to work?

johnynek commented 6 years ago

I checked:

class Cascading3Bugfix extends OptimizationPhases {
  def phases = List(OptimizationRules.DeferMerge.orElse(OptimizationRules.DescribeLater),
    OptimizationRules.ForceToDiskBeforeHashJoin)
}

But that wasn't enough to fix the issue. So, it is some other of the full set of rewrites we do that it is saving it.