twitter / scalding

A Scala API for Cascading
http://twitter.com/scalding
Apache License 2.0
3.5k stars 706 forks source link

Default to forcing before hashJoin on cascading 3 #1777

Closed johnynek closed 6 years ago

johnynek commented 6 years ago

@cwensel @rubanm @piyushnarang

This seems to fix the vast majority of the issues I see on cascading 3 being able to plan random graphs.

here is a log of an error I found (but could fix by forcing the hashjoin): https://www.dropbox.com/s/v835xo1w3c68sib/84327C.tgz?dl=0 (it is the large example in this PR).

I still see some errors very rarely like:

[info]   Cause: java.lang.IllegalStateException: too many captured primary elements
[info]   at cascading.flow.planner.iso.transformer.RemoveBranchGraphTransformer.transformGraphInPlaceUsing(RemoveBranchGraphTransformer.java:59)
[info]   at cascading.flow.planner.iso.transformer.RecursiveGraphTransformer.transformGraphInPlaceUsingSafe(RecursiveGraphTransformer.java:140)
[info]   at cascading.flow.planner.iso.transformer.RecursiveGraphTransformer.transform(RecursiveGraphTransformer.java:120)
[info]   at cascading.flow.planner.iso.transformer.RecursiveGraphTransformer.transform(RecursiveGraphTransformer.java:74)
[info]   at cascading.flow.planner.rule.RuleTransformer.performTransform(RuleTransformer.java:111)
[info]   at cascading.flow.planner.rule.RuleTransformer.transform(RuleTransformer.java:97)
[info]   at cascading.flow.planner.rule.RuleExec.performTransform(RuleExec.java:416)
[info]   at cascading.flow.planner.rule.RuleExec.performMutation(RuleExec.java:226)
[info]   at cascading.flow.planner.rule.RuleExec.executeRulePhase(RuleExec.java:178)
[info]   at cascading.flow.planner.rule.RuleExec.planPhases(RuleExec.java:125)
[info]   at cascading.flow.planner.rule.RuleExec.exec(RuleExec.java:86)
[info]   at cascading.flow.planner.rule.RuleSetExec.execPlannerFor(RuleSetExec.java:153)
[info]   at cascading.flow.planner.rule.RuleSetExec$3.call(RuleSetExec.java:336)
[info]   at cascading.flow.planner.rule.RuleSetExec$3.call(RuleSetExec.java:328)

My last ditch idea is if cascading throws, rewrite the hashjoin into a cogrouped. That or, we could add a Config option for users to do this when their jobs fail.

Hopefully @cwensel can find a bug and fix this, or clearly describe to us what kind of graphs trigger this so we can avoid them.

johnynek commented 6 years ago

cc @non

johnynek commented 6 years ago

@piyushnarang can you take a quick look at this?

The tests have usually passed for me so far. I think there are still cases where cascading throws. I added a work-around of a config to disable hashjoins in those cases. If we see some, we can commit the graph as a test graph and move forward. This is not ideal, and I hope Chris Wensel can update the docs on what kinds of plans are allowed so we can avoid this, or perhaps fix a bug in the planner.

johnynek commented 6 years ago

Since this is green, I'm just going to merge it into the oscar/merge-develop-cascading3 branch which is itself a PR (into yet another PR. We are getting pretty recursive here).