twitter / scalding

A Scala API for Cascading
http://twitter.com/scalding
Apache License 2.0

Fix timeout issues in OptimizationRulesTest #1787

Open johnynek opened 6 years ago

johnynek commented 6 years ago

I think the issue is that a.cross(b) takes a pipe of n elements and a pipe of m elements and emits n * m elements. If you get lucky and do that two times with some moderately sized inputs, you can really generate massive outputs.
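
To make the growth concrete, here is a minimal sketch that models cross with plain Scala collections. It is not Scalding's TypedPipe; the `CrossGrowth` object and the input sizes (50, 40, 30) are assumptions picked only for illustration.

```scala
// Toy model with plain Scala collections; this is not Scalding's TypedPipe.
object CrossGrowth {
  // Cartesian product: n elements crossed with m elements yields n * m pairs.
  def cross[A, B](a: Seq[A], b: Seq[B]): Seq[(A, B)] =
    for (x <- a; y <- b) yield (x, y)

  def main(args: Array[String]): Unit = {
    // Illustrative sizes, not taken from OptimizationRulesTest.
    val a = 1 to 50
    val b = 1 to 40
    val c = 1 to 30

    val once  = cross(a, b)     // 50 * 40 = 2000 pairs
    val twice = cross(once, c)  // 2000 * 30 = 60000 pairs

    println(s"one cross: ${once.size}, two crosses: ${twice.size}")
  }
}
```

Two crosses over inputs of only a few dozen elements already give tens of thousands of rows, and every further cross multiplies that again.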

I see a few solutions:

  1. make sure we don't put too many items in initially
  2. turn down the probability of cross, which is VERY rarely used for the same reason. Also, it is not fundamental and can be expressed in terms of other operators.
  3. put a .limit after each cross, which restricts the kinds of topologies we see.
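
Option 3 can be sketched with the same kind of toy model. This uses plain Scala collections rather than Scalding pipes, and `LimitedCross`, `crossLimited`, and the cap of 100 are hypothetical names and values chosen for illustration, not anything from the test itself.

```scala
// Toy model of "put a .limit after each cross"; not Scalding's TypedPipe.
object LimitedCross {
  // Hypothetical helper: build the Cartesian product lazily and keep at most
  // `max` pairs, so the full n * m product is never materialized.
  def crossLimited[A, B](a: Seq[A], b: Seq[B], max: Int): Vector[(A, B)] =
    a.iterator
      .flatMap(x => b.iterator.map(y => (x, y)))
      .take(max)
      .toVector

  def main(args: Array[String]): Unit = {
    val cap = 100 // illustrative cap

    val once  = crossLimited(1 to 50, 1 to 40, cap) // capped at 100, not 2000
    val twice = crossLimited(once, 1 to 30, cap)    // capped at 100, not 60000

    println(s"one cross: ${once.size}, two crosses: ${twice.size}")
  }
}
```

With a cap after every cross the output size stays bounded no matter how many crosses are chained, at the cost already noted in the list: every generated topology then has a limit directly after each cross.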

johnynek commented 6 years ago

Could be related to #1789.