twitter / scalding

A Scala API for Cascading
http://twitter.com/scalding
Apache License 2.0
3.49k stars 703 forks source link

WritePartitioner step count law fails sometimes #1817

Open johnynek opened 6 years ago

johnynek commented 6 years ago

We really want O(1) steps per partition, since we want to make sure cascading can plan it fast, but still we don't understand why the law is failing. It could be our model of how cascading plans is off.

Here is a failure:

[info] - When we break at forks we have at most 2 + hashJoin steps *** FAILED ***
[info]   TestFailedException was thrown during property evaluation.
Reporter completed abruptly with an exception after receiving event: TestFailed(Ordinal(0, 1187),TestFailedException was thrown during property evaluation.
  Message: 4 was not less than or equal to 3 optimized: WithDescriptionTypedPipe(TrappedPipe(WithDescriptionTypedPipe(Mapped(WithDescriptionTypedPipe(Mapped(WithDescriptionTypedPipe(HashCoGroup(WithDescriptionTypedPipe(Mapped(WithDescriptionTypedPipe(Mapped(CoGroupedPipe(MapGroup(Pair(IdentityReduce(scala.math.Ordering$Int$@7ab78f27,WithDescriptionTypedPipe(FlatMapValues(Mapped(HashCoGroup(SourcePipe(com.twitter.scalding.source.FixedTypedText(source_0)),IdentityReduce(scala.math.Ordering$Int$@7ab78f27,WithDescriptionTypedPipe(MergedTypedPipe(FlatMapped(IterablePipe(List(0, 2147483647, 1544762017, 0, 624052887, 0, -287604905, 307396159, -610267734, -971337476, -875627901, 1, 2147483647, -1828436720, -2147483648, -1846923827, -1938708654, -1, 1, -1880693903, -123621097, -2147483648, 840086264, -1, 716705883, -2147483648, -1237321546, -2018664413, -471628389, -1327949610, 2147483647, 1470526490, -1650836342, 915790024, 1772020311, -615139303, -2114954200, -673076699, -601159290, 183995577, 1194220429, -1, 2056294449, -560847744, 0, 0, -2147483648, 1843138917, -2056707332, 324239878, 2038838732, 1278259345, 1120086645, 2147483647, 1904161096, 1, 0, 0, -2147483648, -1165218113, 209432443, 0, 1947467572, 1, 2147483647, 1242019707, 2147483647, 2072354530, 1709997299, -2147483648, -1908641464, 1781430029, -1, 265802944, 2147483647, 1380531264, -433867587, 0, -17033375, -2147483648, -1443528651, 1, 745727866, -845251200, 1, -1490936578, -1184499549, 0, 1672666737)),<function1>),FlatMapped(IterablePipe(List(1, -51928547, 1, -302353733, -181736864, -2147483648, -571666277, -1, -675270052, 36854174, 1847828467, -2147483648, -2147483648, 237525651, 1, 178499055, -2147483648, 338940843, -1054752338, -1, 829014631, 0, 1830694228, 1364013977, 0, -1, -1455941074, -85033653, -2147483648, -1210275416, -1937417356, 0, 867753776, 0, -1685381654, 70745480, -1783259469, -1376740730, 0, 1451049565, 1009239461, -2147483648, -1, -1, 773067303)),<function1>)),List((org.scalacheck.Gen$R$class.map(Gen.scala:237),true))),None,List(),ReflexiveEquality()),<function3>),<function1>),<function1>),List((org.scalacheck.Gen$R$class.map(Gen.scala:237),true))),None,List(),ReflexiveEquality()),IdentityReduce(scala.math.Ordering$Int$@7ab78f27,WithDescriptionTypedPipe(SourcePipe(com.twitter.scalding.source.FixedTypedText(source_0)),List((org.scalacheck.Gen$R$class.map(Gen.scala:237),true))),None,List(),ReflexiveEquality()),<function3>),<function2>)),<function1>),List((org.scalacheck.Gen$R$class.map(Gen.scala:237),true))),<function1>),List((com.stripe.dagon.FunctionK$class.apply(FunctionK.scala:11),true))),IdentityReduce(com.twitter.scalding.typed.TypedPipe$CrossPipe$$anon$2@c0d067cd,WithDescriptionTypedPipe(WithDescriptionTypedPipe(Mapped(WithDescriptionTypedPipe(SourcePipe(com.twitter.scalding.source.FixedTypedText(dmscynvQ1el)),List((org.scalacheck.Gen$R$class.map(Gen.scala:237),true))),<function1>),List((com.stripe.dagon.FunctionK$class.apply(FunctionK.scala:11),true))),List((com.stripe.dagon.FunctionK$class.apply(FunctionK.scala:11),true))),None,List(),ReflexiveEquality()),<function3>),List((com.stripe.dagon.FunctionK$class.apply(FunctionK.scala:11),true))),<function1>),List((com.stripe.dagon.FunctionK$class.apply(FunctionK.scala:11),true))),<function1>),List((org.scalacheck.Gen$R$class.map(Gen.scala:237),true))),com.twitter.scalding.source.FixedTypedText(bvvG7hu51rucqPmfeleytvwep89DofrhhuKscdxO1rcopty5avyu7Fzrnpuq),Single(com.twitter.scalding.TupleGetter$IntGetter$@188ea6ee)),List((org.scalacheck.Gen$R$class.map(Gen.scala:237),true)))
johnynek commented 6 years ago

annother failure:

[info] - When we break at forks we have at most 2 + hashJoin steps
Reporter completed abruptly with an exception after receiving event: TestFailed(Ordinal(0, 1185),TestFailedException was thrown during property evaluation.
  Message: 4 was not less than or equal to 2
  Location: (WritePartitionerTest.scala:102)
  Occurred when passed generated values (
    arg0 = WithDescriptionTypedPipe(Mapped(WithDescriptionTypedPipe(CrossPipe(WithDescriptionTypedPipe(Mapped(WithDescriptionTypedPipe(CrossPipe(WithDescriptionTypedPipe(ForceToDisk(EmptyTypedPipe),List((org.scalacheck.Gen$R.map(Gen.scala:237),true))),WithDescriptionTypedPipe(Filter(WithDescriptionTypedPipe(Fork(WithDescriptionTypedPipe(Filter(WithDescriptionTypedPipe(Fork(EmptyTypedPipe),List((org.scalacheck.Gen$R.map(Gen.scala:237),true))),org.scalacheck.GenArities$$Lambda$441/597763888@64c4c3bd),List((org.scalacheck.Gen$R.map(Gen.scala:237),true)))),List((org.scalacheck.Gen$R.map(Gen.scala:237),true))),org.scalacheck.GenArities$$Lambda$441/597763888@36a7692b),List((org.scalacheck.Gen$R.map(Gen.scala:237),true)))),List((org.scalacheck.Gen$R.map(Gen.scala:237),true))),<function1>),List((org.scalacheck.Gen$R.map(Gen.scala:237),true))),WithDescriptionTypedPipe(Mapped(WithDescriptionTypedPipe(FlatMapped(IterablePipe(List(2147483647, -2147483648, -1945538304, -2147483648, -2147483648, -1369151060, -1913477688)),org.scalacheck.GenArities$$Lambda$441/597763888@27dabd65),List((org.scalacheck.Gen$R.map(Gen.scala:237),true))),<function1>),List((org.scalacheck.Gen$R.map(Gen.scala:237),true)))),List((org.scalacheck.Gen$R.map(Gen.scala:237),true))),<function1>),List((org.scalacheck.Gen$R.map(Gen.scala:237),true)))