twitter / scalding

A Scala API for Cascading
http://twitter.com/scalding
Apache License 2.0
3.51k stars 708 forks source link

partitioner sometimes violates the n + 1 rule #1804

Open johnynek opened 6 years ago

johnynek commented 6 years ago
[info] - the total number of steps is not more than cascading *** FAILED ***
[info]   TestFailedException was thrown during property evaluation.
[info]     Message: 5 was not less than or equal to 3
[info]     Location: (WritePartitionerTest.scala:101)
[info]     Occurred when passed generated values (
[info]       arg0 = WithDescriptionTypedPipe(Mapped(WithDescriptionTypedPipe(Mapped(CoGroupedPipe(Pair(IdentityReduce(scala.math.Ordering$Int$@1457f692,WithDescriptionTypedPipe(WithDescriptionTypedPipe(Mapped(WithDescriptionTypedPipe(Mapped(WithDescriptionTypedPipe(FlatMapped(WithDescriptionTypedPipe(ForceToDisk(WithDescriptionTypedPipe(Fork(WithDescriptionTypedPipe(Fork(EmptyTypedPipe),List((org.scalacheck.Gen$R.map(Gen.scala:237),true)))),List((org.scalacheck.Gen$R.map(Gen.scala:237),true)))),List((org.scalacheck.Gen$R.map(Gen.scala:237),true))),org.scalacheck.GenArities$$Lambda$441/1591575691@c8f3e1d),List((org.scalacheck.Gen$R.map(Gen.scala:237),true))),<function1>),List((org.scalacheck.Gen$R.map(Gen.scala:237),true))),org.scalacheck.GenArities$$Lambda$441/1591575691@261ad18c),List((org.scalacheck.Gen$R.map(Gen.scala:237),true))),List((org.scalacheck.Gen$R.map(Gen.scala:237),true))),None,List(),ReflexiveEquality()),IdentityReduce(scala.math.Ordering$Int$@1457f692,WithDescriptionTypedPipe(WithDescriptionTypedPipe(MapValues(WithDescriptionTypedPipe(FlatMapped(SourcePipe(com.twitter.scalding.source.FixedTypedText(uMvxarCxpAubxith6fe8axecny)),org.scalacheck.GenArities$$Lambda$441/1591575691@18de4736),List((org.scalacheck.Gen$R.map(Gen.scala:237),true))),org.scalacheck.GenArities$$Lambda$441/1591575691@7a77d7b0),List((org.scalacheck.Gen$R.map(Gen.scala:237),true))),List((org.scalacheck.Gen$R.map(Gen.scala:237),true))),None,List(),ReflexiveEquality()),<function3>)),<function1>),List((org.scalacheck.Gen$R.map(Gen.scala:237),true))),<function1>),List((org.scalacheck.Gen$R.map(Gen.scala:237),true)))
[info]     )

It would be nice to fix this, and maybe even get it take exactly the same number of steps as cascading.

johnynek commented 6 years ago

another example:

[info] - the total number of steps is not more than cascading *** FAILED ***
[info]   TestFailedException was thrown during property evaluation.
[info]     Message: 4 was not less than or equal to 3
[info]     Location: (WritePartitionerTest.scala:101)
[info]     Occurred when passed generated values (
[info]       arg0 = WithDescriptionTypedPipe(Mapped(WithDescriptionTypedPipe(FlatMapped(WithDescriptionTypedPipe(Mapped(WithDescriptionTypedPipe(CrossPipe(WithDescriptionTypedPipe(WithDescriptionTypedPipe(ForceToDisk(EmptyTypedPipe),List((org.scalacheck.Gen$R.map(Gen.scala:237),true))),List((ctse,false))),WithDescriptionTypedPipe(Fork(EmptyTypedPipe),List((org.scalacheck.Gen$R.map(Gen.scala:237),true)))),List((org.scalacheck.Gen$R.map(Gen.scala:237),true))),<function1>),List((org.scalacheck.Gen$R.map(Gen.scala:237),true))),org.scalacheck.GenArities$$Lambda$441/1312778964@9247412),List((org.scalacheck.Gen$R.map(Gen.scala:237),true))),<function1>),List((org.scalacheck.Gen$R.map(Gen.scala:237),true)))
[info]     )
johnynek commented 6 years ago

another:

[info] - the total number of steps is not more than cascading *** FAILED ***
[info]   TestFailedException was thrown during property evaluation.
[info]     Message: 4 was not less than or equal to 3
[info]     Location: (WritePartitionerTest.scala:101)
[info]     Occurred when passed generated values (
[info]       arg0 = WithDescriptionTypedPipe(Mapped(WithDescriptionTypedPipe(CrossPipe(IterablePipe(List(0, 1358996629, -1, 1176242286, -799433509, 1695053710, -1, 2147483647, 2147483647, 562222783, 2147483647, -1, 1215429577, 1037781522, 2147483647, -1, 1167458391, 0, 2147483647, 2147483647, 2147483647, 1, 1, -2147483648, 2147483647, 1962348912)),WithDescriptionTypedPipe(MergedTypedPipe(IterablePipe(List(-738719769, 1, 1, 318544573, 729584317)),WithDescriptionTypedPipe(Fork(WithDescriptionTypedPipe(Fork(WithDescriptionTypedPipe(Mapped(WithDescriptionTypedPipe(CrossPipe(SourcePipe(com.twitter.scalding.source.FixedTypedText(wkprxxmjzdqdcsvcqzB)),EmptyTypedPipe),List((org.scalacheck.Gen$R.map(Gen.scala:237),true))),<function1>),List((org.scalacheck.Gen$R.map(Gen.scala:237),true)))),List((org.scalacheck.Gen$R.map(Gen.scala:237),true)))),List((org.scalacheck.Gen$R.map(Gen.scala:237),true)))),List((org.scalacheck.Gen$R.map(Gen.scala:237),true)))),List((org.scalacheck.Gen$R.map(Gen.scala:237),true))),<function1>),List((org.scalacheck.Gen$R.map(Gen.scala:237),true)))
[info]     )
johnynek commented 6 years ago

Looks like this still rarely fails:

[info] - the total number of steps is not more than cascading *** FAILED ***
[info]   TestFailedException was thrown during property evaluation.
[info]     Message: 4 was not less than or equal to 2
[info]     Location: (WritePartitionerTest.scala:102)
[info]     Occurred when passed generated values (
[info]       arg0 = WithDescriptionTypedPipe(Mapped(WithDescriptionTypedPipe(CrossPipe(WithDescriptionTypedPipe(MergedTypedPipe(WithDescriptionTypedPipe(Filter(IterablePipe(List(-2147483648, 283231238, -2147483648, -1945810342, 0, 0, 915272271, 2147483647, -576715933, 287388012, -1, -1749491001, -1, -1046239738, 0, -272081409, 57139275, -434808636, -911187964, -126527959, 2147483647)),<function1>),List((org.scalacheck.Gen$R$class.map(Gen.scala:237),true))),WithDescriptionTypedPipe(Fork(EmptyTypedPipe),List((org.scalacheck.Gen$R$class.map(Gen.scala:237),true)))),List((org.scalacheck.Gen$R$class.map(Gen.scala:237),true))),WithDescriptionTypedPipe(WithDescriptionTypedPipe(WithDescriptionTypedPipe(ForceToDisk(EmptyTypedPipe),List((org.scalacheck.Gen$R$class.map(Gen.scala:237),true))),List((hssiml2nqfch5jbI2fpctMdiunkgqoidshq,false))),List((fxg6nrzx,false)))),List((org.scalacheck.Gen$R$class.map(Gen.scala:237),true))),<function1>),List((org.scalacheck.Gen$R$class.map(Gen.scala:237),true)))
[info]     )
johnynek commented 6 years ago
[info] - the total number of steps is not more than cascading *** FAILED ***
[info]   TestFailedException was thrown during property evaluation.
[info]     Message: 3 was not less than or equal to 1
[info]     Location: (WritePartitionerTest.scala:102)
[info]     Occurred when passed generated values (
[info]       arg0 = WithDescriptionTypedPipe(Mapped(WithDescriptionTypedPipe(FlatMapped(WithDescriptionTypedPipe(Fork(WithDescriptionTypedPipe(MergedTypedPipe(WithDescriptionTypedPipe(Mapped(WithDescriptionTypedPipe(CrossPipe(WithDescriptionTypedPipe(ForceToDisk(EmptyTypedPipe),List((org.scalacheck.Gen$R$class.map(Gen.scala:237),true))),SourcePipe(com.twitter.scalding.source.FixedTypedText(fubkmllkDpjlekh6lnwjqcdwaQrwSr8J3Rq0))),List((org.scalacheck.Gen$R$class.map(Gen.scala:237),true))),<function1>),List((org.scalacheck.Gen$R$class.map(Gen.scala:237),true))),WithDescriptionTypedPipe(TrappedPipe(WithDescriptionTypedPipe(MergedTypedPipe(WithDescriptionTypedPipe(Mapped(WithDescriptionTypedPipe(Mapped(EmptyTypedPipe,<function1>),List((org.scalacheck.Gen$R$class.map(Gen.scala:237),true))),<function1>),List((org.scalacheck.Gen$R$class.map(Gen.scala:237),true))),WithDescriptionTypedPipe(ForceToDisk(EmptyTypedPipe),List((org.scalacheck.Gen$R$class.map(Gen.scala:237),true)))),List((org.scalacheck.Gen$R$class.map(Gen.scala:237),true))),com.twitter.scalding.source.FixedTypedText(azyydPsd3fykcqnh),Single(com.twitter.scalding.TupleGetter$IntGetter$@3db992f4)),List((org.scalacheck.Gen$R$class.map(Gen.scala:237),true)))),List((org.scalacheck.Gen$R$class.map(Gen.scala:237),true)))),List((org.scalacheck.Gen$R$class.map(Gen.scala:237),true))),<function1>),List((org.scalacheck.Gen$R$class.map(Gen.scala:237),true))),<function1>),List((org.scalacheck.Gen$R$class.map(Gen.scala:237),true)))
[info]     )
johnynek commented 6 years ago

Here is another example of this:

[info] - the total number of steps is not more than cascading *** FAILED ***
[info]   TestFailedException was thrown during property evaluation.
[info]     Message: 3 was not less than or equal to 1
[info]     Location: (WritePartitionerTest.scala:101)
[info]     Occurred when passed generated values (
[info]       arg0 = WithDescriptionTypedPipe(MergedTypedPipe(WithDescriptionTypedPipe(Filter(WithDescriptionTypedPipe(ForceToDisk(EmptyTypedPipe),List((org.scalacheck.Gen$R.map(Gen.scala:237),true))),org.scalacheck.GenArities$$Lambda$441/1121672652@3ed6ad6a),List((org.scalacheck.Gen$R.map(Gen.scala:237),true))),WithDescriptionTypedPipe(FlatMapped(WithDescriptionTypedPipe(ForceToDisk(EmptyTypedPipe),List((org.scalacheck.Gen$R.map(Gen.scala:237),true))),scala.Function1$$Lambda$10/1663166483@19568812),List((org.scalacheck.Gen$R.map(Gen.scala:237),true)))),List((org.scalacheck.Gen$R.map(Gen.scala:237),true)))
[info]     )
johnynek commented 6 years ago

the past two had ForceToDisk(EmptyTypedPipe). I wonder if that causes an issue for the partitioner.