twitter / scalding

A Scala API for Cascading
http://twitter.com/scalding
Apache License 2.0

scalding (not cascading) memory platform potential issue #1802

Closed johnynek closed 6 years ago

johnynek commented 6 years ago

ScalaCheck found this disagreement between scalding memory mode and cascading local mode:

[info] - scalding memory mode matches cascading local mode *** FAILED ***
[info]   TestFailedException was thrown during property evaluation.
[info]     Message: List(2, 2, 2) did not equal List(2, 2)
[info]     Location: (MemoryTest.scala:25)
[info]     Occurred when passed generated values (
[info]       arg0 = WithDescriptionTypedPipe(Mapped(WithDescriptionTypedPipe(CrossPipe(WithDescriptionTypedPipe(Mapped(WithDescriptionTypedPipe(CrossValue(IterablePipe(List(-2147483648)),LiteralValue(2)),List((org.scalacheck.Gen$R.map(Gen.scala:237),true))),<function1>),List((org.scalacheck.Gen$R.map(Gen.scala:237),true))),WithDescriptionTypedPipe(TrappedPipe(WithDescriptionTypedPipe(TrappedPipe(WithDescriptionTypedPipe(Mapped(WithDescriptionTypedPipe(CrossPipe(WithDescriptionTypedPipe(MergedTypedPipe(WithDescriptionTypedPipe(MergedTypedPipe(IterablePipe(List(1)),WithDescriptionTypedPipe(Fork(WithDescriptionTypedPipe(Filter(IterablePipe(List(1)),org.scalacheck.GenArities$$Lambda$441/1942591200@68118a69),List((org.scalacheck.Gen$R.map(Gen.scala:237),true)))),List((org.scalacheck.Gen$R.map(Gen.scala:237),true)))),List((org.scalacheck.Gen$R.map(Gen.scala:237),true))),WithDescriptionTypedPipe(Filter(IterablePipe(List(-1418921823)),org.scalacheck.GenArities$$Lambda$441/1942591200@59650892),List((org.scalacheck.Gen$R.map(Gen.scala:237),true)))),List((org.scalacheck.Gen$R.map(Gen.scala:237),true))),IterablePipe(List(-21289660))),List((org.scalacheck.Gen$R.map(Gen.scala:237),true))),<function1>),List((org.scalacheck.Gen$R.map(Gen.scala:237),true))),com.twitter.scalding.source.FixedTypedText(djgiz8e0f6Laqdepo),Single(com.twitter.scalding.TupleGetter$IntGetter$@640dd78d)),List((org.scalacheck.Gen$R.map(Gen.scala:237),true))),com.twitter.scalding.source.FixedTypedText(nd2dwym),Single(com.twitter.scalding.TupleGetter$IntGetter$@640dd78d)),List((org.scalacheck.Gen$R.map(Gen.scala:237),true)))),List((org.scalacheck.Gen$R.map(Gen.scala:237),true))),<function1>),List((org.scalacheck.Gen$R.map(Gen.scala:237),true)))
[info]     )
johnynek commented 6 years ago

I tried to repro this with:

https://github.com/twitter/scalding/commit/c7166e09cea01b92f0f226d0259ddb4ba6e0fe72

but I couldn't.

I think one of three things is going on:

  1. it depends on the functions we plug in, and I am guessing wrong (but I am getting the 2, 2, 2 output so this seems somewhat unlikely).
  2. it hits a race condition in either cascading or the memory platform, and thus is hard to trigger
  3. the traps caught some error in cascading (like an OOM), so it silently gave bad data due to a transient error

Of these three, I think 3 is the most likely, since there are several traps in this job and transient errors are somewhat common on Travis. If it were a race condition, I feel we would have hit it before now (but you never know with a race).
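
To make hypothesis 3 concrete, here is a toy model in plain Scala of how a trap can turn a one-off error into silently missing output. It is only a sketch: `TrapSketch`, `trappedMap`, and `failOnCall` are hypothetical names invented for illustration and are not part of scalding or cascading, and the simulated failure is an ordinary `RuntimeException` rather than a real OOM.

```scala
import scala.collection.mutable.ListBuffer
import scala.util.{Failure, Success, Try}

// Toy model only: none of these names come from scalding or cascading.
object TrapSketch {
  // Apply `f` to every record. A record whose evaluation throws is
  // diverted to `trap` instead of failing the whole run.
  def trappedMap[A, B](input: List[A], trap: ListBuffer[A])(f: A => B): List[B] =
    input.flatMap { a =>
      Try(f(a)) match {
        case Success(b) => List(b)
        case Failure(_) =>
          trap += a
          Nil
      }
    }

  // Wrap `f` so that it throws once, on its n-th call, to mimic a
  // transient error that is unrelated to the record being processed.
  def failOnCall[A, B](n: Int)(f: A => B): A => B = {
    var calls = 0
    a => {
      calls += 1
      if (calls == n) throw new RuntimeException("simulated transient failure")
      f(a)
    }
  }

  val input: List[Int] = List(1, 1, 1)

  // Reference run with no failure: every record produces a 2.
  val reference: List[Int] = input.map(_ => 2)

  // Same computation behind a trap, with one transient failure injected.
  val trap: ListBuffer[Int] = ListBuffer.empty[Int]
  val flaky: List[Int] = trappedMap(input, trap)(failOnCall[Int, Int](2)(_ => 2))
}
```

In this model `TrapSketch.reference` has three elements while `TrapSketch.flaky` has two, and the dropped record sits in `TrapSketch.trap`. That is the same shape of mismatch as in the failure above, and it would not recur on a rerun where the error does not happen, which would fit the failed repro attempt.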

Closing for now. We can reopen if we get more evidence.