twitter / scalding

A Scala API for Cascading
http://twitter.com/scalding
Apache License 2.0
3.48k stars 704 forks source link

None.get in AsyncFlowDefRunner #1851

Closed fwbrasil closed 6 years ago

fwbrasil commented 6 years ago

I'm trying to isolate the issue. Basically ToWrite.optimizeWriteBatch(writes, phases) returns fewer elements than the writes input and AsyncFlowDefRunner. execute expects that the map will have one entry for each element in writes. See:

https://github.com/twitter/scalding/blob/develop/scalding-core/src/main/scala/com/twitter/scalding/typed/cascading_backend/AsyncFlowDefRunner.scala#L269 https://github.com/twitter/scalding/blob/develop/scalding-core/src/main/scala/com/twitter/scalding/typed/cascading_backend/AsyncFlowDefRunner.scala#L313

Stack trace:

Exception in thread "main" java.util.NoSuchElementException: None.get
    at scala.None$.get(Option.scala:347)
    at scala.None$.get(Option.scala:345)
    at com.stripe.dagon.HMap.apply(HMap.scala:53)
    at com.twitter.scalding.typed.cascading_backend.AsyncFlowDefRunner$$anonfun$com$twitter$scalding$typed$cascading_backend$AsyncFlowDefRunner$$prepareFD$1$1.apply(AsyncFlowDefRunner.scala:324)
    at com.twitter.scalding.typed.cascading_backend.AsyncFlowDefRunner$$anonfun$com$twitter$scalding$typed$cascading_backend$AsyncFlowDefRunner$$prepareFD$1$1.apply(AsyncFlowDefRunner.scala:304)
    at scala.collection.immutable.List.foreach(List.scala:392)
    at com.twitter.scalding.typed.cascading_backend.AsyncFlowDefRunner.com$twitter$scalding$typed$cascading_backend$AsyncFlowDefRunner$$prepareFD$1(AsyncFlowDefRunner.scala:304)
    at com.twitter.scalding.typed.cascading_backend.AsyncFlowDefRunner$$anonfun$5.apply(AsyncFlowDefRunner.scala:330)
    at com.twitter.scalding.typed.cascading_backend.AsyncFlowDefRunner$$anonfun$5.apply(AsyncFlowDefRunner.scala:330)
    at com.twitter.scalding.typed.cascading_backend.AsyncFlowDefRunner$$anonfun$validateAndRun$1.apply(AsyncFlowDefRunner.scala:242)
    at com.twitter.scalding.typed.cascading_backend.AsyncFlowDefRunner$$anonfun$validateAndRun$1.apply(AsyncFlowDefRunner.scala:242)
    at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
    at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
    at scala.concurrent.impl.ExecutionContextImpl$AdaptedForkJoinTask.exec(ExecutionContextImpl.scala:121)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
johnynek commented 6 years ago

I have thought about not returning an HMap there but a list of type aligned items. That way we can statically see this won't happen. Not sure how this is happening now.

fwbrasil commented 6 years ago

the issue is actually due to a bad hashCode/equals of one our sources