twitter / scalding

A Scala API for Cascading
http://twitter.com/scalding
Apache License 2.0

CascadeJob fails with None.get. #804

Closed markhibberd closed 10 years ago

markhibberd commented 10 years ago

When using CascadeJob just to aggregate other Jobs, validateSources falls over with an unsafe call to None.get.

The job would look something like this:

import com.twitter.scalding._

class ACascade(args: Args) extends CascadeJob(args) {
  // AJob and BJob are ordinary Job subclasses defined elsewhere.
  def jobs = List(
    new AJob(args),
    new BJob(args)
  )
}

Running this falls over with:

Exception in thread "main" java.lang.Throwable: If you know what exactly caused this error, please consider contributing to GitHub via following link.
https://github.com/twitter/scalding/wiki/Common-Exceptions-and-possible-reasons#javautilnosuchelementexception
        at com.twitter.scalding.Tool$.main(Tool.scala:154)
        at com.twitter.scalding.Tool.main(Tool.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:187)
Caused by: java.util.NoSuchElementException: None.get
        at scala.None$.get(Option.scala:313)
        at scala.None$.get(Option.scala:311)
        at com.twitter.scalding.FlowStateMap$.validateSources(FlowState.scala:91)
        at com.twitter.scalding.Job.validate(Job.scala:215)
        at com.twitter.scalding.Tool.start$1(Tool.scala:106)
        at com.twitter.scalding.Tool.run(Tool.scala:132)
        at com.twitter.scalding.Tool.run(Tool.scala:70)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at com.twitter.scalding.Tool$.main(Tool.scala:140)
        ... 6 more
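
For anyone reading the trace: the "None.get" message is what Scala produces when .get is called on an empty Option. The sketch below reproduces that failure mode in isolation. It is not Scalding's FlowStateMap code; FlowRegistry, sourcesUnsafe, and sourcesSafe are hypothetical names, and the idea that the cascade has no state registered for its own flow is only one plausible reading of the trace.

```scala
import scala.collection.mutable
import scala.util.Try

// Hypothetical stand-in for a per-flow registry; NOT Scalding's FlowStateMap.
object FlowRegistry {
  private val states = mutable.Map.empty[String, List[String]]

  def register(flow: String, sources: List[String]): Unit =
    states(flow) = sources

  // Unsafe: throws NoSuchElementException("None.get") for an unregistered flow.
  def sourcesUnsafe(flow: String): List[String] =
    states.get(flow).get

  // Safer: an unregistered flow is treated as having no sources to validate.
  def sourcesSafe(flow: String): List[String] =
    states.get(flow).getOrElse(Nil)
}

object NoneGetDemo {
  def main(args: Array[String]): Unit = {
    FlowRegistry.register("a-job", List("input.tsv"))
    // "cascade" was never registered, so the unsafe lookup fails.
    println(Try(FlowRegistry.sourcesUnsafe("cascade")).isFailure)
    println(FlowRegistry.sourcesSafe("cascade").isEmpty)
  }
}
```

The safe variant trades a crash for silently skipping validation, which may or may not be the right behaviour for a real job.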

The workaround is, of course, to override validate so that the validation never runs.

 override def validate { /* workaround for scalding */ }

Should CascadeJob do this by default (given that it seems like the most common use for it)?
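
To make the question concrete, here is a rough sketch of one alternative default: rather than skipping validation entirely, the aggregate delegates it to its children. MiniJob, LeafJob, and MiniCascade are hypothetical stand-ins written for illustration, not Scalding's Job or CascadeJob, and this is not a claim about how Scalding implements it.

```scala
// Hypothetical stand-ins; these are NOT Scalding's Job or CascadeJob classes.
trait MiniJob {
  def name: String
  def validate(): Unit
}

// A leaf job that checks its own sources.
final class LeafJob(val name: String, sources: Seq[String]) extends MiniJob {
  def validate(): Unit =
    require(sources.nonEmpty, s"$name has no sources")
}

// An aggregate with no sources of its own: validation is delegated to the children.
abstract class MiniCascade extends MiniJob {
  def jobs: Seq[MiniJob]
  def validate(): Unit = jobs.foreach(_.validate())
}

object CascadeValidateDemo {
  def main(args: Array[String]): Unit = {
    val cascade = new MiniCascade {
      val name = "cascade"
      def jobs = Seq(new LeafJob("a", Seq("in.tsv")), new LeafJob("b", Seq("other.tsv")))
    }
    cascade.validate() // passes: every child has at least one source
  }
}
```

With this shape, a misconfigured child still fails validation, while the aggregate itself never needs sources of its own.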

pcting commented 10 years ago

A fix landed in master a couple months back that addressed this specific issue:

https://github.com/twitter/scalding/blob/develop/scalding-core/src/main/scala/com/twitter/scalding/CascadeJob.scala#L23

johnynek commented 10 years ago

@markhibberd what version did you see this with?

markhibberd commented 10 years ago

@johnynek 0.9.0rc4. And it looks like it doesn't have the fix @pcting mentioned.

c-v-krishnakumar commented 10 years ago

I see the same issue too. Scalding version 0.9.0rc4 as well.

jcoveney commented 10 years ago

This fix should be in the currently released version.