ooyala / spark-jobserver

REST job server for Spark. Note that this is *not* the mainline open source version. For that, go to https://github.com/spark-jobserver/spark-jobserver. This fork now serves as a semi-private repo for Ooyala.

Get Error while running on Cluster with Spark 1.0.0 #40

Open yyuzhong opened 10 years ago

yyuzhong commented 10 years ago

I am trying to run the job server on my cluster, using the simple word count example. It works if I set master to local[4] in the configuration file.

But if I change the master to spark://xxx.xxx.xxx.xxx:7077, I always get the error below.

Could anyone suggest how to run the job server on an actual cluster? Thanks!

{
  "status": "ERROR",
  "ERROR": {
    "message": "Job aborted due to stage failure: Task 1.0:0 failed 4 times, most recent failure: TID 6 on host compute018.pvamumrihpc.edu failed for unknown reason\nDriver stacktrace:",
    "errorClass": "org.apache.spark.SparkException",
    "stack": [
      "org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1033)",
      "org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1017)",
      "org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1015)",
      "scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)",
      "scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)",
      "org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1015)",
      "org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:633)",
      "org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:633)",
      "scala.Option.foreach(Option.scala:236)",
      "org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:633)",
      "org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1207)",
      "akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)",
      "akka.actor.ActorCell.invoke(ActorCell.scala:456)",
      "akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)",
      "akka.dispatch.Mailbox.run(Mailbox.scala:219)",
      "akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)",
      "scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)",
      "scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)",
      "scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)",
      "scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)"
    ]
  }
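
For readers following along, the master setting mentioned in this report is normally a key in the job server's HOCON configuration file. The fragment below is only a sketch of the two variants described, assuming the usual `spark { master = ... }` layout. The file name is an assumption, and the cluster address is left elided exactly as in the report.

```
# Sketch only: assumed to live in the job server's config file (e.g. a local.conf)
spark {
  # Variant reported to work: run Spark in-process with 4 threads
  master = "local[4]"

  # Variant reported to fail with Spark 1.0.0: standalone cluster master
  # (address elided as in the original report)
  # master = "spark://xxx.xxx.xxx.xxx:7077"
}
```

Only one `master` line should be active at a time; the job server must be restarted for a change to take effect.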

bobbych commented 10 years ago

It works for me with Spark 1.0.2 (compiled from source). I had the same issue with 1.0.0.