Open ttashi opened 8 years ago
The main error is this:
java.util.concurrent.TimeoutException: Futures timed out after [3 seconds]
Try adjusting the timeout from 3 seconds to something higher here: https://github.com/spark-jobserver/spark-jobserver/blob/master/job-server/src/spark.jobserver/JobManagerActor.scala#L217
It doesn't appear to be configurable anywhere unfortunately.
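The failing call is effectively a hard-coded three-second blocking wait on a future. A minimal JVM analogy (plain `java.util.concurrent`, not jobserver code) shows the failure mode and why raising the wait bound makes it go away:

```java
import java.util.concurrent.*;

public class TimeoutDemo {
    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        // Simulate slow work (e.g. the DAO call the actor is waiting on) taking ~2s
        Future<String> slow = pool.submit(() -> {
            Thread.sleep(2000);
            return "done";
        });
        try {
            // Too-short wait, analogous to the hard-coded 3s Await.result
            System.out.println(slow.get(500, TimeUnit.MILLISECONDS));
        } catch (TimeoutException e) {
            System.out.println("TimeoutException: future did not complete in time");
        }
        // A longer bound succeeds; patching the hard-coded value works the same way
        System.out.println(slow.get(5, TimeUnit.SECONDS));
        pool.shutdown();
    }
}
```

Since the value isn't exposed in config, the practical options are patching that line and rebuilding, or making the underlying operation fast enough to finish within 3 seconds.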
I am getting the following timeout message when invoking a job. Any help will be appreciated.
{
  "status": "ERROR",
  "result": {
    "message": "Ask timed out on [Actor[akka://JobServer/user/context-supervisor/xxxx#-1022892173]] after [20000 ms]",
    "errorClass": "akka.pattern.AskTimeoutException",
    "stack": [
      "akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:334)",
      "akka.actor.Scheduler$$anon$7.run(Scheduler.scala:117)",
      "scala.concurrent.Future$InternalCallbackExecutor$.scala$concurrent$Future$InternalCallbackExecutor$$unbatchedExecute(Future.scala:694)",
      "scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:691)",
      "akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(Scheduler.scala:467)",
      "akka.actor.LightArrayRevolverScheduler$$anon$8.executeBucket$1(Scheduler.scala:419)",
      "akka.actor.LightArrayRevolverScheduler$$anon$8.nextTick(Scheduler.scala:423)",
      "akka.actor.LightArrayRevolverScheduler$$anon$8.run(Scheduler.scala:375)",
      "java.lang.Thread.run(Thread.java:745)"
    ]
  }
}
I am using the velvia/spark-jobserver:0.6.2.mesos-0.28.1.spark-1.6.1 image, and I have added my jobserver.conf at the bottom.
Below is the stack trace.
ERROR .jobserver.JobManagerActor [] [] - About to restart actor due to exception:
java.util.concurrent.TimeoutException: Futures timed out after [3 seconds]
  at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
  at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
  at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
  at akka.dispatch.MonitorableThreadFactory$AkkaForkJoinWorkerThread$$anon$3.block(ThreadPoolBuilder.scala:169)
  at scala.concurrent.forkjoin.ForkJoinPool.managedBlock(ForkJoinPool.java:3640)
  at akka.dispatch.MonitorableThreadFactory$AkkaForkJoinWorkerThread.blockOn(ThreadPoolBuilder.scala:167)
  at scala.concurrent.Await$.result(package.scala:107)
  at spark.jobserver.JobManagerActor$$anonfun$startJobInternal$1.apply$mcV$sp(JobManagerActor.scala:200)
  at scala.util.control.Breaks.breakable(Breaks.scala:37)
  at spark.jobserver.JobManagerActor.startJobInternal(JobManagerActor.scala:192)
  at spark.jobserver.JobManagerActor$$anonfun$wrappedReceive$1.applyOrElse(JobManagerActor.scala:144)
  at scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
  at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
  at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
  at ooyala.common.akka.ActorStack$$anonfun$receive$1.applyOrElse(ActorStack.scala:33)
  at scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
  at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
  at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
  at ooyala.common.akka.Slf4jLogging$$anonfun$receive$1$$anonfun$applyOrElse$1.apply$mcV$sp(Slf4jLogging.scala:26)
  at ooyala.common.akka.Slf4jLogging$class.ooyala$common$akka$Slf4jLogging$$withAkkaSourceLogging(Slf4jLogging.scala:35)
  at ooyala.common.akka.Slf4jLogging$$anonfun$receive$1.applyOrElse(Slf4jLogging.scala:25)
  at scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
  at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
  at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
  at ooyala.common.akka.ActorMetrics$$anonfun$receive$1.applyOrElse(ActorMetrics.scala:24)
  at akka.actor.Actor$class.aroundReceive(Actor.scala:467)
  at ooyala.common.akka.InstrumentedActor.aroundReceive(InstrumentedActor.scala:8)
  at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
  at akka.actor.ActorCell.invoke(ActorCell.scala:487)
  at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
  at akka.dispatch.Mailbox.run(Mailbox.scala:220)
  at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
  at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
  at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
  at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
  at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Jobserver.conf
# Template for Spark Job Server Docker config
# You can easily override the spark master through the SPARK_MASTER env variable

# Spark Cluster / Job Server configuration
spark {
  # master = "local[4]"
  master = ${?SPARK_MASTER}

  # Default # of CPUs for jobs to use for Spark standalone cluster
  job-number-cpus = 4

  jobserver {
    port = 8090
    jobdao = spark.jobserver.io.JobSqlDAO
  }

  # predefined Spark contexts
  contexts {
    my-low-latency-context {
      num-cpu-cores = 1        # Number of cores to allocate. Required.
      memory-per-node = 512m   # Executor memory per node, -Xmx style eg 512m, 1G, etc.
    }
    # define additional contexts here
  }

  # universal context configuration. These settings can be overridden, see README.md
  context-settings {
    num-cpu-cores = 2        # Number of cores to allocate. Required.
    memory-per-node = 1024m  # Executor memory per node, -Xmx style eg 512m, 1G, etc.
  }

  # This needs to match SPARK_HOME for cluster SparkContexts to be created successfully
  home = "/usr/local/spark"
}

akka {
  remote.netty.tcp {
    # This controls the maximum message size, including job results, that can be sent
  }
}

spray.can.server {
  # uncomment the next line for making this an HTTPS example
  # ssl-encryption = on
  idle-timeout = 210 s
  request-timeout = 200 s
  pipelining-limit = 2  # for maximum performance (prevents StopReading / ResumeReading messages to the IOBridge)
  # Needed for HTTP/1.0 requests with missing Host headers
  default-host-header = "spray.io:8765"
  parsing.max-content-length = 400m
}

client {
  # The time period within which the TCP connecting process must be completed.
  # Set to `infinite` to disable.
  connecting-timeout = 10s
}
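For the separate 20000 ms AskTimeoutException on the context-supervisor, the config keys below are the ones commonly raised in jobserver setups. This is a sketch under assumptions: verify the exact key names and defaults against the `application.conf` shipped with your 0.6.2 build before relying on them, and note that raising the spray timeouts only helps if the request is timing out at the HTTP layer rather than at the actor ask.

```
spark.jobserver {
  # How long to wait for a SparkContext to come up (assumed key; check your
  # version's application.conf for the exact name and default)
  context-creation-timeout = 60 s
}

spray.can.server {
  # HTTP-level timeouts, already present in the config above
  idle-timeout = 210 s
  request-timeout = 200 s
}
```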