ooyala / spark-jobserver

REST job server for Spark. Note that this is *not* the mainline open source version. For that, go to https://github.com/spark-jobserver/spark-jobserver. This fork now serves as a semi-private repo for Ooyala.

Getting Ask timed out on the latest docker image. #120

Open ttashi opened 8 years ago

ttashi commented 8 years ago

I am getting the following timeout message when invoking a job. Any help will be appreciated.

{ "status": "ERROR", "result": { "message": "Ask timed out on [Actor[akka://JobServer/user/context-supervisor/xxxx#-1022892173]] after [20000 ms]", "errorClass": "akka.pattern.AskTimeoutException", "stack": ["akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:334)", "akka.actor.Scheduler$$anon$7.run(Scheduler.scala:117)", "scala.concurrent.Future$InternalCallbackExecutor$.scala$concurrent$Future$InternalCallbackExecutor$$unbatchedExecute(Future.scala:694)", "scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:691)", "akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(Scheduler.scala:467)", "akka.actor.LightArrayRevolverScheduler$$anon$8.executeBucket$1(Scheduler.scala:419)", "akka.actor.LightArrayRevolverScheduler$$anon$8.nextTick(Scheduler.scala:423)", "akka.actor.LightArrayRevolverScheduler$$anon$8.run(Scheduler.scala:375)", "java.lang.Thread.run(Thread.java:745)"] }

I am using the velvia/spark-jobserver:0.6.2.mesos-0.28.1.spark-1.6.1 image, and at the bottom I have added the jobserver.conf.

Below is the stack trace.

ERROR .jobserver.JobManagerActor [] [] - About to restart actor due to exception:
java.util.concurrent.TimeoutException: Futures timed out after [3 seconds]
  at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
  at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
  at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
  at akka.dispatch.MonitorableThreadFactory$AkkaForkJoinWorkerThread$$anon$3.block(ThreadPoolBuilder.scala:169)
  at scala.concurrent.forkjoin.ForkJoinPool.managedBlock(ForkJoinPool.java:3640)
  at akka.dispatch.MonitorableThreadFactory$AkkaForkJoinWorkerThread.blockOn(ThreadPoolBuilder.scala:167)
  at scala.concurrent.Await$.result(package.scala:107)
  at spark.jobserver.JobManagerActor$$anonfun$startJobInternal$1.apply$mcV$sp(JobManagerActor.scala:200)
  at scala.util.control.Breaks.breakable(Breaks.scala:37)
  at spark.jobserver.JobManagerActor.startJobInternal(JobManagerActor.scala:192)
  at spark.jobserver.JobManagerActor$$anonfun$wrappedReceive$1.applyOrElse(JobManagerActor.scala:144)
  at scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
  at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
  at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
  at ooyala.common.akka.ActorStack$$anonfun$receive$1.applyOrElse(ActorStack.scala:33)
  at scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
  at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
  at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
  at ooyala.common.akka.Slf4jLogging$$anonfun$receive$1$$anonfun$applyOrElse$1.apply$mcV$sp(Slf4jLogging.scala:26)
  at ooyala.common.akka.Slf4jLogging$class.ooyala$common$akka$Slf4jLogging$$withAkkaSourceLogging(Slf4jLogging.scala:35)
  at ooyala.common.akka.Slf4jLogging$$anonfun$receive$1.applyOrElse(Slf4jLogging.scala:25)
  at scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
  at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
  at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
  at ooyala.common.akka.ActorMetrics$$anonfun$receive$1.applyOrElse(ActorMetrics.scala:24)
  at akka.actor.Actor$class.aroundReceive(Actor.scala:467)
  at ooyala.common.akka.InstrumentedActor.aroundReceive(InstrumentedActor.scala:8)
  at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
  at akka.actor.ActorCell.invoke(ActorCell.scala:487)
  at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
  at akka.dispatch.Mailbox.run(Mailbox.scala:220)
  at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
  at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
  at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
  at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
  at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
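For what it's worth, there appear to be two different timeouts involved: the JSON response above is an akka.pattern.AskTimeoutException (an actor "ask" that got no reply within 20 seconds), while the stack trace shows a separate, hard-coded 3-second Await inside JobManagerActor.startJobInternal. Below is a minimal sketch of the ask mechanism only; the actor path and message are made up and this is not spark-jobserver code.

import akka.actor.ActorSystem
import akka.pattern.ask
import akka.util.Timeout
import scala.concurrent.Future
import scala.concurrent.duration._

// Minimal illustration of the Akka "ask" pattern behind AskTimeoutException.
// If the asked actor does not reply within the implicit Timeout, the returned
// Future fails with akka.pattern.AskTimeoutException, which is what the REST
// response at the top of this issue wraps.
object AskTimeoutSketch {
  def askContextActor(system: ActorSystem): Future[Any] = {
    implicit val timeout: Timeout = Timeout(20.seconds) // corresponds to the 20000 ms in the error

    val contextActor = system.actorSelection("/user/context-supervisor/my-context") // hypothetical path
    contextActor ? "start-job" // fails with AskTimeoutException after 20 s of silence
  }
}

Presumably the 20-second ask times out because the context actor is itself stuck on (and restarted by) the 3-second future shown in the stack trace, so the inner timeout looks like the real problem.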

jobserver.conf:

# Template for Spark Job Server Docker config
# You can easily override the spark master through SPARK_MASTER env variable

# Spark Cluster / Job Server configuration
spark {
  # master = "local[4]"
  master = ${?SPARK_MASTER}

  # Default # of CPUs for jobs to use for Spark standalone cluster
  job-number-cpus = 4

  jobserver {
    port = 8090
    jobdao = spark.jobserver.io.JobSqlDAO

    context-per-jvm = true
    context-init-timeout = 90s

    sqldao {
      # Directory where default H2 driver stores its data. Only needed for H2.
      rootdir = /database

      # Full JDBC URL / init string.  Sorry, needs to match above.
      # Substitutions may be used to launch job-server, but leave it out here in the default or tests won't pass
      jdbc.url = "jdbc:h2:file:/database/h2-db"
    }
  }

  # predefined Spark contexts
  contexts {
    my-low-latency-context {
      num-cpu-cores = 1        # Number of cores to allocate. Required.
      memory-per-node = 512m   # Executor memory per node, -Xmx style eg 512m, 1G, etc.
    }
    # define additional contexts here
  }

  # universal context configuration. These settings can be overridden, see README.md
  context-settings {
    num-cpu-cores = 2          # Number of cores to allocate. Required.
    memory-per-node = 1024m    # Executor memory per node, -Xmx style eg 512m, 1G, etc.

    # in case spark distribution should be accessed from HDFS (as opposed to being installed on every mesos slave)
    # spark.executor.uri = "hdfs://namenode:8020/apps/spark/spark.tgz"

    # uris of jars to be loaded into the classpath for this context. Uris is a string list, or a string separated by commas ','
    # dependent-jar-uris = ["file:///some/path/present/in/each/mesos/slave/somepackage.jar"]

    # If you wish to pass any settings directly to the sparkConf as-is, add them here in passthrough,
    # such as hadoop connection settings that don't use the "spark." prefix
    passthrough {
      #es.nodes = "192.1.1.1"
    }
  }

  # This needs to match SPARK_HOME for cluster SparkContexts to be created successfully
  home = "/usr/local/spark"
}

akka {
  remote.netty.tcp {
    # This controls the maximum message size, including job results, that can be sent
    maximum-frame-size = 30 MiB
  }
}

spray.can.server {
  # uncomment the next line for making this an HTTPS example
  ssl-encryption = on

  idle-timeout = 210 s
  request-timeout = 200 s
  pipelining-limit = 2   # for maximum performance (prevents StopReading / ResumeReading messages to the IOBridge)

  # Needed for HTTP/1.0 requests with missing Host headers
  default-host-header = "spray.io:8765"
  parsing.max-content-length = 400m
}

client {
  # The time period within which the TCP connecting process must be completed.
  # Set to infinite to disable.
  connecting-timeout = 10s
}

CalebFenton commented 7 years ago

The main error is this:

java.util.concurrent.TimeoutException: Futures timed out after [3 seconds]

Try adjusting the timeout from 3 seconds to something higher here: https://github.com/spark-jobserver/spark-jobserver/blob/master/job-server/src/spark.jobserver/JobManagerActor.scala#L217

It doesn't appear to be configurable anywhere unfortunately.
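For reference, the pattern at that line is essentially a blocking Await.result on a future with a hard-coded 3-second duration; the stack trace above points at JobManagerActor.scala:200 inside startJobInternal. Below is a rough sketch of what raising it amounts to, with illustrative names only (the real variables in JobManagerActor.scala may differ), since the only workaround for now seems to be patching the source and rebuilding the job server.

import scala.concurrent.{Await, Future}
import scala.concurrent.duration._

// Sketch only: `daoFuture` stands in for whatever future startJobInternal awaits.
object StartJobTimeoutSketch {
  // was roughly: Await.result(daoFuture, 3.seconds)  -- the source of "Futures timed out after [3 seconds]"
  def awaitDaoResult[T](daoFuture: Future[T]): T =
    Await.result(daoFuture, 60.seconds) // raise the hard-coded duration, then rebuild and redeploy
}

Raising the duration only pushes the limit out, of course; if the lookup behind that future is genuinely that slow (e.g. the H2 file database in the config above), the underlying cause is still worth investigating.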