rjagerman / glint

Glint: High performance scala parameter server
MIT License

Cannot find master #44

Closed cstur4 closed 8 years ago

cstur4 commented 8 years ago

16/07/18 17:26:19 ERROR yarn.ApplicationMaster: User class threw exception: akka.actor.ActorNotFound: Actor not found for: ActorSelection[Anchor(akka.tcp://glint-master@127.0.0.1:13370/), Path(/user/master)]
akka.actor.ActorNotFound: Actor not found for: ActorSelection[Anchor(akka.tcp://glint-master@127.0.0.1:13370/), Path(/user/master)]
    at akka.actor.ActorSelection$$anonfun$resolveOne$1.apply(ActorSelection.scala:65)
    at akka.actor.ActorSelection$$anonfun$resolveOne$1.apply(ActorSelection.scala:63)
    at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
    at akka.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:55)
    at akka.dispatch.BatchingExecutor$Batch.run(BatchingExecutor.scala:73)
    at akka.dispatch.ExecutionContexts$sameThreadExecutionContext$.unbatchedExecute(Future.scala:74)
    at akka.dispatch.BatchingExecutor$class.execute(BatchingExecutor.scala:120)
    at akka.dispatch.ExecutionContexts$sameThreadExecutionContext$.execute(Future.scala:73)
    at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:40)
    at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:248)
    at akka.pattern.PromiseActorRef.$bang(AskSupport.scala:266)
    at akka.actor.EmptyLocalActorRef.specialHandle(ActorRef.scala:533)
    at akka.actor.DeadLetterActorRef.specialHandle(ActorRef.scala:569)
    at akka.actor.DeadLetterActorRef.$bang(ActorRef.scala:559)
    at akka.remote.RemoteActorRefProvider$RemoteDeadLetterActorRef.$bang(RemoteActorRefProvider.scala:87)
    at akka.remote.EndpointWriter.postStop(Endpoint.scala:557)
    at akka.actor.Actor$class.aroundPostStop(Actor.scala:477)
    at akka.remote.EndpointActor.aroundPostStop(Endpoint.scala:411)
    at akka.actor.dungeon.FaultHandling$class.akka$actor$dungeon$FaultHandling$$finishTerminate(FaultHandling.scala:210)
    at akka.actor.dungeon.FaultHandling$class.terminate(FaultHandling.scala:172)
    at akka.actor.ActorCell.terminate(ActorCell.scala:369)
    at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:462)
    at akka.actor.ActorCell.systemInvoke(ActorCell.scala:478)
    at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:263)
    at akka.dispatch.Mailbox.run(Mailbox.scala:219)
    at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

I have set a fixed IP in the config file, but the actor lookup still seems to target localhost.

rjagerman commented 8 years ago

Thanks for reporting the issue. Could you show me the part of the code that loads the configuration file? It should be something like this:

import com.typesafe.config.ConfigFactory
import glint.Client
val client = Client(ConfigFactory.parseFile(new java.io.File("/path/to/glint.conf")))

You need to specify the configuration file to load explicitly; otherwise the client falls back to the default configuration, which points to localhost.

Also, could you post your 'glint.conf' file? The master hostname should be modified on line 8.
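One thing that may be worth ruling out when the client still resolves to localhost: with its default parse options, `ConfigFactory.parseFile` returns an empty config for a missing file instead of throwing, so a wrong path looks exactly like "no overrides". Below is a minimal sketch of a stricter load, assuming the Typesafe Config library is on the classpath. `StrictGlintConfig`, `loadStrict`, and `overridesMasterHost` are hypothetical names for illustration and are not part of Glint.

```scala
import java.io.File
import com.typesafe.config.{Config, ConfigFactory, ConfigParseOptions}

// Hypothetical helper (not part of Glint): fail loudly if the file is missing.
object StrictGlintConfig {
  def loadStrict(path: String): Config = {
    // By default parseFile tolerates a missing file and yields an empty Config;
    // setAllowMissing(false) makes it throw a ConfigException instead.
    val options = ConfigParseOptions.defaults().setAllowMissing(false)
    ConfigFactory.parseFile(new File(path), options)
  }

  // Report whether the parsed file actually sets the master host key.
  def overridesMasterHost(config: Config): Boolean =
    config.hasPath("glint.master.host")
}
```

The resulting `Config` can be passed to `Client(...)` in place of the plain `parseFile` call. Checking `overridesMasterHost` first gives an early hint if the file was read but does not set the expected key.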

cstur4 commented 8 years ago

Thanks for your reply. The code that starts the client is:

val client = Client(ConfigFactory.parseFile(new java.io.File("/data/bytehong/psftrl/glint.conf")))

And the config file:

# The master host name
host = "10.241.124.142"

# The master port
port = 13370

# The master actor name
name = "master"

# The master actor system name
system = "glint-master"

# The timeout during startup
. . .

cstur4 commented 8 years ago

I start the master and the servers from the console, and they also use the config file listed above. Then I try to connect to the master:

 val client = Client(ConfigFactory.parseFile(new java.io.File("/data/bytehong/psftrl/glint.conf")))
 val nd = client.vector[Double](srcFeatureMap.size)
 val zd = client.vector[Double](srcFeatureMap.size)

 val t = training_instances.mapPartitionsWithIndex((index, it) => {

  implicit val ec = scala.concurrent.ExecutionContext.Implicits.global
. . .
  val ndd = nd.pull(keys)
  val zdd = zd.pull(keys)

rjagerman commented 8 years ago

Hmm, that's interesting. Your code looks good, so it should work fine. Could we try a minimal configuration file? Please change glint.conf so that it contains only the following line and nothing else:

glint.master.host = "10.241.124.142"

This will leave all the other values at default and only change the master hostname.
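To illustrate why a one-line file is enough, here is a sketch of how Typesafe Config layers user settings over defaults with `withFallback`. The `defaults` value is an assumption that mirrors the address visible in the stack trace (`127.0.0.1:13370`); it is not copied from Glint, and how Glint merges configurations internally is assumed as well. `FallbackDemo` and `effectiveHost` are hypothetical names. The second call shows a general property of the library: a bare top-level `host` key does not override `glint.master.host`. Whether that was the cause in this thread is not stated.

```scala
import com.typesafe.config.{Config, ConfigFactory}

object FallbackDemo {
  // Assumed stand-in for the library defaults (not Glint's actual file).
  val defaults: Config = ConfigFactory.parseString(
    """glint.master.host = "127.0.0.1"
      |glint.master.port = 13370
      |""".stripMargin)

  // Layer the user settings over the defaults and read the effective host.
  def effectiveHost(userConf: String): String =
    ConfigFactory.parseString(userConf)
      .withFallback(defaults)
      .resolve()
      .getString("glint.master.host")

  def main(args: Array[String]): Unit = {
    // Fully qualified key: overrides the default.
    println(effectiveHost("glint.master.host = \"10.241.124.142\"")) // prints 10.241.124.142
    // Top-level key: ignored for this path, so the default wins.
    println(effectiveHost("host = \"10.241.124.142\""))              // prints 127.0.0.1
  }
}
```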

cstur4 commented 8 years ago

I had made a mistake with the config file. After fixing it, Glint works as expected.