swsnu / bdcsfall2014


Network connection issue #45

Open hanmanhui opened 9 years ago

hanmanhui commented 9 years ago

We are currently getting this error:

2014-12-16 14:42:28,214 WARNING reef.runtime.common.evaluator.task.TaskRuntime.run TASK:VectorSimComputeTask:14 | Caught an exception during Task.call().
java.lang.NullPointerException
        at org.apache.reef.io.network.impl.NSConnection.write(NSConnection.java:105)
        at org.snu.ids.reef.vectorsim.VectorSimComputeTask.send(VectorSimComputeTask.java:391)
        at org.snu.ids.reef.vectorsim.VectorSimComputeTask.sendPrefix(VectorSimComputeTask.java:411)
        at org.snu.ids.reef.vectorsim.VectorSimComputeTask.call(VectorSimComputeTask.java:188)
        at org.apache.reef.runtime.common.evaluator.task.TaskRuntime.runTask(TaskRuntime.java:271)
        at org.apache.reef.runtime.common.evaluator.task.TaskRuntime.run(TaskRuntime.java:132)
        at java.lang.Thread.run(Thread.java:724)

As far as I have tested, everything before NSConnection.write() seems OK. Also, some "Connection refused" warnings appear before this message:

2014-12-16 14:41:58,210 WARNING reef.wake.remote.transport.netty.NettyMessagingTransport.open TASK:VectorSimComputeTask:14 | Connection refused. Retry 1 of 3

From this, I assume that the ComputeTask I implemented is trying to connect to a task that is not ready yet, so the connection is refused. Is this right? If so, how can I make ComputeTask wait until every task is ready?

DifferentSC commented 9 years ago

Hi, I'm Gyewon Lee from CMSLab.

I think you can use BlockingEventHandler in Wake. When the driver starts, initialize a BlockingEventHandler with the expected number of events and the target event handler to be triggered. The target event handler should have the generic type Iterable<ActiveContext>, and the BlockingEventHandler's generic type should be ActiveContext. When a context is fully ready (GroupComm + NetworkService), pass it to the BlockingEventHandler by calling onNext(), e.g. blockEventHandler.onNext(activeContext). Once the BlockingEventHandler has received the expected number of contexts, it triggers the target EventHandler with the collection of ActiveContexts, so you can submit the other tasks there. Then you can count the number of running tasks in the RunningTaskHandler in the driver, and once you are sure that every other task is running, submit the ComputeTask on the remaining contexts.
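To make the "wait for N events, then fire once with all of them" idea concrete, here is a minimal sketch that compiles without REEF. It does not use Wake itself: `EventHandler` below is a local stand-in for Wake's interface, and `CollectingHandler` is a hypothetical class that mimics the behavior described above for Wake's BlockingEventHandler (which, as far as I know, lives in `org.apache.reef.wake.impl` and takes an expected count plus an `EventHandler<Iterable<T>>`; check the Wake version you build against).

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for Wake's EventHandler<T>, so this sketch compiles without REEF on the classpath.
interface EventHandler<T> {
  void onNext(T value);
}

// Hypothetical stand-in for the behavior described above for Wake's BlockingEventHandler:
// buffer incoming events and, once `expected` of them have arrived, pass the whole batch
// to the destination handler in a single call.
final class CollectingHandler<T> implements EventHandler<T> {
  private final int expected;
  private final EventHandler<Iterable<T>> destination;
  private final List<T> buffer = new ArrayList<>();

  CollectingHandler(int expected, EventHandler<Iterable<T>> destination) {
    this.expected = expected;
    this.destination = destination;
  }

  @Override
  public void onNext(T value) {
    List<T> batch = null;
    synchronized (this) {
      buffer.add(value);
      if (buffer.size() == expected) {
        batch = new ArrayList<>(buffer);
        buffer.clear(); // start a fresh batch (a choice made for this sketch)
      }
    }
    // Invoke the destination outside the lock so it may do slow work (e.g. submit tasks).
    if (batch != null) {
      destination.onNext(batch);
    }
  }
}
```

In the driver, `T` would be `ActiveContext`: the context-active handler calls `onNext(activeContext)` once GroupComm and NetworkService are ready on that context, and the destination handler submits the other tasks on every context in the batch it receives.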

hanmanhui commented 9 years ago

@DifferentSC Thanks for the reply. I tried BlockingEventHandler today, but it didn't fit my use case. Is there any way to send a message from the driver to the submitted tasks at a specific point?

DifferentSC commented 9 years ago

@hanmanhui I'm not sure this is what you want, but you can get each running task from the driver's handler (registered by the client for ON_TASK_RUNNING), so you can store all of the running tasks. When all tasks are ready, you can send a message to each task with the send() method of RunningTask.
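A minimal sketch of this pattern, assuming the driver keeps every RunningTask it sees in the ON_TASK_RUNNING handler and, once the expected number is reached, calls send() on each one, while each task blocks until that message arrives. `TaskHandle` is a local stand-in exposing only the `send(byte[])` method mentioned above; `RunningTaskTracker`, `StartGate`, and the `"START"` payload are hypothetical names chosen for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Stand-in for REEF's RunningTask: only the send(byte[]) method used here.
interface TaskHandle {
  void send(byte[] message);
}

// Driver side (hypothetical): record running tasks, broadcast once all of them are up.
final class RunningTaskTracker {
  private final int expected;
  private final List<TaskHandle> running = new ArrayList<>();

  RunningTaskTracker(int expected) {
    this.expected = expected;
  }

  // Call this from the driver's ON_TASK_RUNNING handler.
  void onTaskRunning(TaskHandle task) {
    List<TaskHandle> snapshot = null;
    synchronized (this) {
      running.add(task);
      if (running.size() == expected) {
        snapshot = new ArrayList<>(running);
      }
    }
    if (snapshot != null) {
      byte[] start = "START".getBytes(StandardCharsets.UTF_8); // hypothetical payload
      for (TaskHandle t : snapshot) {
        t.send(start);
      }
    }
  }
}

// Task side (hypothetical): the task's call() waits on this gate before sending anything.
final class StartGate {
  private final CountDownLatch latch = new CountDownLatch(1);

  // Invoke from the task's driver-message handler.
  void onDriverMessage(byte[] message) {
    if ("START".equals(new String(message, StandardCharsets.UTF_8))) {
      latch.countDown();
    }
  }

  boolean isStarted() {
    return latch.getCount() == 0;
  }

  // Returns true if the start message arrived within the timeout.
  boolean awaitStart(long timeout, TimeUnit unit) {
    try {
      return latch.await(timeout, unit);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
  }
}
```

In a real job the driver and tasks run in separate processes, so the gate's `onDriverMessage` would be called from the task's driver-message handler. As far as I know this is an `EventHandler<DriverMessage>` bound through `TaskConfiguration.ON_MESSAGE`, but treat that wiring as an assumption and verify it against your REEF version.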

hanmanhui commented 9 years ago

@DifferentSC Thanks. Combining the two methods made it work.