RSocket Timeouts and Connection Errors

VaughnVernon commented 4 years ago

Got the following while testing the lattice grid. It happened after stopping a node and bringing it back up. After that error, a grid actor that was supposed to relocate back to the restarted node never made it.

00:37:15.436 [pool-2-thread-3] WARN  io.vlingo.actors.Logger - Failed to create RSocket outbound channel for Address[Host[localhost],37371,OP], because java.util.concurrent.TimeoutException: Did not observe any item or terminal signal within 100ms in 'flatMap' (and no fallback has been configured)
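
The message is the one Reactor's `timeout` operator produces when no fallback is configured. That suggests channel creation got a single 100 ms attempt, possibly while the restarted node was still coming up, which may be why the relocation never completed. One common way to tolerate a briefly unreachable peer is to retry with a per-attempt timeout and exponential backoff. Below is a minimal sketch in plain Java (using `CompletableFuture` instead of Reactor). It assumes a hypothetical `ChannelConnector` class and an `attempt` supplier that starts one connection attempt; neither is part of vlingo-wire or RSocket.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

/**
 * Hypothetical helper: retries a connection attempt with a per-attempt
 * timeout and exponential backoff. Not part of vlingo-wire or RSocket.
 */
final class ChannelConnector {
  private final long attemptTimeoutMillis; // e.g. the 100 ms seen in the log
  private final long initialBackoffMillis; // wait before the second attempt
  private final int maxAttempts;

  ChannelConnector(long attemptTimeoutMillis, long initialBackoffMillis, int maxAttempts) {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be at least 1");
    }
    this.attemptTimeoutMillis = attemptTimeoutMillis;
    this.initialBackoffMillis = initialBackoffMillis;
    this.maxAttempts = maxAttempts;
  }

  /**
   * Returns the first successful result, or rethrows the last failure
   * (a TimeoutException, or an ExecutionException wrapping the cause).
   */
  <T> T connect(Supplier<CompletableFuture<T>> attempt) throws Exception {
    long backoff = initialBackoffMillis;
    Exception last = null;
    for (int i = 1; i <= maxAttempts; i++) {
      CompletableFuture<T> future = attempt.get(); // start one connection attempt
      try {
        return future.get(attemptTimeoutMillis, TimeUnit.MILLISECONDS);
      } catch (TimeoutException | ExecutionException e) {
        future.cancel(true); // give up on this attempt
        last = e;
        if (i < maxAttempts) {
          Thread.sleep(backoff); // back off before trying again
          backoff *= 2;          // exponential growth (uncapped in this sketch)
        }
      }
    }
    throw last;
  }
}
```

In Reactor terms, the same policy would usually be written by chaining `timeout(Duration)` with a retry operator, for example `retryWhen(Retry.backoff(...))` in recent reactor-core releases. Real code would also cap the backoff and add jitter so that many nodes do not reconnect in lockstep.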

The following is another error frequently seen in the logs. So far it does not seem to interfere with anything, but we may be misusing RSocket, recovering from the error with a workaround instead of properly sending the expected keep-alive signals.

io.rsocket.exceptions.ConnectionErrorException: No keep-alive acks for 90000 ms
    at io.rsocket.RSocketRequester.terminate(RSocketRequester.java:115) ~[rsocket-core-1.0.0-RC5.jar:na]
    at io.rsocket.keepalive.KeepAliveSupport.tryTimeout(KeepAliveSupport.java:110) ~[rsocket-core-1.0.0-RC5.jar:na]
    at io.rsocket.keepalive.KeepAliveSupport$ClientKeepAliveSupport.onIntervalTick(KeepAliveSupport.java:146) ~[rsocket-core-1.0.0-RC5.jar:na]
    at io.rsocket.keepalive.KeepAliveSupport.lambda$start$0(KeepAliveSupport.java:54) ~[rsocket-core-1.0.0-RC5.jar:na]
    at reactor.core.publisher.LambdaSubscriber.onNext(LambdaSubscriber.java:160) ~[reactor-core-3.3.0.RELEASE.jar:3.3.0.RELEASE]
    at reactor.core.publisher.FluxInterval$IntervalRunnable.run(FluxInterval.java:123) ~[reactor-core-3.3.0.RELEASE.jar:3.3.0.RELEASE]
    at reactor.core.scheduler.PeriodicWorkerTask.call(PeriodicWorkerTask.java:59) ~[reactor-core-3.3.0.RELEASE.jar:3.3.0.RELEASE]
    at reactor.core.scheduler.PeriodicWorkerTask.run(PeriodicWorkerTask.java:73) ~[reactor-core-3.3.0.RELEASE.jar:3.3.0.RELEASE]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_241]
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) ~[na:1.8.0_241]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) ~[na:1.8.0_241]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) ~[na:1.8.0_241]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[na:1.8.0_241]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[na:1.8.0_241]
    at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_241]
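
The stack trace shows the mechanism: a `FluxInterval` tick drives `ClientKeepAliveSupport.onIntervalTick`, and `tryTimeout` terminates the requester once no acknowledgement has arrived within the limit (90000 ms here). In RSocket Java the KEEPALIVE frames are sent by the library on those ticks, not by application code. The sketch below is a simplified model of that check, not RSocket's implementation; `KeepAliveMonitor` and `detectionTime` are hypothetical names, and modeling each ack as arriving exactly on a tick is an assumption.

```java
/**
 * Simplified model of a client-side keep-alive timeout check.
 * Hypothetical illustration only; not RSocket's KeepAliveSupport.
 */
final class KeepAliveMonitor {
  private final long maxLifetimeMillis; // e.g. 90_000, as in "No keep-alive acks for 90000 ms"
  private long lastAckMillis;           // time the most recent ack was received

  KeepAliveMonitor(long maxLifetimeMillis, long startMillis) {
    this.maxLifetimeMillis = maxLifetimeMillis;
    this.lastAckMillis = startMillis;
  }

  /** Record an acknowledgement received from the peer. */
  void onAck(long nowMillis) {
    lastAckMillis = nowMillis;
  }

  /** Run on every interval tick; true means the connection should be terminated. */
  boolean onIntervalTick(long nowMillis) {
    return nowMillis - lastAckMillis >= maxLifetimeMillis;
  }

  /**
   * Tick time at which the timeout fires, assuming the peer acks on every tick
   * up to and including peerStopMillis and never again afterwards.
   */
  static long detectionTime(long tickMillis, long maxLifetimeMillis, long peerStopMillis) {
    if (tickMillis <= 0) {
      throw new IllegalArgumentException("tickMillis must be positive");
    }
    KeepAliveMonitor monitor = new KeepAliveMonitor(maxLifetimeMillis, 0);
    for (long now = tickMillis; ; now += tickMillis) {
      if (now <= peerStopMillis) {
        monitor.onAck(now); // peer still alive: its ack arrives on this tick
      } else if (monitor.onIntervalTick(now)) {
        return now;         // silence has reached the max lifetime
      }
    }
  }
}
```

For example, with a 20 s tick and a peer that stops answering after 30 s, `detectionTime(20_000, 90_000, 30_000)` returns 120000: the last ack lands at 20 s, and the timeout is only noticed on the first tick at which 90 s of silence has elapsed. Under this model, an error like the one above is the expected signal that a peer went away, so the client side likely needs to dispose of that connection and create a new one rather than treat the error as noise.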

VaughnVernon commented 4 years ago

Some initial feedback from @OlegDokuka of the RSocket team:

I would double-check whether it is necessary to put io.vlingo.wire.fdx.bidirectional.rsocket.RSocketClientChannel#close on the line:

https://github.com/vlingo/vlingo-wire/blob/master/src/main/java/io/vlingo/wire/fdx/bidirectional/rsocket/RSocketClientChannel.java#L138

Florian-Schoenherr commented 4 years ago

Closed by #31

Issue #30 in vlingo/xoom-wire