rsocket / rsocket-kotlin

RSocket Kotlin multi-platform implementation
http://rsocket.io
Apache License 2.0
552 stars 37 forks source link

Android app crashes after network error using TCP transport #202

Closed webfrea-k closed 2 years ago

webfrea-k commented 2 years ago

Newest version (0.14.3) does not reconnect if network is down.

Expected Behavior

When network is lost reconnectable should trigger reconnect attempts and recover if network is available again.

Actual Behavior

App crashes with the following log if wifi is turned off:

2021-12-21 12:41:46.525 22099-22156/? W/SDK: ⚠️ Unable to connect to WS. Retrying...
2021-12-21 12:41:46.525 22099-22156/? W/SDK: ⚠️ Network is unreachable
2021-12-21 12:41:46.530 22099-22158/? E/AndroidRuntime: FATAL EXCEPTION: DefaultDispatcher-worker-14
    Process: my.namespace.demoapp, PID: 22099
    java.io.IOException: Software caused connection abort
        at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:43)
        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
        at sun.nio.ch.IOUtil.read(IOUtil.java:192)
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:421)
        at io.ktor.network.sockets.CIOReaderKt$readFrom$2.invoke(CIOReader.kt:129)
        at io.ktor.network.sockets.CIOReaderKt$readFrom$2.invoke(CIOReader.kt:128)
        at io.ktor.utils.io.ByteBufferChannel.writeAvailable(ByteBufferChannel.kt:1492)
        at io.ktor.utils.io.ByteBufferChannel.write$suspendImpl(ByteBufferChannel.kt:1522)
        at io.ktor.utils.io.ByteBufferChannel.write(Unknown Source:0)
        at io.ktor.utils.io.ByteWriteChannel$DefaultImpls.write$default(ByteWriteChannel.kt:99)
        at io.ktor.network.sockets.CIOReaderKt.readFrom(CIOReader.kt:128)
        at io.ktor.network.sockets.CIOReaderKt.access$readFrom(CIOReader.kt:1)
        at io.ktor.network.sockets.CIOReaderKt$attachForReadingDirectImpl$1.invokeSuspend(CIOReader.kt:108)
        at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
        at kotlinx.coroutines.DispatchedTaskKt.resume(DispatchedTask.kt:234)
        at kotlinx.coroutines.DispatchedTaskKt.resumeUnconfined(DispatchedTask.kt:190)
        at kotlinx.coroutines.DispatchedTaskKt.dispatch(DispatchedTask.kt:161)
        at kotlinx.coroutines.CancellableContinuationImpl.dispatchResume(CancellableContinuationImpl.kt:397)
        at kotlinx.coroutines.CancellableContinuationImpl.resumeImpl(CancellableContinuationImpl.kt:431)
        at kotlinx.coroutines.CancellableContinuationImpl.resumeImpl$default(CancellableContinuationImpl.kt:420)
        at kotlinx.coroutines.CancellableContinuationImpl.resumeWith(CancellableContinuationImpl.kt:328)
        at io.ktor.network.selector.SelectorManagerSupport.handleSelectedKey(SelectorManagerSupport.kt:80)
        at io.ktor.network.selector.SelectorManagerSupport.handleSelectedKeys(SelectorManagerSupport.kt:60)
        at io.ktor.network.selector.ActorSelectorManager.process(ActorSelectorManager.kt:71)
        at io.ktor.network.selector.ActorSelectorManager.access$process(ActorSelectorManager.kt:18)
        at io.ktor.network.selector.ActorSelectorManager$process$1.invokeSuspend(Unknown Source:15)
        at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
        at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:106)
        at kotlinx.coroutines.internal.LimitedDispatcher.run(LimitedDispatcher.kt:39)
        at kotlinx.coroutines.scheduling.TaskImpl.run(Tasks.kt:95)
        at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely(CoroutineScheduler.kt:571)
        at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.executeTask(CoroutineScheduler.kt:750)
        at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.runWorker(CoroutineScheduler.kt:678)
        at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:665)

Steps to Reproduce

Reconnectable configuration (was not changed from previous versions) in commonMain:

reconnectable(10, predicate = {
    warningLog("Unable to connect to WS. Retrying...")
    warningLog("${it.message}")
    delay(1000)
    true
})

Initial connection configuration:

val transport = TcpClientTransport(host, 8000)
val rsocker = connector(uniqueDeviceId, authToken.access_token).connect(transport)

Your Environment

kotlinVersion=1.6.10
coroutinesBaseVersion=1.5.1-new-mm-dev2
ktorVersion=1.6.2-native-mm-eap-196
rsocketVersion=0.14.3
serializationVersion=1.3.0
klockVersion=2.4.8

Maybe it has something to do with: https://github.com/rsocket/rsocket-kotlin/issues/150

olme04 commented 2 years ago

hey, could you add loggerFactory = PrintLogger.withLevel(LoggingLevel.DEBUG) to the connector configuration and paste logs, what happens after reconnect? At my side (just checked) reconnect works with both TCP and WS client on JVM. logs example

[DEBUG] (io.rsocket.kotlin.connection) Connection closed. Reconnecting... 
[DEBUG] (io.rsocket.kotlin.connection) Connection establishment failed, attempt: 0. Trying to reconnect... Error: java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:716)
    at io.ktor.network.sockets.SocketImpl.connect$ktor_network(SocketImpl.kt:38)
    at io.ktor.network.sockets.SocketImpl$connect$1.invokeSuspend(SocketImpl.kt)
    at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
    at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:106)
    at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely(CoroutineScheduler.kt:571)
    at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.executeTask(CoroutineScheduler.kt:750)
    at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.runWorker(CoroutineScheduler.kt:678)
    at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:665)
webfrea-k commented 2 years ago

Hej olme04, sure, here is the relevant log from android logcat: p.s. there are more logs (io.rsocket.kotlin.connection) Connection establishment failed, attempt: 0. Trying to reconnect... logs below, but the app crashed immediately after turning the wifi off.


    --------- beginning of crash
2021-12-22 06:38:05.396 31528-31628/my.android.namespace E/AndroidRuntime: FATAL EXCEPTION: DefaultDispatcher-worker-4
    Process: my.android.namespace, PID: 31528
    java.io.IOException: Software caused connection abort
        at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:43)
        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
        at sun.nio.ch.IOUtil.read(IOUtil.java:192)
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:419)
        at io.ktor.network.sockets.CIOReaderKt$readFrom$2.invoke(CIOReader.kt:129)
        at io.ktor.network.sockets.CIOReaderKt$readFrom$2.invoke(CIOReader.kt:128)
        at io.ktor.utils.io.ByteBufferChannel.writeAvailable(ByteBufferChannel.kt:1492)
        at io.ktor.utils.io.ByteBufferChannel.write$suspendImpl(ByteBufferChannel.kt:1522)
        at io.ktor.utils.io.ByteBufferChannel.write(Unknown Source:0)
        at io.ktor.utils.io.ByteWriteChannel$DefaultImpls.write$default(ByteWriteChannel.kt:99)
        at io.ktor.network.sockets.CIOReaderKt.readFrom(CIOReader.kt:128)
        at io.ktor.network.sockets.CIOReaderKt.access$readFrom(CIOReader.kt:1)
        at io.ktor.network.sockets.CIOReaderKt$attachForReadingDirectImpl$1.invokeSuspend(CIOReader.kt:108)
        at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
        at kotlinx.coroutines.DispatchedTaskKt.resume(DispatchedTask.kt:234)
        at kotlinx.coroutines.DispatchedTaskKt.resumeUnconfined(DispatchedTask.kt:190)
        at kotlinx.coroutines.DispatchedTaskKt.dispatch(DispatchedTask.kt:161)
        at kotlinx.coroutines.CancellableContinuationImpl.dispatchResume(CancellableContinuationImpl.kt:397)
        at kotlinx.coroutines.CancellableContinuationImpl.resumeImpl(CancellableContinuationImpl.kt:431)
        at kotlinx.coroutines.CancellableContinuationImpl.resumeImpl$default(CancellableContinuationImpl.kt:420)
        at kotlinx.coroutines.CancellableContinuationImpl.resumeWith(CancellableContinuationImpl.kt:328)
        at io.ktor.network.selector.SelectorManagerSupport.handleSelectedKey(SelectorManagerSupport.kt:80)
        at io.ktor.network.selector.SelectorManagerSupport.handleSelectedKeys(SelectorManagerSupport.kt:60)
        at io.ktor.network.selector.ActorSelectorManager.process(ActorSelectorManager.kt:71)
        at io.ktor.network.selector.ActorSelectorManager.access$process(ActorSelectorManager.kt:18)
        at io.ktor.network.selector.ActorSelectorManager$process$1.invokeSuspend(Unknown Source:15)
        at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
        at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:106)
        at kotlinx.coroutines.internal.LimitedDispatcher.run(LimitedDispatcher.kt:39)
        at kotlinx.coroutines.scheduling.TaskImpl.run(Tasks.kt:95)
        at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely(CoroutineScheduler.kt:571)
        at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.executeTask(CoroutineScheduler.kt:750)
        at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.runWorker(CoroutineScheduler.kt:678)
        at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:665)
2021-12-22 06:38:05.379 19258-27705/? V/NativeCrypto: Read error: ssl=0x76c0159288: I/O error during system call, Software caused connection abort
2021-12-22 06:38:05.398 797-883/? E/BatteryExternalStatsWorker: no controller energy info supplied for wifi
2021-12-22 06:38:05.398 797-883/? I/WifiService: requestActivityInfo uid=1000
2021-12-22 06:38:05.398 797-883/? I/WifiService: reportActivityInfo uid=1000
2021-12-22 06:38:05.398 797-883/? I/WifiService: getSupportedFeatures uid=1000
2021-12-22 06:38:05.402 31528-31631/my.android.namespace I/System.out: [DEBUG] (io.rsocket.kotlin.connection) Connection establishment failed, attempt: 0. Trying to reconnect... Error: java.net.ConnectException: Network is unreachable
2021-12-22 06:38:05.402 31528-31631/my.android.namespace I/System.out:     at sun.nio.ch.Net.connect0(Native Method)
2021-12-22 06:38:05.402 31528-31631/my.android.namespace I/System.out:     at sun.nio.ch.Net.connect(Net.java:465)
2021-12-22 06:38:05.402 31528-31631/my.android.namespace I/System.out:     at sun.nio.ch.Net.connect(Net.java:455)
2021-12-22 06:38:05.402 31528-31631/my.android.namespace I/System.out:     at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:687)
2021-12-22 06:38:05.402 31528-31631/my.android.namespace I/System.out:     at io.ktor.network.sockets.SocketImpl.connect$ktor_network(SocketImpl.kt:32)
2021-12-22 06:38:05.403 31528-31631/my.android.namespace I/System.out:     at io.ktor.network.sockets.ConnectUtilsJvmKt.connect(ConnectUtilsJvm.kt:19)
2021-12-22 06:38:05.403 31528-31631/my.android.namespace I/System.out:     at io.ktor.network.sockets.TcpSocketBuilder.connect(TcpSocketBuilder.kt:38)
2021-12-22 06:38:05.403 31528-31631/my.android.namespace I/System.out:     at io.rsocket.kotlin.transport.ktor.TcpClientTransportKt$TcpClientTransport$6.connect(TcpClientTransport.kt:58)
2021-12-22 06:38:05.403 31528-31631/my.android.namespace I/System.out:     at io.rsocket.kotlin.transport.ClientTransportKt$ClientTransport$1.connect(ClientTransport.kt:35)
2021-12-22 06:38:05.403 31528-31631/my.android.namespace I/System.out:     at io.rsocket.kotlin.core.RSocketConnector.connectOnce(RSocketConnector.kt:49)
2021-12-22 06:38:05.403 31528-31631/my.android.namespace I/System.out:     at io.rsocket.kotlin.core.RSocketConnector.access$connectOnce(RSocketConnector.kt:27)
2021-12-22 06:38:05.403 31528-31631/my.android.namespace I/System.out:     at io.rsocket.kotlin.core.RSocketConnector$connect$3.invokeSuspend(RSocketConnector.kt:43)
2021-12-22 06:38:05.403 31528-31631/my.android.namespace I/System.out:     at io.rsocket.kotlin.core.RSocketConnector$connect$3.invoke(Unknown Source:8)
2021-12-22 06:38:05.403 31528-31631/my.android.namespace I/System.out:     at io.rsocket.kotlin.core.RSocketConnector$connect$3.invoke(Unknown Source:2)
2021-12-22 06:38:05.403 31528-31631/my.android.namespace I/System.out:     at io.rsocket.kotlin.internal.ReconnectableRSocketKt$connectWithReconnect$state$1.invokeSuspend(ReconnectableRSocket.kt:40)
2021-12-22 06:38:05.403 31528-31631/my.android.namespace I/System.out:     at io.rsocket.kotlin.internal.ReconnectableRSocketKt$connectWithReconnect$state$1.invoke(Unknown Source:8)
2021-12-22 06:38:05.403 31528-31631/my.android.namespace I/System.out:     at io.rsocket.kotlin.internal.ReconnectableRSocketKt$connectWithReconnect$state$1.invoke(Unknown Source:4)
2021-12-22 06:38:05.403 31528-31631/my.android.namespace I/System.out:     at kotlinx.coroutines.flow.SafeFlow.collectSafely(Builders.kt:61)
2021-12-22 06:38:05.403 31528-31631/my.android.namespace I/System.out:     at kotlinx.coroutines.flow.AbstractFlow.collect(Flow.kt:221)
2021-12-22 06:38:05.403 31528-31631/my.android.namespace I/System.out:     at kotlinx.coroutines.flow.FlowKt__ErrorsKt.catchImpl(Errors.kt:156)
2021-12-22 06:38:05.403 31528-31631/my.android.namespace I/System.out:     at kotlinx.coroutines.flow.FlowKt.catchImpl(Unknown Source:1)
2021-12-22 06:38:05.403 31528-31631/my.android.namespace I/System.out:     at kotlinx.coroutines.flow.FlowKt__ErrorsKt$retryWhen$$inlined$unsafeFlow$1.collect(SafeCollector.common.kt:117)
2021-12-22 06:38:05.403 31528-31631/my.android.namespace I/System.out:     at kotlinx.coroutines.flow.FlowKt__ErrorsKt.catchImpl(Errors.kt:156)
2021-12-22 06:38:05.403 31528-31631/my.android.namespace I/System.out:     at kotlinx.coroutines.flow.FlowKt.catchImpl(Unknown Source:1)
2021-12-22 06:38:05.403 31528-31631/my.android.namespace I/System.out:     at kotlinx.coroutines.flow.FlowKt__ErrorsKt$catch$$inlined$unsafeFlow$1.collect(SafeCollector.common.kt:113)
2021-12-22 06:38:05.403 31528-31631/my.android.namespace I/System.out:     at io.rsocket.kotlin.internal.ReconnectableRSocketKt$connectWithReconnect$$inlined$transform$1.invokeSuspend(Emitters.kt:223)
2021-12-22 06:38:05.403 31528-31631/my.android.namespace I/System.out:     at io.rsocket.kotlin.internal.ReconnectableRSocketKt$connectWithReconnect$$inlined$transform$1.invoke(Unknown Source:8)
2021-12-22 06:38:05.403 31528-31631/my.android.namespace I/System.out:     at io.rsocket.kotlin.internal.ReconnectableRSocketKt$connectWithReconnect$$inlined$transform$1.invoke(Unknown Source:4)
2021-12-22 06:38:05.403 31528-31631/my.android.namespace I/System.out:     at kotlinx.coroutines.flow.SafeFlow.collectSafely(Builders.kt:61)
2021-12-22 06:38:05.403 31528-31631/my.android.namespace I/System.out:     at kotlinx.coroutines.flow.AbstractFlow.collect(Flow.kt:221)
2021-12-22 06:38:05.403 31528-31631/my.android.namespace I/System.out:     at kotlinx.coroutines.flow.FlowKt__CollectKt.emitAll(Collect.kt:109)
2021-12-22 06:38:05.403 31528-31631/my.android.namespace I/System.out:     at kotlinx.coroutines.flow.FlowKt.emitAll(Unknown Source:1)
2021-12-22 06:38:05.403 31528-31631/my.android.namespace I/System.out:     at io.rsocket.kotlin.internal.ReconnectableRSocketKt$restarting$1.invokeSuspend(ReconnectableRSocket.kt:74)
2021-12-22 06:38:05.403 31528-31631/my.android.namespace I/System.out:     at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
2021-12-22 06:38:05.403 31528-31631/my.android.namespace I/System.out:     at kotlinx.coroutines.flow.internal.SafeCollector.invokeSuspend(SafeCollector.kt:41)
2021-12-22 06:38:05.403 31528-31631/my.android.namespace I/System.out:     at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
2021-12-22 06:38:05.403 31528-31631/my.android.namespace I/System.out:     at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:106)
2021-12-22 06:38:05.403 31528-31631/my.android.namespace I/System.out:     at kotlinx.coroutines.internal.LimitedDispatcher.run(LimitedDispatcher.kt:39)
2021-12-22 06:38:05.403 31528-31631/my.android.namespace I/System.out:     at kotlinx.coroutines.scheduling.TaskImpl.run(Tasks.kt:95)
2021-12-22 06:38:05.404 31528-31631/my.android.namespace I/System.out:     at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely(CoroutineScheduler.kt:571)
2021-12-22 06:38:05.404 31528-31631/my.android.namespace I/System.out:     at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.executeTask(CoroutineScheduler.kt:750)
2021-12-22 06:38:05.404 31528-31631/my.android.namespace I/System.out:     at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.runWorker(CoroutineScheduler.kt:678)
2021-12-22 06:38:05.404 31528-31631/my.android.namespace I/System.out:     at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:665)
2021-12-22 06:38:05.404 31528-31631/my.android.namespace W/SRSDK: ⚠️ Unable to connect to WS. Retrying...
2021-12-22 06:38:05.404 31528-31631/my.android.namespace W/SRSDK: ⚠️ Network is unreachable
olme04 commented 2 years ago

TBH, I have no idea on what changed, which cause such an issue I've found similar ktor issues, may be they are relevant:

All marked as fixed, but who knows..

Also, one more idea, can you try to create tcp client with additional exception handler and send error message here, like this:

TcpClientTransport(host, port, context = CoroutineExceptionHandler { c, e -> println("TCPFAIL: $c -> $e") })
webfrea-k commented 2 years ago

With this exception handler, the app does not crash anymore.

I/System.out: TCPFAIL: [my.android.app.test$connectReactiveDataSource$job$1$invokeSuspend$$inlined$CoroutineExceptionHandler$1@a31df0c, CoroutineName(rSocket-tcp-client), StandaloneCoroutine{Cancelling}@879d055, Dispatchers.IO] -> java.io.IOException: Software caused connection abort

I'll leave this in, as it fixes the issue and after the connection is back it does work forward on. Thank you so much for checking and suggesting this.

Should I leave this issue open or do I close it?

olme04 commented 2 years ago

Cool, that it help's you! Let's leave issue open, but can you change title to smth like: 'Android app crashes after network error using TCP transport'. The issue is not in reconnect at all. I will think, what we can do here, to simplify configuration in future.

webfrea-k commented 2 years ago

Great. I've changed the title. Many thanks for your assistance once again :)

olme04 commented 2 years ago

No problem, Im here to help to solve any issue with rsocket-kotlin :) Also, if you have some ideas, on how we can improve usability or anything else with library, feel free to create an issue!

webfrea-k commented 2 years ago

Hello @olme04, I believe something similar is happening on iOS platform. It is not strictly network related, but I'll try to explain and if you think it should be a separate issue, I'll create a new one. On iOS platform the app crashes without any warnings (or recorded errors) when you lock the device (e.g. press power button to turn the screen off and lock the device), wait for a couple minutes and then unlock the device again. The only log (so far) is: Message from debugger: Terminated due to signal 13 My connection is established like this:

internal val connection: AtomicRef<RSocket?> = atomic(null)

val transport = TcpClientTransport(host, 8000, CoroutineExceptionHandler { coroutineContext, throwable ->
    warningLog("error: ${throwable.stackTraceToString()}")
})
connection.value = api.connector(uniqueDeviceId, authToken.access_token).connect(transport)

//when app goes to background this method is called
connection.value?.cancel(message = "Going to background...")

Am I doing something wrong? Is there any better way to disconnect via method call?

Let me know if it's a separate issue. Regards, Simon

olme04 commented 2 years ago

Hey, Im not so familiar with iOS development, but looks like it's related to sockets under the hood, so, ktor-network module from ktor. similar thing with flutter and sockets: https://github.com/tekartik/process_run.dart/issues/36 I think, it's better to create an issue in ktor youtrack

webfrea-k commented 2 years ago

Hey, interestingly adding signal(SIGPIPE, SIG_IGN) //Ignore signal really helps. Ktor issue can be found here. Many thanks!

webfrea-k commented 2 years ago

One more thing @olme04 :), what is the proper way to disconnect the connection? I've just noticed that RSocket.cancel(message = "Stopping...") does not stop underlying requestStream() requests (updates keep coming from the server). Maybe this is the reason for iOS crashes - canceling the rsocket scope does not actually stop receiving data?

olme04 commented 2 years ago

Looks like it's a bug in current implementation, as for now, RSocket.cancel should stop requester streams, but will not fail connection. I will investigate more later.

rezafaraji93 commented 1 month ago

I am facing the same issue. Where should I use the TcpClientTransport? @olme04 @webfrea-k