Closed Shusek closed 5 days ago
After debugging, I discovered the culprit.
Inside KtorWebSocketConnectionAdapter
when anyone tries to use outgoing the channel (eg. sendText method) when network connection is closing this will raise an exception.
Overall, this is consistent with the documentation:
Enqueue frame, may suspend if outgoing queue is full. May throw an exception if outgoing channel is already closed so it is impossible to transfer any message. Frames that were sent after close frame could be silently ignored. Please note that close frame could be sent automatically in reply to a peer close frame unless it is raw websocket session.
Summarizing the Krossbow should catch such errors or not allow their occurrence.
Thanks a lot for the report, and for the investigation.
Inside KtorWebSocketConnectionAdapter when anyone tries to use outgoing the channel (eg. sendText method) when network connection is closing this will raise an exception.
This seems appropriate.
Summarizing the Krossbow should catch such errors or not allow their occurrence.
I'm not sure I'm following. It seems legitimate to me that trying to send frames on a closed session throws an exception. Do you mean that an exception can occur due to automatically sent frames without programmer error?
I include sample code:
override fun onViewCreated(view: View, savedInstanceState: Bundle?) {
viewLifecycleOwner.lifecycleScope.launch {
try {
provideStop().connect(
BuildConfig.SOCKET_URL,
host = null
).withJsonConversions(json = JsonParser)
// disconnect wifi connection
delay(30000)
} catch (throwable: Throwable) {
// the error is not caught
}
}
}
The reason is sending hearthbeat after socket is aborted and it causes the error:
java.net.SocketException: Software caused connection abort at java.net.SocketInputStream.socketRead0(Native Method) at java.net.SocketInputStream.socketRead(SocketInputStream.java:119) at java.net.SocketInputStream.read(SocketInputStream.java:176) at java.net.SocketInputStream.read(SocketInputStream.java:144) at okio.InputStreamSource.read(JvmOkio.kt:93) at okio.AsyncTimeout$source$1.read(AsyncTimeout.kt:125) at okio.RealBufferedSource.request(RealBufferedSource.kt:206) at okio.RealBufferedSource.require(RealBufferedSource.kt:199) at okio.RealBufferedSource.readByte(RealBufferedSource.kt:209) at okhttp3.internal.ws.WebSocketReader.readHeader(WebSocketReader.kt:119) at okhttp3.internal.ws.WebSocketReader.processNextFrame(WebSocketReader.kt:102) at okhttp3.internal.ws.RealWebSocket.loopReader(RealWebSocket.kt:293) at okhttp3.internal.ws.RealWebSocket$connect$1.onResponse(RealWebSocket.kt:195) at okhttp3.internal.connection.RealCall$AsyncCall.run(RealCall.kt:519) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1167) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:641) at java.lang.Thread.run(Thread.java:923)
I/Process: Sending signal. PID: 23274 SIG: 9
In my opinion, heartbeat should not be sent when socket is corrupted. To catch such exceptions for this moment i can use CoroutineExceptionHandler inside launch method.
Got your point, thanks a lot for the sample code. It is indeed incorrect for Krossbow to try to send heart beats when the session is already closed. I will look into that.
The try
/catch
around the connect
call only works for catching exceptions during the connection. Once the session is connected, other exceptions will be thrown through subscription flows, or when calling session methods like send
. They can also be accessed via an instrumentation if one is defined in the Krossbow config, if you need a global way of getting "informed". So this part is kind of expected.
The worst of this bug is it that reconnect mechanism does not recognize such a case. Even when you call withAutoReconnect
and something will cut your wifi connection AutoReconnect it will not work.
I'm confused, does this happen with Ktor or OkHttp? The stack trace points to OkHttp (without any Krossbow methods), but you mention Ktor's outgoing
channel.
Also, it'd be great to be able to reproduce this outside of Android's environment, so we could write a test case for it.
Last but not least, do you reproduce the issue in Krossbow 3.1.0?
Yes, it is reproducible on 3.1.0. KTOR library works like a proxy and has a lot net engine inside (OkHttp, Jetty etc.). In my case i use okHttp like that
val ktorClient = HttpClient(OkHttp) {
install(WebSockets)
}
But for me it shouldn't matter, probably it can be reproduced on the plain OkHTTP without ktor.
sendText
is called by HearthBeat action sendHeartBeat
and exception invokes in this method:
override suspend fun sendText(frameText: String) {
wsSession.outgoing.send(Frame.Text(frameText))
}
If I disable HearthBeat action SocketException in this case it does not appear anymore.
Simplest hack-workaround for that is global handler can also be added and ignore this exception like that:
val webSocketClient =
KtorWebSocketClient(httpClient).withAutoReconnect(
ReconnectConfig(
maxAttempts = Int.MAX_VALUE
)
)
val client = StompClient(webSocketClient, StompConfig().apply {
heartBeat = HeartBeat(5.seconds, 13.seconds)
})
val errorHandler = CoroutineExceptionHandler { _, exception ->
ErrorLogger.logError(exception)
}
return client.connect(
url,
host = null,
sessionCoroutineContext = EmptyCoroutineContext + errorHandler
).withJsonConversions(json = JsonParser)
Thanks a lot for the details. Ok so you're using krossbow-werbsocket-ktor
with Ktor client using OkHttp engine, thanks.
sendText
is called by HearthBeat actionsendHeartBeat
and exception invokes in this method
If you confirmed this by debugging then that's ok, thanks. I just didn't get why the stacktrace you provided wouldn't include that. Indeed if an error on the socket occurs, the heart beat shouldn't be sent. I can think of 2 possibilities where this could happen in 3.1.0 (maybe there are more):
incoming
frames channel doesn't throw) - IMO this would be a bug in the websocket implementationTo prevent the first one, we could cancel the heart beat job early, but that would still be racing with the error bubbling. So it would still be possible that the heart beat kicks in while the error callbacks haven't reached the point where the session is aborted and all internal coroutines cancelled.
In order to prevent heart beat failures, we could check if the websocket is still open before attempting to send a heart beat (canSend
). In that case we probably also want to shutdown the session if it can't accept heart beats, in order to fix the problem 2 as well.
As a side note, both StompClient
's constructor and withReconnectConfig
accept lambdas for the configuration, so you don't have to write Config().apply { ... }
yourself:
val webSocketClient = KtorWebSocketClient(httpClient).withAutoReconnect {
maxAttempts = Int.MAX_VALUE
}
val client = StompClient(webSocketClient) {
heartBeat = HeartBeat(5.seconds, 13.seconds)
}
hello @joffrey-bion .
I am facing this exact same issue in version 7.0.0
This was resolved ?
I have my heartbeats set and I was testing a few things in my android application. My backend is using spring boot I took my application to the background for a while and put my cellphone to sleep mode.
I rightfully show on the logs of my spring boot backend that the heartbeats stopped appearing and the server destroyed my session.
When I bought my application back to live , the application crashed with the "Software caused connection abort".
I was able to hit this breakpoint before the app crashed
/Users/ahmed/.gradle/caches/modules-2/files-2.1/org.hildan.krossbow/krossbow-stomp-core-iosarm64/7.0.0/aa21a035596090a27ec803e96eb75a6a72dd104a/krossbow-stomp-core-iosarm64-7.0.0-sources.jar!/commonMain/org/hildan/krossbow/stomp/BaseStompSession.kt:83
private suspend fun shutdown(message: String, cause: Throwable? = null) {
// cancel heartbeats immediately to limit the chances of sending a heartbeat to a closed socket
heartBeaterJob?.cancel()
// let other subscribers handle the error/closure before cancelling the scope
awaitSubscriptionsCompletion()
scope.cancel(message, cause = cause)
}
The app control was at " heartBeaterJob?.cancel()
" after which it crashed
Yes this should have been fixed a long time ago. Thanks for reporting that it happens again, and for the details to help reproduce it. I reopened this issue to have another look.
Could you please share a little more details about which artifacts you were using? Are you using Ktor with the OkHttp engine? Or OkHttp directly?
Oh yes , I am actually using Ktor engine .
Thanks! And did you have a chance to try this experiment with the latest version of Krossbow?
yes indeed, my krossbow version is krossbow-version = "7.0.0"
If it's not too much hassle, could you please confirm that you can still reproduce with Krossbow 8.0.0? I don't think the changes in Krossbow 8 should change this specific behaviour, but I'd just like to be sure to avoid investigating a ghost :)
@joffrey-bion that is correct. Let me test this out with version 8 and revert back to you
@joffrey-bion surprisingly , that issue is not appearing in 8.0.0 despite your comment that you didn't change anything that was relevant to this issue .
Lets chalk this one up to some random act of God :D . You can close the issue while I monitor more closely and would revert back to you if i see some unexpected behavior .
Thanks again
Thank you so much for testing this! Ok I'll close for now, and we'll reopen if it pops up again.
Describe the bug
During a websocket connection when user turn off wifi connection he got "Software caused connection abort" exception. This exception is uncatchable so it cause application crash. I'm debugging which component is causing this error but it's pretty hard if anyone could confirm its occurrence it was great.
Reproduction and additional details 1) Connect to stomp by secure protocol 2) Turn off wifi 3) Crash occured
Context
Artifacts used:
krossbow-stomp-core
krossbow-stomp-jackson
krossbow-stomp-kxserialization
krossbow-websocket-core
krossbow-websocket-sockjs
krossbow-websocket-spring
krossbow-websocket-okhttp
krossbow-websocket-ktor