Open gnarea opened 1 year ago
From what I reproduced, it was actually working and wrapping it in a ServerConnectionException. But now the requirements have changed, since we added separate handling for SocketTimeoutException.
Indeed. Though it's interesting that wsConnect() isn't in the stack trace, so I guess that's why we got the uncaught exception. Maybe it's something we need to handle at the flow level?
Yup, it should be something like that. I'll investigate tomorrow.
From what I can tell it's being handled at the Flow level. Any steps to reproduce this issue? I haven't been able to.
I also noticed that the PDC interface is outdated. These throws come from inside the Flow, not from the method call: https://github.com/relaycorp/awala-jvm/blob/bb426493379c605eec0a0c628f8c1dab82f5f6ad/src/main/kotlin/tech/relaycorp/relaynet/bindings/pdc/PDCClient.kt#L54
Handled at the flow level by whom?
Not even disconnecting the router from the Internet did the trick?
I haven't experienced this myself. @mgulyaev10, any idea on how to reproduce this? I suspect it may be intermittent and happen rarely.
Thanks! The outdated PDC interface is fixed in https://github.com/relaycorp/awala-jvm/pull/300
Disconnecting the router from the Internet only triggered a wrapped ServerConnectionException, which is being handled.
Looks like Ktor has a history of uncatchable exceptions that don't reference the app source code. Here's at least one that mentions SocketTimeoutException, although with a different stack trace: https://youtrack.jetbrains.com/issue/KTOR-577/java.net.SocketTimeoutException-with-no-lines-referencing-my-code
Since our stack traces also don't mention our own source code, we might not be able to catch the exception the usual way. Maybe we can fix the underlying issue causing the timeout, though that might not be possible. The other solutions look like last resorts, such as CoroutineExceptionHandler or Thread.setDefaultUncaughtExceptionHandler, which don't sound great.
Thanks for looking into this @sdsantos!
That doesn't look good :(
I guess even if we wanted to use CoroutineExceptionHandler or Thread.setDefaultUncaughtExceptionHandler, we still wouldn't know for sure if that SocketTimeoutException came from our Ktor client anyway, so it wouldn't really work.
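To illustrate the concern above, here is a minimal sketch in plain Java (JDK only, so it runs without Ktor or coroutines). The class name, the `failingWorker` helper and the thread names are all hypothetical. It only shows that a process-wide handler receives the exception from whichever thread died, with nothing but the thread name and stack trace to hint at the origin; it is not a reproduction of the Letro crash.

```java
import java.net.SocketTimeoutException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class GlobalHandlerSketch {
    // Rethrows a checked exception without wrapping it, mimicking Kotlin,
    // where a SocketTimeoutException can surface without being declared.
    @SuppressWarnings("unchecked")
    static <E extends Throwable> void sneakyThrow(Throwable error) throws E {
        throw (E) error;
    }

    // Hypothetical worker: a thread that dies with a SocketTimeoutException.
    static Thread failingWorker(String name) {
        return new Thread(
            () -> GlobalHandlerSketch.<RuntimeException>sneakyThrow(
                new SocketTimeoutException("Read timed out")),
            name);
    }

    // Installs a process-wide handler, lets two unrelated workers fail,
    // and returns what the handler recorded for each of them.
    static List<String> demo() {
        List<String> seen = Collections.synchronizedList(new ArrayList<>());
        Thread.UncaughtExceptionHandler previous =
            Thread.getDefaultUncaughtExceptionHandler();
        Thread.setDefaultUncaughtExceptionHandler((thread, error) ->
            seen.add(thread.getName() + ": " + error.getClass().getSimpleName()));
        try {
            for (String name : new String[] {"http-client-worker", "unrelated-worker"}) {
                Thread worker = failingWorker(name);
                worker.start();
                worker.join(); // the handler runs on the dying thread before join() returns
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            // Restore whatever handler was installed before the demo.
            Thread.setDefaultUncaughtExceptionHandler(previous);
        }
        return seen;
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

Both failures reach the same handler and look identical apart from the thread name, so any filtering would have to rely on thread names or stack trace inspection, which is fragile.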
At least with CoroutineExceptionHandler you can wrap it around a specific part of the code. But it could be catching more than we want, yes.
That's a bit better then, but I'd be concerned with the code complexity to restart/end the WebSocket connection when that happens.
Since this has only happened once in Letro in the past 90 days, let's put a pin in this for now and revisit if it happens more often. Hopefully that'll buy enough time until JetBrains fixes this.
The following doesn't appear to have any effect (SocketTimeoutException is a subclass of IOException): https://github.com/relaycorp/awala-poweb-jvm/blob/380529fb1028ff22dc8da843223d66a9e90a26f7/src/main/kotlin/tech/relaycorp/poweb/PoWebClient.kt#L325-L327
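One possible explanation, not confirmed in this thread, is that the exception is raised on a thread other than the one running the try/catch. The sketch below uses plain Java with hypothetical names and does not reproduce the PoWebClient code. It shows that a catch for IOException does handle a SocketTimeoutException thrown on the calling thread, while a catch around code that merely starts a worker never sees a failure raised inside that worker.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.SocketTimeoutException;
import java.util.concurrent.atomic.AtomicReference;

public class CatchScopeSketch {
    // Hypothetical stand-in for a network read that times out.
    static void readWithTimeout() throws IOException {
        throw new SocketTimeoutException("Read timed out");
    }

    // Thrown on the calling thread: catch (IOException) handles it,
    // because SocketTimeoutException is a subclass of IOException.
    static boolean caughtOnCallingThread() {
        try {
            readWithTimeout();
            return false;
        } catch (IOException e) {
            return e instanceof SocketTimeoutException;
        }
    }

    // Thrown on a worker thread: the caller's catch block is never entered.
    // Returns what escaped on the worker, or null if the caller caught it.
    static Throwable escapedOnWorkerThread() {
        AtomicReference<Throwable> escaped = new AtomicReference<>();
        try {
            Thread worker = new Thread(() -> {
                throw new UncheckedIOException(new SocketTimeoutException("Read timed out"));
            });
            // Record the failure instead of letting it reach the default handler.
            worker.setUncaughtExceptionHandler((thread, error) -> escaped.set(error));
            worker.start();
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } catch (RuntimeException e) {
            return null; // not reached: the failure happened on the worker thread
        }
        return escaped.get();
    }

    public static void main(String[] args) {
        System.out.println("caught on calling thread: " + caughtOnCallingThread());
        System.out.println("escaped on worker thread: " + escapedOnWorkerThread());
    }
}
```

If something similar is happening inside the HTTP client, the catch block linked above would be correct as written and still never run.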
Because Letro is crashing when the connection to the Awala server stops responding to pings: