relaycorp / awala-endpoint-android

High-level library for Android apps implementing Awala endpoints
Apache License 2.0

Attempt to connect to the private gateway whilst it's not running causes app to crash #356

Closed by gnarea 11 months ago

gnarea commented 12 months ago

We're getting this in Crashlytics for Letro:

tech.relaycorp.relaynet.bindings.pdc.ServerConnectionException - Failed to connect to http://127.0.0.1:13276/v1/nodes 

Fatal Exception: tech.relaycorp.awaladroid.RegistrationFailedException: Registration failed due to server
       at tech.relaycorp.awaladroid.GatewayClientImpl$registerEndpoint$2.invokeSuspend(GatewayClientImpl.kt:111)
       at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
       at io.ktor.util.pipeline.SuspendFunctionGun.resumeRootWith(SuspendFunctionGun.kt:194)
       at io.ktor.util.pipeline.SuspendFunctionGun.access$resumeRootWith(SuspendFunctionGun.kt:15)
       at io.ktor.util.pipeline.SuspendFunctionGun$continuation$1.resumeWith(SuspendFunctionGun.kt:89)
       at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:46)
       at io.ktor.util.pipeline.SuspendFunctionGun.resumeRootWith(SuspendFunctionGun.kt:194)
       at io.ktor.util.pipeline.SuspendFunctionGun.access$resumeRootWith(SuspendFunctionGun.kt:15)
       at io.ktor.util.pipeline.SuspendFunctionGun$continuation$1.resumeWith(SuspendFunctionGun.kt:89)
       at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:46)
       at io.ktor.util.pipeline.SuspendFunctionGun.resumeRootWith(SuspendFunctionGun.kt:194)
       at io.ktor.util.pipeline.SuspendFunctionGun.access$resumeRootWith(SuspendFunctionGun.kt:15)
       at io.ktor.util.pipeline.SuspendFunctionGun$continuation$1.resumeWith(SuspendFunctionGun.kt:89)
       at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:46)
       at io.ktor.util.pipeline.SuspendFunctionGun.resumeRootWith(SuspendFunctionGun.kt:194)
       at io.ktor.util.pipeline.SuspendFunctionGun.access$resumeRootWith(SuspendFunctionGun.kt:15)
       at io.ktor.util.pipeline.SuspendFunctionGun$continuation$1.resumeWith(SuspendFunctionGun.kt:89)
       at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:46)
       at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:106)
       at kotlinx.coroutines.internal.LimitedDispatcher$Worker.run(LimitedDispatcher.kt:115)
       at kotlinx.coroutines.scheduling.TaskImpl.run(Tasks.kt:103)
       at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely(CoroutineScheduler.kt:584)
       at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.executeTask(CoroutineScheduler.kt:793)
       at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.runWorker(CoroutineScheduler.kt:697)
       at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:684)

I think this is interesting:

Image

https://console.firebase.google.com/project/letro-android/crashlytics/app/android:tech.relaycorp.letro/issues/ac820cf3a5b2ddab51a4d8a54be99e07?time=last-seven-days&sessionEventKey=6555FBDF028B000139B82F831615775D_1880540908002673938

sdsantos commented 11 months ago

RegistrationFailedException is one of the exceptions we warn apps to handle, meaning Letro should catch it when trying to register and then either retry automatically or tell the user to retry.
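As a rough illustration of the "catch and retry" handling, here is a minimal sketch. Everything in it is an assumption made for the example: `RegistrationFailedException` is redeclared as a stand-in so the snippet compiles on its own (an app would import the one from `tech.relaycorp.awaladroid`), and `registerWithRetry`, `RegistrationOutcome` and `demoRetry` are hypothetical names, not part of the SDK. The helper is `inline`, so inside a coroutine the `register` lambda can call a suspending registration function and `pause` can be `{ delay(it) }`.

```kotlin
// Hypothetical stand-in for tech.relaycorp.awaladroid.RegistrationFailedException,
// declared here only so that the sketch compiles without the SDK.
class RegistrationFailedException(message: String, cause: Throwable? = null) :
    Exception(message, cause)

// Hypothetical result type: either the registration result or the last failure.
sealed class RegistrationOutcome<out T> {
    data class Registered<T>(val value: T) : RegistrationOutcome<T>()
    data class GaveUp(val attempts: Int, val lastError: RegistrationFailedException) :
        RegistrationOutcome<Nothing>()
}

// Runs `register` up to `maxAttempts` times, doubling the pause between attempts.
// Only RegistrationFailedException is retried; any other exception propagates.
inline fun <T> registerWithRetry(
    maxAttempts: Int = 3,
    initialDelayMs: Long = 1_000,
    pause: (Long) -> Unit = { Thread.sleep(it) },
    register: () -> T,
): RegistrationOutcome<T> {
    require(maxAttempts >= 1) { "maxAttempts must be at least 1" }
    var delayMs = initialDelayMs
    var attempt = 1
    while (true) {
        try {
            return RegistrationOutcome.Registered(register())
        } catch (exc: RegistrationFailedException) {
            if (attempt == maxAttempts) {
                // Out of attempts: hand the failure back so the UI can offer a manual retry.
                return RegistrationOutcome.GaveUp(attempt, exc)
            }
        }
        pause(delayMs)
        delayMs *= 2
        attempt++
    }
}

// Usage example: a fake registration that fails twice before succeeding.
fun demoRetry(): RegistrationOutcome<String> {
    var calls = 0
    return registerWithRetry(maxAttempts = 3, pause = { /* skip waiting in the demo */ }) {
        calls++
        if (calls < 3) throw RegistrationFailedException("Registration failed due to server")
        "registered"
    }
}
```

Returning `GaveUp` instead of rethrowing keeps the failure out of the crash reporter and leaves the decision (prompt the user, schedule another attempt) to the caller.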

If this is happening often, then either we have an issue with the gateway app, which isn't starting the server quickly enough (we wait 1 second) or is failing to start it, or we have an issue in the SDK, which isn't binding to the gateway app before attempting to connect to the server (though the code clearly shows that it does).

Do you feel we should wait for more instances of the issue, or increase the waiting time?
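For comparison with a fixed wait, one alternative would be to poll the local port until it accepts connections, up to a deadline. The sketch below is not what the SDK currently does; `waitForLocalServer` and its parameters are hypothetical names, and the only value taken from the report is the address `127.0.0.1:13276` seen in the exception message.

```kotlin
import java.io.IOException
import java.net.InetSocketAddress
import java.net.Socket

// Hypothetical helper: returns true as soon as a TCP connection to host:port succeeds,
// or false if the server is still unreachable once timeoutMs has elapsed.
fun waitForLocalServer(
    port: Int,
    timeoutMs: Long = 5_000,
    pollIntervalMs: Long = 100,
    host: String = "127.0.0.1",
): Boolean {
    val deadline = System.nanoTime() + timeoutMs * 1_000_000
    // A connect timeout of 0 would mean "wait forever", so keep it at 1 ms or more.
    val connectTimeoutMs = pollIntervalMs.toInt().coerceAtLeast(1)
    while (true) {
        try {
            Socket().use { socket ->
                socket.connect(InetSocketAddress(host, port), connectTimeoutMs)
            }
            return true
        } catch (exc: IOException) {
            // Connection refused or timed out: the server is not listening yet.
        }
        if (System.nanoTime() >= deadline) return false
        Thread.sleep(pollIntervalMs)
    }
}
```

With this shape, `waitForLocalServer(13276)` returns as soon as the server is up instead of always waiting the full interval, and a `false` result can be reported as a registration failure. In coroutine code the blocking calls would need to run on an I/O dispatcher.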

gnarea commented 11 months ago

Thanks for looking into this @sdsantos!

I created the issue here because I misread the stack trace and I also thought we handled that exception in Letro, but it looks like we introduced a regression and no longer handle it.