relaycorp / awala-endpoint-android

High-level library for Android apps implementing Awala endpoints
Apache License 2.0
1 stars 1 forks source link

Investigate potential concurrency issue #329

Closed gnarea closed 1 year ago

gnarea commented 1 year ago

GatewayClient.receiveMessages() is behaving as if it's being called multiple times in the branch chore/add_pairing_functionality of Letro, even though it's only ever called once according to my logging/debugging.

This issue can be observed more easily with the logging introduced in the branch concurrency-issue: You'll see that the same message can be received multiple times. However, the logs of the Awala app locally and all the server logs confirm that only one of the messages should've been received.

For example:

2023-07-21 22:30:19.464 10892-10960 GatewayRepository       tech.relaycorp.letro                 I  Account creation completed: AccountCreatedDataModel(requestedAddress=gus1@nautilus.ink, assignedAddress=gus1@nautilus.ink)
2023-07-21 22:30:19.580 10892-10970 GatewayRepository       tech.relaycorp.letro                 W  Unsupported message type: application/vnd.relaycorp.letro.account-creation-completed
2023-07-21 22:30:19.649 10892-10967 GatewayRepository       tech.relaycorp.letro                 W  Unsupported message type: application/vnd.relaycorp.letro.account-creation-completed

Notice how the last two log entries correspond to the same message, received in different threads (10970 and 10967).

The logs on the Awala app confirm that the first message was sent once but the second message was sent twice to Letro:

2023-07-21 22:30:18.908 10411-10540 llectParcelsFromGateway tech.relaycorp.gateway               I  Collected parcel from Gateway 8677412019375083
2023-07-21 22:30:19.248 10411-10540 llectParcelsFromGateway tech.relaycorp.gateway               I  Collected parcel from Gateway 8677388669135053
2023-07-21 22:30:19.285 10411-10566 ParcelCollectionRoute   tech.relaycorp.gateway               I  Sending parcel 8677412019375083 to endpoint
2023-07-21 22:30:19.311 10411-10566 ParcelCollectionRoute   tech.relaycorp.gateway               I  Sending parcel 8677388669135053 to endpoint
2023-07-21 22:30:19.532 10411-10540 ParcelCollectionRoute   tech.relaycorp.gateway               I  Sending parcel 8677388669135053 to endpoint

By adding additional logs to the Awala app, I was able to discover that Letro is actually establishing many connections to the Awala app. For example, during the time span above, it established two concurrent connections within ~300ms:

2023-07-21 22:30:19.080 10411-11239 ParcelCollectionRoute   tech.relaycorp.gateway               I  New Parcel Collection connection
2023-07-21 22:30:19.332 10411-10999 ParcelCollectionRoute   tech.relaycorp.gateway               I  New Parcel Collection connection

However, my testing revealed that the number of concurrent connections can be much greater. For example:

2023-07-21 22:52:21.542 17333-17798 ParcelCollectionRoute   tech.relaycorp.gateway               I  New Parcel Collection connection
2023-07-21 22:52:21.542 17333-17802 ParcelCollectionRoute   tech.relaycorp.gateway               I  New Parcel Collection connection
2023-07-21 22:52:21.542 17333-17803 ParcelCollectionRoute   tech.relaycorp.gateway               I  New Parcel Collection connection
2023-07-21 22:52:21.542 17333-17800 ParcelCollectionRoute   tech.relaycorp.gateway               I  New Parcel Collection connection
2023-07-21 22:52:21.542 17333-17805 ParcelCollectionRoute   tech.relaycorp.gateway               I  New Parcel Collection connection
2023-07-21 22:52:21.549 17333-17802 ParcelCollectionRoute   tech.relaycorp.gateway               I  New Parcel Collection connection
2023-07-21 22:52:21.553 17333-17800 ParcelCollectionRoute   tech.relaycorp.gateway               I  New Parcel Collection connection
2023-07-21 22:52:21.554 17333-17798 ParcelCollectionRoute   tech.relaycorp.gateway               I  New Parcel Collection connection
2023-07-21 22:52:21.557 17333-17800 ParcelCollectionRoute   tech.relaycorp.gateway               I  New Parcel Collection connection
2023-07-21 22:52:21.565 17333-17805 ParcelCollectionRoute   tech.relaycorp.gateway               I  New Parcel Collection connection

The multiple connections would explain why tge same message being received multiple times. The Awala app is sending all unacknowledged messages to the app, which is precisely what it should do -- the problem is that we now have multiple connections competing for them.

The strange thing about this issue is that GatewayClient.receiveMessage() does appear to be called once according to my logs, unless I'm missing something there. So I wonder if this is a problem with the coroutines configuration and/or the other flows in GatewayRepository, which is causing the flow to be consumed multiple times.

I compared how we do the "receiveMessages()" call in Awala Ping and the main difference appears to be that in Letro we're calling that method throughout the lifetime of the app, whereas on Ping we call it when needed. So maybe that'd explain why we don't experience that issue in Ping.

How to reproduce

You'd need two phones with the branch above of Letro installed. Then:

  1. Phone 1: Create an id like foo-1@nautilus.ink.
  2. Phone 2: Create an id like bar-1@nautilus.ink.
  3. Phone 1: Request to connect with bar-1@nautilus.ink.
  4. Phone 2: Request to connect with foo-1@nautilus.ink.
  5. Check the logs.

Make sure to use use different ids every time you run the test above.

github-actions[bot] commented 1 year ago

:tada: This issue has been resolved in version 1.13.22 :tada:

The release is available on GitHub release

Your semantic-release bot :package::rocket: