woocommerce / woocommerce-android

WooCommerce Android app
https://www.woocommerce.com/mobile
GNU General Public License v2.0
277 stars 135 forks source link

Avoid race conditions by collecting on main thread #12912

Closed malinajirka closed 2 days ago

malinajirka commented 1 week ago

Closes: #12565

~I haven't tested all the flows yet, I'm looking for initial thoughts about this solution @samiuelson @kidinov @AnirudhBhat. Do you think this change is safe for a beta fix or would be better to include it in the regular cycle?~

Description

The app currently incorrectly handles scenario when a required card reader update fails due to low battery during the connection flow.

This issue is caused by a race condition between the Connect VM <-> SW Update VM <-> Stripe SDK updates.

The connect VM emits navigation event to move to the SW Update VM when the Stripe SDK starts na update. However, the app also asynchronously listens to Card Reader connection status here. Before the app gets a chance to navigate to the SW Update Dialog/VM, the Stripe SDK emits NotConnected event which resets the UpdateStatus to Unknown => when the SW Update Dialog/VM finally loads and starts listening to updates, the only status update it receives is Unknown since the software update is no longer in progress - this results in an empty/white dialog being displayed (we don't display any UI for unknown state).

The solution I propose is simplify the code and remove the race condition by listening to Card Reader updates on the main thread - this method doesn't do any heavy operations and reasoning about events that are performed in the order that they were emitted is much easier.

Unfortunately, this solution doesn't work in 100% of cases. The navigation component is asynchronous and sometimes the status changes to Unknown even with the above "fix". ~I implemented a workaround in the follow up commit that dismisses the empty/white dialog in such case and adds a flag (uiLoaded) to the tracking event the app has been recording to indicate whether the dialog was dismissed before the UI was loaded. This avoids the UI glitch but users won't learn about the reason of the failure - we plan to keep an eye on the tracking event to understand how common is such scenario in production.~ In the end, we decided to add an artificial delay that propagates the state to the UI after 500ms, "ensuring" the navigation component gets a chance to navigate to the Update flow in time.

Steps to reproduce

  1. Enable Simulated reader in the dev settings.
  2. Set the Update behavior to LOW_BATTERY_ERROR.
  3. Attempt to collect a payment for on order.
  4. Notice an empty dialog appears - not always but vast majority of times.

Testing information

Repeat the steps to reproduce, and notice the user is informed their card reader needs to be charged.

Also test connection to both HW readers as well as TTP reader to verify there isn't any regression. Last but not least, test if the successful SW update flow works as expected.

The tests that have been performed

I tested the connect/payment/update flow on the simulated readers, including TTP. I also tested connection/payment flows on a HW reader.

Images/gif

Before After
Screenshot 2024-11-13 at 15 00 34 Screenshot_20241113-145307

Reviewer (or Author, in the case of optional code reviews):

Please make sure these conditions are met before approving the PR, or request changes if the PR needs improvement:

wpmobilebot commented 1 week ago
📲 You can test the changes from this Pull Request in WooCommerce-Wear Android by scanning the QR code below to install the corresponding build.
App Name WooCommerce-Wear Android
Platform⌚️ Wear OS
FlavorJalapeno
Build TypeDebug
Commit99fe7597d3832a3ac52b6044be79899256e08b9e
Direct Downloadwoocommerce-wear-prototype-build-pr12912-99fe759.apk
wpmobilebot commented 1 week ago

📲 You can test the changes from this Pull Request in WooCommerce Android by scanning the QR code below to install the corresponding build.

App Name WooCommerce Android
Platform📱 Mobile
FlavorJalapeno
Build TypeDebug
Commit99fe7597d3832a3ac52b6044be79899256e08b9e
Direct Downloadwoocommerce-prototype-build-pr12912-99fe759.apk
codecov-commenter commented 1 week ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 39.73%. Comparing base (1acd3ef) to head (99fe759). Report is 9 commits behind head on release/21.2.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## release/21.2 #12912 +/- ## =============================================== Coverage 39.73% 39.73% Complexity 5983 5983 =============================================== Files 1267 1267 Lines 73227 73228 +1 Branches 10042 10042 =============================================== + Hits 29094 29096 +2 + Misses 41567 41566 -1 Partials 2566 2566 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.


🚨 Try these New Features:

kidinov commented 1 week ago

@malinajirka I still can reproduce this issue. Rarely but it still there. Here are the logs

2024-11-13 10:31:10.320 10844-10844 System.out              com.woocommerce.android.dev          I  Andrey Software update status: com.woocommerce.android.cardreader.connection.event.SoftwareUpdateStatus$InstallationStarted@a1e1365, thread: main
2024-11-13 10:31:10.326 10844-10844 System.out              com.woocommerce.android.dev          I  Andrey Software update status: Failed(errorType=BatteryLow(currentBatteryLevel=0.15), message=Update failed due to low battery), thread: main
2024-11-13 10:31:10.328 10844-10844 System.out              com.woocommerce.android.dev          I  Andrey Connection status in the VM: NotConnected(errorMessage=Update failed due to low battery), thread: main
2024-11-13 10:31:10.340 10844-10844 System.out              com.woocommerce.android.dev          I  Andrey Connection status: NotConnected(errorMessage=Update failed due to low battery) thread: main
2024-11-13 10:31:10.341 10844-10844 System.out              com.woocommerce.android.dev          I  Andrey Software update status: com.woocommerce.android.cardreader.connection.event.SoftwareUpdateStatus$Unknown@a88c7ec, thread: main

Logs.patch Where the logs were added ^

My understanding is that as navigation itself is async, there are still cases when updated Failed is still delivered to the ConnectionVM, not CardReaderUpdateViewModel.

Not sure what will be a good solution here; Maybe to unsubscribe from softwareUpdateStatus after ShowUpdateInProgress triggered?

malinajirka commented 1 week ago

I still can reproduce this issue. Rear but it still there. Here is the logs

Nice catch! The asynchronous aspect of the navigation component keeps biting us 😢.

Not sure what will be a good solution here; Maybe to unsubscribe from softwareUpdateStatus after ShowUpdateInProgress triggered?

Hmm, this won't help I believe. We change the status to Unknown within the CardReaderModule not the VM.

I'll keep thinking about a proper solution. In the meantime, perhaps, we could merge this PR + dismiss the dialog if the only state is Unknown - to avoid this broken looking behavior => and add tracking to learn about how common the scenario is. Wdyt @samiuelson @kidinov ?

An ugly workaround could be to keep the last two states in memory - if the last one is unknown, check the one before that. Having said that, I don't like that.

kidinov commented 1 week ago

@malinajirka

Hmm, this won't help I believe. We change the status to Unknown within the CardReaderModule not the VM.

My understanding is that it doesn't matter that we change it unless we change it before the failed status is received by CardReaderUpdateViewModel. Having said that, race conditions are possible with this approach, too, as we don't have any guarantee what comes first

I'll keep thinking about a proper solution. In the meantime, perhaps, we could merge this PR + dismiss the dialog if the only state is Unknown - to avoid this broken looking behavior => and add tracking to learn about how common the scenario is. Wdyt @samiuelson @kidinov ?

If you believe that finding a proper solution now requires more effort than adding tracking and then analyzing the results, then it makes sense.

malinajirka commented 1 week ago

My understanding is that it doesn't matter that we change it unless we change it before the failed status is received by CardReaderUpdateViewModel. Having said that, race conditions are possible with this approach, too, as we don't have any guarantee what comes first

That is correct, but unsubscribing from softwareUpdateStatus doesn't impact the order or speed of this at all so AFAICT such change won't make any difference. Am I missing something?

If you believe that finding a proper solution now requires more effort than adding tracking and then analyzing the results, then it makes sense.

Considering, the swUpdate flow is triggered automatically by Stripe as part of the Connection flow, I can think of only two proper solutions:

  1. Integrate SW Update flow as part of the connection flow even on the VM+UI layer => so we won't need to navigate anywhere.
  2. Implement a workaround - eg. buffer previous states or delay reset in the card reader module.

The first one is quite time consuming and the second one feels like an ugly workaround. So I went with the option of dismissing the dialog and tracking the state - I'll dive into the data after a couple months and based on the numbers, we can decide if we want to implement a proper solution (unless we come up with a better solution before that).

kidinov commented 1 week ago

That is correct, but unsubscribing from softwareUpdateStatus doesn't impact the order or speed of this at all so AFAICT such change won't make any difference. Am I missing something?

If we unsubscribe from it before we consume the "failed" event, then unless it's replaced by producers with "unknown" before "failed" event consumed in the Update VM, then it should work, IIC

The first one is quite time consuming and the second one feels like an ugly workaround. So I went with the option of dismissing the dialog and tracking the state - I'll dive into the data after a couple months and based on the numbers, we can decide if we want to implement a proper solution (unless we come up with a better solution before that).

👍

malinajirka commented 1 week ago

Thanks for the review @samiuelson !

After follow up conversation on Slack we decided to add a delay to the state reset instead of just dismissing the dialog. The benefit of using the delay is that all users will understand why the update failed (due to low battery). It's a hacky workaround, but the value/effort ration is good.

@kidinov Since you were able to reproduce the issue after the first "fix" (switch to the main dispatcher). Would you please mind reviewing the current state of the PR?

wpmobilebot commented 2 days ago

Version 21.2 has now entered code-freeze, so the milestone of this PR has been updated to 21.3.

kidinov commented 2 days ago

@malinajirka I missed the comment 😮‍💨

I'll retest it soon and update release notes too

malinajirka commented 2 days ago

@kidinov Sorry, my bad, I should have followed up - it slipped my mind that this hasn't been merged yet.

dangermattic commented 2 days ago
1 Message
:book: This PR contains changes to RELEASE-NOTES.txt.
Note that these changes won't affect the final version of the release notes as this version is in code freeze.
Please, get in touch with a release manager if you want to update the final release notes.

Generated by :no_entry_sign: Danger