pendulum-chain / vortex


Don't treat failure as a dead end state #160

Closed TorstenStueber closed 1 week ago

TorstenStueber commented 1 week ago

Currently any kind of error in the flow will lead to the failure state, which is an unrecoverable dead end and the user can only start over.

However, many/most errors are only transient: either there is a bad internet connection or some action timed out because a transaction took longer to be included in a block.

For that reason we should not treat errors as unrecoverable by default. Ideally we would distinguish errors into recoverable and unrecoverable errors, but in order to provide a quick solution, we should treat every error as recoverable by default.
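A minimal sketch of what "recoverable by default" could look like, assuming the flow is modelled as a phase plus an optional failure record. All names here (`FlowState`, `recordFailure`, `continueFlow`, `restartFlow`, and the phase names) are hypothetical and not taken from the vortex codebase.

```typescript
// Hypothetical names throughout; this is an illustration, not the actual implementation.
type Phase = "prepare" | "submit" | "confirm" | "done";

interface FlowState {
  phase: Phase;
  // Set when the last attempt at `phase` failed; the phase itself is kept.
  failure?: { message: string; recoverable: boolean };
}

// Record an error without discarding progress. Every error is treated as
// recoverable unless the caller explicitly says otherwise.
function recordFailure(state: FlowState, error: unknown, recoverable = true): FlowState {
  const message = error instanceof Error ? error.message : String(error);
  return { ...state, failure: { message, recoverable } };
}

// "Continue": clear the failure and retry from the phase that failed.
function continueFlow(state: FlowState): FlowState {
  if (state.failure && !state.failure.recoverable) {
    throw new Error("cannot continue after an unrecoverable failure");
  }
  return { phase: state.phase };
}

// "Restart": drop all progress and start over from the first phase.
function restartFlow(): FlowState {
  return { phase: "prepare" };
}
```

The point of the sketch is that a failure is attached to the current phase rather than replacing it, so the flow can resume where it stopped instead of ending in a dead end.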

TODO

TBD

  1. Shall the user stay on the progress screen, or shall we display the error message and the two options on the failure screen?
  2. What error message should we display?

TorstenStueber commented 1 week ago

Hey team! Please add your planning poker estimate with Zenhub @b-yap @bogdanS98 @ebma @gianfra-t @Sharqiewicz

vadaynujra commented 1 week ago

@TorstenStueber for the TBD items:

  1. We don't consider the transaction 'failed' at this point (in most cases the error is recoverable), so we shouldn't tell the user that their transaction has failed. TL;DR: the user stays on the progress screen.
  2. The error message should give the user at least some information, but it should not be hard to understand (too technical or full of jargon). Depending on which stage the user is at, can we add a line saying 'taking longer than expected' and then the two options? For the internet connection error it should ideally say something along the lines of 'Unstable internet connection detected'.
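The two messages suggested above could be wired up roughly as follows. This is only a sketch: `ErrorKind`, `classifyError`, and `userMessage` are hypothetical names, the substring matching is a crude heuristic chosen for illustration, and the wording of the fallback message is an assumption.

```typescript
// Hypothetical helper names; the classification heuristic is an assumption.
type ErrorKind = "network" | "timeout" | "unknown";

// Guess the kind of error from its message text.
function classifyError(error: unknown): ErrorKind {
  const text = (error instanceof Error ? error.message : String(error)).toLowerCase();
  if (text.includes("network") || text.includes("offline")) return "network";
  if (text.includes("timeout") || text.includes("timed out")) return "timeout";
  return "unknown";
}

// Map the kind to a short, non-technical message for the progress screen.
function userMessage(kind: ErrorKind): string {
  switch (kind) {
    case "network":
      return "Unstable internet connection detected.";
    case "timeout":
      return "This is taking longer than expected.";
    default:
      // Assumed fallback wording; not specified in the discussion.
      return "Something went wrong.";
  }
}
```

Keeping the classification separate from the wording means the messages can be refined later without touching the error handling.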

But the above may not be enough. If the user has a bad connection, how will the app even show a message or options to the user? Wouldn't that also require an internet connection? If we are able to show the two options and the user chooses 'Restart', wouldn't that also require an internet connection to take effect? If you are assuming that the internet connection is just unstable, i.e. it comes and goes but isn't down for long, so that the options would be shown and the user's selection would take effect, can't the 'Continue' option be executed without requiring user input?

Let's take another example of a transient error: gas requirements change (increase), so the transaction is not included in the expected block. Are we also planning to give the user the options to choose from, instead of just continuing with the funds flow, if that's an option?

In general, the user has already signed transactions expecting an offramp to their bank account, so if the continue option is available to them (even after the app encountered an error), the app should just go ahead with it. If continuing the flow is not an option at all (unrecoverable error), then it's probably a bad idea to let the user do the offramp again, because it is probably going to get stuck again.

TorstenStueber commented 1 week ago

@vadaynujra there are too many failure cases that we don't have a good grasp of at this point in time. Internet downtime is just one of them. And the user does not require internet for the error message to be shown.

We could make the Continue option be executed by default in the background, but then how to give the Restart option to the user? Just show the "Restart" button once the first error occurs even if we try a continue in the background? This would be an unnecessary call to action as it would be the only button.

A change in gas requirements is a transient problem. But let's keep it simple and not distinguish between all the different error cases.

We (our code) doesn't know whether continue is a valid option or not. It depends on the error, but as stated in the ticket, for simplicity let's treat every error as a recoverable error.

Nevertheless, as discussed today, the user needs to be given the option to restart from scratch. So as I asked above: how to do this reasonably if the "Continue" action is executed automatically in the background?
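One possible answer, sketched under assumptions: continue automatically for a limited number of attempts without showing any button, and only once those are exhausted show both options together, so "Restart" never appears as a lone button. The names `RecoveryStep` and `nextRecoveryStep` and the `maxAutoRetries` parameter are hypothetical.

```typescript
// Hypothetical sketch; names and the retry limit are assumptions, not the shipped design.
type UserOption = "continue" | "restart";

interface RecoveryStep {
  autoContinue: boolean; // retry silently in the background
  options: UserOption[]; // buttons to show on the progress screen
}

// Decide what to do after `failedAttempts` consecutive failures of the same step.
function nextRecoveryStep(failedAttempts: number, maxAutoRetries: number): RecoveryStep {
  if (failedAttempts <= maxAutoRetries) {
    // Still within the automatic budget: keep going, show no buttons.
    return { autoContinue: true, options: [] };
  }
  // Automatic retries exhausted: hand the decision to the user.
  return { autoContinue: false, options: ["continue", "restart"] };
}
```

With `maxAutoRetries` set to 0 this degenerates to the behaviour described in the ticket, where the user is offered both options on the first error.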

vadaynujra commented 1 week ago

> We (our code) doesn't know whether continue is a valid option or not. It depends on the error, but as stated in the ticket, for simplicity let's treat every error as a recoverable error.

While this is a fair assumption, don't we need to define (for ourselves) at what point we call an error unrecoverable? Is a transient error that has been fixed after XX minutes an unrecoverable error? Are there different criteria?

> We could make the Continue option be executed by default in the background, but then how to give the Restart option to the user?

What is the value of giving the user the option to restart when the flow has a known error (even if it is transient)? That clearly increases the chances of the user facing the error again (case in point: my transactions last Friday and Monday, and Florian's back-to-back transactions this morning). Rather, we should quickly learn about the error (ideally before the user reaches out to us), and:

TorstenStueber commented 1 week ago

> don't we need to define (for ourselves) at what point we call an error unrecoverable?

Eventually we will need to spend more time distinguishing all kinds of error cases. For now, the easiest approach is to let the user decide.

@vadaynujra your whole second part of the comment is not relevant for this ticket. If there is a bug in the code, we need to fix it. Sure. This ticket here is not about bugs but about transient problems/errors that are not in our control.

TorstenStueber commented 1 week ago

I will close this ticket, it has been implemented as part of https://github.com/pendulum-chain/vortex/pull/159.