main: make error messages more accurate

nonfungible-human commented 8 months ago

This PR is related to Issue 655.

When the user is logging into the web interface, an exception is caught if the login fails. However, the exception message is ignored and the "incorrect password" error message is displayed to the user in all cases. The result is that, unless the user really did enter the wrong password, the error message will be incorrect.

This error can be caused by attempting to connect to a node that is not running or is in the locked state. Even if the correct password is entered, the error message says it is incorrect. These conditions can be identified more accurately than they currently are and should be reported to the user more accurately as well.

This PR will add a error detail message for both LiT and LND, which can be toggled. So there will now be as many as three error message: a main error message (always displayed) and LiT and LND detail error messages, at least one of which will be displayed.

The login error messages will be determined as follows:

Create a custom error message based on the exception that was originally thrown. This will become the main error message.
Check the subserver statuses for LiT and LND.
If LiT reports an error, it will become the detail error message for LiT. Otherwise the detail error message for LiT will state that LiT is not running and provide further detail if possible.
If LND reports an error, it will become the detail error message for LND. Otherwise the detail error message for LND will state that LND is not running and provide further detail if possible.

If the subserver statuses could not be obtained, it will report a connection failure with detail in the LiT detail message.

If none of the above conditions are met, but the login still failed, it will do the following:

This will result in more accurate error messages being displayed on the web interface, which would make it easier for some users to diagnose login problems on their own. In cases where they still need assistance in diagnosing a problem, the more accurate error message will help to narrow down the cause earlier in the process.

levmi commented 8 months ago

Hi @nonfungible-human, thank you so much for the PR! We'll take a look at it and get back to you early next week with next steps. We're evaluating whether it makes sense to do a broader fix here :) It could make sense for us to loop you in on that if you're interested. We'll have more details to share early next week!

nonfungible-human commented 8 months ago

@levmi Okay. Earlier, I changed this PR to a draft because one of the ci tests fails as it is checking for the "incorrect password" message. The test that says it tests for an incorrect password really just returns an error if the node is not available. If the decision is made to move forward with this, that test would need to check for a different message. Also, in integrated mode a different message is returned when the node is locked.

levmi commented 7 months ago

So, we discussed this internally. First off, thank you so much for your desire and ability to contribute! We’re really excited to have an external contributor pushing on some of these things.

Our ideal path forward on this issue would be to include not only the changes that you’ve created here. But also, to take advantage of the new SubServer Status rpc to return even more verbose error messages to help the user and us debug any issues. To do that, you would hit the RPC call used in this function to attain the subserver status (i.e. (this._store.api.lit.listSubServerStatus()), https://github.com/lightninglabs/lightning-terminal/blob/f2cfca981c893152243d22a568689b624d324709/app/src/store/stores/subServerStore.ts#L39

Specifically, here is an example code snippet of what you’ll receive from the RPC call:

    "sub_servers": {
        "accounts": {
            "disabled": false,
            "running": true,
            "error": ""
        },
        "faraday": {
            "disabled": false,
            "running": true,
            "error": ""
        },
        "lit": {
            "disabled": false,
            "running": true,
            "error": ""
        },
        "lnd": {
            "disabled": false,
            "running": true,
            "error": ""
        },
        "loop": {
            "disabled": false,
            "running": true,
            "error": ""
        },
        "pool": {
            "disabled": false,
            "running": true,
            "error": ""
        },
        "taproot-assets": {
            "disabled": false,
            "running": true,
            "error": ""
        }
    }
}

The above snippet can return things that are relevant to the user’s inability to login or other unrelated issues. We believe that the flow we’d want is to return errors from the subserver status function if they are related to either the lnd subserver not starting or the lit service itself not starting. So, specifically, we’d return errors if the output within lit or lnd is either an error message, in which case we would return that error message. Or, if Running: false, in either of those buckets, then we’d return an error of “LND [lit] is not running.” If the RPC calls fails entirely, then we should return an error of “Unable to connect to LiT. Please restart litd and try again.”

Other error messages from that function (like poold not starting for instance) would not be relevant to user being unable to log in. So, in the case that there are no errors from lnd or litd, we could return the error messages that you’ve already outlined in this existing PR.

Does that make sense? Do you feel comfortable with implementing that type of flow?

nonfungible-human commented 7 months ago

Yes, that seems pretty straight forward, and a good way to proceed. I will change it to use the listSubServerStatus function and then modify this PR, possibly by tomorrow but more likely early next week.

levmi commented 7 months ago

Awesome, thank you so much! Don't hesitate to reach out with any questions!

nonfungible-human commented 7 months ago

@levmi I do have a question. I ran some initial tests for this and some of the error messages it returns are pretty long and seem to contain multiple messages. Should the entire message be display on the user interface in all of these cases?

I tested 3 common error conditions (wrong password, node locked, node not running) in both remote and integrated mode. The following results are what I receive for "lit" and "lnd" when calling listSubServerStatus.

remote mode, with lnd node running and unlocked, wrong password entered

The error message for lit could be used here, but it is rather long and somewhat ambiguous. For example, the only part of the message that indicates that the problem is an incorrect password is could not start macaroon service.

"lit"
{
    disabled: false
    error: "could not start Lit: could not start litd sub-servers: could not start macaroon service: unable to derive a shared secret with LND: rpc error: code = Unknown desc = permission denied"
    running: false
}

"lnd"
{
    disabled: false
    error: ""
    running: true
}

remote mode, with lnd node running but locked, correct password entered

Neither one has an error message, but lit.running == false. Therefore, we would use the generic error message LiT is not running.

"lit"
{
    disabled: false
    error: ""
    running: false
}

"lnd"
{
    disabled: false
    error: ""
    running: true
}

remote mode, with lnd node not running, correct password entered

Both lit and lnd have an error message. The message for lnd is pretty long. The message for lit could be used in this case.

"lit"
{
    disabled: false
    error: "could not start Lit: could not start LND"
    running: false
}

"lnd"
{
    disabled: false
    error: "could not set up LND clients: could not create LND Services client: error subscribing to lnd wallet state: lnd version incompatible, need at least v0.13.0-beta, got error on state subscription: rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial tcp 127.0.0.1:10009: connect: connection refused\""
    running: false
}

integrated mode, with lnd node running and unlocked, wrong password entered

In this case, we would use the generic message LiT is not running.

"lit"
{
    disabled: false
    error: ""
    running: false
}

"lnd"
{
    disabled: false
    error: ""
    running: true
}

integrated mode, with lnd node running but locked, correct password entered

In this case, we would use the generic message LiT is not running.

"lit"
{
    disabled: false
    error: ""
    running: false
}

"lnd"
{
    disabled: false
    error: ""
    running: true
}

integrated mode, lnd not running

This case was not tested.

levmi commented 7 months ago

Hiya - doing some testing on our side and then will get back to you with a response on the above questions! I think what we want to make sure of is that if we're going to change the error messages in the UX to be more descriptive of the problem from an end user perspective that they aren't incorrect in certain cases. For instance, in the first case, with returning could not start macaroon service, it would obviously be better to actually return an error of wrong password or something along those lines. But, we wouldn't want to return that in the case where there's a separate error from wrong password that also returns could not start macaroon service. We're doing some tests on our side and then can validate or make recommendations. Sorry for the delay, it's a busy week for us with roundtables where we gather customer feedback. Reverting either tomorrow or Friday :)

nonfungible-human commented 7 months ago

@levmi Okay, sounds great. Those were exactly my concerns too, that we always use a reliable message to display something the user can easily understand. I will wait until I hear back from you.

ViktorTigerstrom commented 7 months ago

Hi @nonfungible-human! First of all, once again, thanks a lot for taking the time to contributing to this repo, it's really appreciated :pray::fire:!!!

I'll detail an answer to your questions one by one below.

First of all, I think I need to clarify that we would like to use both the error messages that you've currently added, plus the response from the subserver status rpc response. I think it would make sense create a combination of both showing the error messages that this PR has already added, and then showing the response from the subserver status rpc call separately. The response from the status rpc call should be seen as an "advanced error" message view, and would probably make sense to hide under a "More details" dropdown that can be expanded underneath the error message that's currently shown. For the average user, the expanded set of error messages that this PR has already added, will likely be good enough, but for advanced users the output from the status rpc response will also provide helpful troubleshooting info!

Too see what I mean by a dropdown, look at how extra information is shown when clicking the "Additional Options" dropdown on: https://terminal.lightning.engineering/connect/pair/

I'm also cc:ing @jamaljsr here, as UI:s isn't my specialty :sweat_smile:, so I'd like to get his approval if this sounds like a good idea before you actually implement it.

Either way, I definitely think we should create a combination of both the current error messages and the status rpc response, as neither of them can replace the need for the other.

remote mode, with lnd node running and unlocked, wrong password entered

The error message for lit could be used here, but it is rather long and somewhat ambiguous. For example, the only part of the message that indicates that the problem is an incorrect password is could not start macaroon service.

If we create the "advanced error message view" I think think it's ok that you include the full error message here. Just to clarify once more though, if we create a combination of both the current error messages and the status rpc response, the user will still get an "Incorrect password" in the main error message view to show the user what's wrong.

Just for future reference, as it might be helpful for you to understand our error messages more easily, when you see long nested error messages like this, it's actually most often the last part of the error message that most accurate. In this case it's permission denied, which correlates well to the password being incorrect.

remote mode, with lnd node running but locked, correct password entered

Yes, in cases like this it'd make sense to show the LiT is not running in the "advanced error message" view. Actually though, just to make the LiT is not running more helpful when we have no message in the error field, let's modify it to: LiT is not running - Ensure that the wallet is unlocked

Finally, in cases like this, you should also hit the walletLockedErr as the main error message right? That should help the user figure out what's wrong.

remote mode, with lnd node not running, correct password entered

If we add the "advanced error message" view, I actually think it makes sense to show the full error messages for both lit & lnd for this and other cases when there are multiple error messages. Just make sure that it's clear which error comes from which system.

This case actually shows a case where the "advanced error message" view will be very helpful for users, as I believe you'll hit the "noConnectionErr" for the main error message view, right?

integrated mode, with lnd node running and unlocked, wrong password entered

See the answer for "remote mode, with lnd node running and unlocked, wrong password entered" :).

integrated mode, with lnd node running but locked, correct password entered

See the answer for "remote mode, with lnd node running but locked, correct password entered".

integrated mode, lnd not running

This will lead to the status rpc call response containing "could not start Lit: could not start LND" for the lit error field.

Let me know if something is still unclear!

jamaljsr commented 7 months ago

@ViktorTigerstrom thanks for all of this great feedback. There's really not much to add.

I agree that the status message should be shown alongside the custom RPC error. The status error just provides more context into what may be causing the login to fail.

Another approach for placement of the status message is to use an icon with a tooltip on hover.

nonfungible-human commented 7 months ago

@ViktorTigerstrom and @jamaljsr, I have pushed a new commit to this branch and it works for me in all of the above cases. The css styles and error message text can easily be modified if necessary.

I did notice, during my final testing, that I was getting some "unable to fetch subserver status" error messages. I don't know if these messages are specific to my setup, because I am not using any of the other subservers, or if they are caused by something else. One of them said it could not find lit.macaroon, which is an error I thought I had seen in the past. None of the code I changed should affect the way subserver statuses are retrieved, so I am assuming the messages I am seeing are a normal condition. But we should make sure that, in addition to displaying accurate error messages, all functionality still works after a successful login, and that is hard for me to do because I am currently testing this login code on an unfunded node with no channels. Specifically, the messages I am talking about are the red boxes that appear on the right side of the screen immediately after a successful login. I see several of them and most of them say "unable to fetch subserver status" but they disappear faster than I can read all of them.

nonfungible-human commented 7 months ago

I made the suggested changes. I wasn't able to reproduce the empty detail message when an incorrect password is entered, but I put a check in place to prevent the button from being displayed if both detail messages are empty. However, at least one of them should contain text if there is an error.

Also, I removed the redundant prefixes Lit: and LND:. @ViktorTigerstrom had said to make sure that it's clear which error comes from which system, which is why I put them in there in the first place. Right now, if there is a message from both LiT and LND, it will display them both and it seems fairly easy to tell them apart. The case I know of where it display a message from both subsystems is when LiT is running in remote mode and LND is not running at all.

lightninglabs / lightning-terminal