'web proof just broke' email gives no clues about how to fix (UI/doc issue)

nxg commented 3 years ago

When a web proof breaks from Keybase's point of view, Keybase sends a message:

your previously-proven web identity XXX just broke. 
We've been checking it repeatedly,
and it's not working from our perspective.

Can I point out that 'it's not working' is the worst bug report ever? You wouldn't accept that in a GitHub bug report.

I suggest that the message should indicate:

what's not working (ie, what actual error have you seen?)
where to go to debug this (ie, a pointer to documentation, perhaps including a checklist of things to examine)

In my particular case, it was because I'd changed the certificates, and was sending out just a certificate, and not the full chain. That's easily fixed, but I only worked out what needed fixed by searching for bug reports, and finding (very useful) #2914.

The email does include two suggestions:

If you deleted a public proof, remember that proofs must stay public so anyone can trust you without trusting the Keybase.io server.
If your identity has changed, you should revoke or replace the proof with the Keybase app.

But these are both rather generic suggestions, neither of which was relevant to my case (and those below), and neither of which says what's actually wrong, so neither is as useful as was doubtless intended.

Issue #3773 is a general one about documentation, but which covers this.
Issue #2878 is fundamentally caused by this issue, but manifesting as difficulty re-proving.
Issue #1711 is a different underlying cause of the same error.

These are all at least partly spurious bug reports, which create noise for you, and frustration for users; I stopped searching when I found these, so there may be other related ones. They can potentially all be closed by fixing this one.

chindraba-work commented 3 years ago

In this case, and in #2878, I don't think the keybase client can reliably know what is broken: an expired cert, an incomplete cert, a wrongly named cert, the site no longer serving proper pages on port 443, the proof being moved, or who knows what else. All of these give the same basic failure to the curl process. How or why the website is not serving the proof is beyond the knowledge of keybase.

I presume it is possible, in some cases anyway, to attempt to diagnose the issue in more detail. The diagnosis is probably unreliable too often to make the attempt worthwhile. In several cases, even on websites under my control, I've received certificate errors which my browser can, sometimes, give more detail on, and the odds of correctness are less than stellar.

Though it is automatic, the client checking the proof is no different from a human surfing to a web page and getting a certificate error, or a 404 error. The human surfer has no clue what the problem is, only that it's not working.

nxg commented 3 years ago

Thanks for your thoughts.

OK, so it's curl that's doing the retrieval here, rather than a script or similar. In that case yes, I appreciate it's hard to be specific what the problem is. But it would at least be possible, in that case (I'm fairly sure), to say ‘there's a certificate problem’ as opposed to, for example, ‘the .well-known/keybase.txt file is missing’ (ie, a 404). Any clues would cut down the search space for the user.
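To illustrate the distinction I mean, here is a minimal sketch in Python (not curl, and not Keybase's actual checker, whose internals I haven't seen). The function names and category strings are my own invention. It only fetches the URL and sorts the failure into a coarse bucket; it doesn't validate the proof contents.

```python
import ssl
import urllib.error
import urllib.request


def classify_failure(exc: BaseException) -> str:
    """Map an exception from a proof fetch to a coarse, user-facing category."""
    # HTTPError is a subclass of URLError, so it has to be tested first.
    if isinstance(exc, urllib.error.HTTPError):
        if exc.code == 404:
            return "proof file missing (HTTP 404)"
        return f"server answered with HTTP {exc.code}"
    # urllib wraps TLS failures in URLError and keeps the original in .reason.
    reason = exc.reason if isinstance(exc, urllib.error.URLError) else exc
    if isinstance(reason, ssl.SSLCertVerificationError):
        return "certificate problem"
    if isinstance(reason, ssl.SSLError):
        return "TLS handshake problem"
    return "could not reach the server"


def check_proof(url: str, timeout: float = 10.0) -> str:
    """Fetch the proof URL and return "ok" or a failure category."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            response.read()
    except (urllib.error.URLError, OSError) as exc:
        return classify_failure(exc)
    return "ok"
```

Even this much would tell the user whether to look at their certificates or at the location of the proof file.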

If the only in-line test is a curl failure, then it might be possible for the process which sends out the emails to try a few further things, such as poking the origin server with openssl s_client, purely in order to give the user clues. This doesn't have to be 100% reliable.
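As a rough sketch of that kind of follow-up probe, here is the same idea using Python's standard ssl module in place of openssl s_client. The function name and the wording of the results are assumptions of mine. It attempts a verified TLS handshake and passes on whatever reason the TLS library gives, which for a server sending an incomplete chain is usually a complaint about a missing issuer certificate.

```python
import socket
import ssl


def probe_tls(host: str, port: int = 443, timeout: float = 10.0) -> str:
    """Attempt a verified TLS handshake and describe the outcome in one line."""
    # The default context verifies both the certificate chain and the hostname.
    context = ssl.create_default_context()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host):
                return "TLS handshake and certificate verification succeeded"
    except ssl.SSLCertVerificationError as exc:
        # verify_message carries the TLS library's own explanation.
        return f"certificate verification failed: {exc.verify_message}"
    except ssl.SSLError as exc:
        return f"TLS handshake failed: {exc.reason}"
    except OSError as exc:
        # Refused connections, timeouts and DNS failures all end up here.
        return f"connection failed: {exc}"
```

The result is only a clue, not a diagnosis, but it is the sort of clue that would have saved me a search through the issues list.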

Even if it were completely impossible to detect or report anything other than ‘it's not working’, then it would be nice to the user to include a link to a page suggesting well-known things to check (‘this often happens because...’). I'm sure you've amassed quite a few standard possibilities, even if only from bug reports here.
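A sketch of what I have in mind for the email itself, again in Python and purely illustrative: the documentation URL is a placeholder, the category names are hypothetical, and the hints are just the causes mentioned in this thread. When no category is known, the email still carries the link.

```python
from typing import Optional

# Placeholder: no such documentation page is known to exist.
DOCS_URL = "https://example.com/docs/why-web-proofs-break"

# Hypothetical categories, with hints drawn from the causes listed in this thread.
HINTS = {
    "certificate": "Check that the certificate has not expired, matches the "
                   "site's name, and is served with its full chain.",
    "not_found": "Check that the proof file is still at the location where "
                 "it was originally proven.",
    "unreachable": "Check that the site is still serving pages on port 443.",
}


def build_email(identity: str, category: Optional[str] = None) -> str:
    """Compose the 'proof just broke' message with a hint and a docs link."""
    lines = [
        f"Your previously-proven web identity {identity} just broke.",
        "We've been checking it repeatedly, and it's not working from our perspective.",
    ]
    hint = HINTS.get(category) if category else None
    if hint:
        lines.append(hint)
    # The link is included even when nothing more specific is known.
    lines.append(f"Common reasons why verifications fail: {DOCS_URL}")
    return "\n".join(lines)
```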

A really good solution to the problem would be to include in the keybase command a validator subcommand.

chindraba-work commented 3 years ago

My collection of possibilities is from my own experiences setting up, or repairing, sites on all kinds of hosting environments. The few listed are a mere sampling of the problems I've seen which came to mind as probable issues in the proof verification. Not having reviewed the code, or even the process internals, for verification, I'm not sure what could be done to further diagnose the problem, with or without significant work.

My take is that I'd rather not have that information provided in the email, or other tooling. In the documents maybe. The failure of the proof, in your case and probably others here, is some kind of error in provisioning: certs, moved files, whatever. The failure could also be the result of Mallory's activities, and there's no guarantee that the informative email, with its potentially useful data, would not go to Mallory rather than to me.

As a flip side to the idea, if someone is proving they control a server, the practice of tracking down unknown errors can be good practice. I know I learn a tiny bit more every time something does not work. Granted, I wish it happened less often, and that I wasn't always having to compensate for previous admins' "abilities."

nxg commented 3 years ago

I appreciate that it might be difficult to provide useful diagnostic information here (I'm less convinced that one shouldn't, but that may be a separate issue).

Given that, I think my suggestion in this issue reduces to making the alerting email as useful as possible. The email is useful – I'd like to stress that it's very valuable for keybase to perform this check – but in this case it arrived out of the blue, from a service I'd partly forgotten setting up, about a week after me fumbling a certificate change (in a non-educational way). So from a user-experience point of view, it's useful to make the diagnostic process as straightforward as possible.

In the end, my clue to what I'd done wrong (doh!) was in this issues list, which I hope we can agree is not the optimal place for an end-user answer. A mere link within the email, to a ‘why verifications sometimes fail’ checklist in the documentation, would have set me on the right track very efficiently.

But the decision of whether any change is actually required is, of course, yours.

keybase / keybase-issues

'web proof just broke' email gives no clues about how to fix (UI/doc issue) #3994