magic-wormhole / magic-wormhole-protocols

The documentation of the protocols powering Magic Wormhole
MIT License
26 stars 12 forks source link

Document rendezvous server error handling #15

Open piegamesde opened 3 years ago

piegamesde commented 3 years ago

At the moment, this part is heavily under-documented in the specification, only giving the following information:

I suggest having a look at our reference server implementation (https://github.com/magic-wormhole/magic-wormhole-mailbox-server) and then write down some details that are compatible with it. Open questions at the moment:

meejah commented 3 years ago

When does the server use the error field on the welcome message, and which values are common?

Whenever the program is started with --signal-error and the value will be "whatever the operator typed there".

piegamesde commented 3 years ago

I did a quick code dive and found some answers too:

piegamesde commented 3 years ago

So the most important question is: "should we restrict error messages to be in direct reply to client messages?".

If yes:

If no:

piegamesde commented 3 years ago

I just noticed that this has implications for our new permissions system too: what does it actually mean for the server to "error out"?

meejah commented 3 years ago

It makes sense to at least close on "permissions failed" I think, since those permissions are scoped the "the connection" so what else could you possibly do .. besides maybe try something else, i.e. a new permissions message, but also re-connecting doesn't seem like much of a burden there.

piegamesde commented 3 years ago

I propose the following changes and clarifications to the spec:

  1. Error messages that contain the orig field are in direct response to some other message, those that don't happened due to a different cause
  2. Error messages with an orig field SHOULD initiate a proper unhappy shutdown within the client. Communication can still continue normally, as if the error-triggering message had never been sent.
  3. Error messages without an orig field result in a direct tear-down of the communication on both sides (the server releases the nameplate and mailbox if necessary in behalf of the client). No further ordinary* communication may occur (* except WS pings etc.).
  4. Ack messages signify "the message got successfully processed". In accordance with point 2., no ack message is sent on an error
  5. A failure in submit-permissions results in an error message
  6. The error field in the welcome message gets deprecated. Instead, the server uses an error message as described in point 3.
  7. In case of 3., 5., or 6., all further attempts of communication are met with an error response.

Backwards compatibility:

What do you think?

meejah commented 3 years ago

I'm not sure why we should mess with the ack messages. In case you meant to describe current behavior in point 4 above, that's not what they currently mean. What they currently mean is, "the message arrived at the server" .. and are only used by the timing tool.

So, I don't know that we need to assign any semantic significance to them; indeed, it may collide with the goal of measuring timings (because we'd delay -- slightly -- when that message gets sent if the semantics become "processed correctly" rather than "received + parsed").

Several message types already allow for "error" to be sent as a response. They also have explicit "reply" messages for success (e.g. ->allocate versus <-allocated or ->claim vs. <-claimed). Given this, it might make more sense to spec particular responses to any message-types that currently lack an explicit response (where the client needs to make some sort of decision). Are there any messages like this?

The best way to analyze this is likely by tying it back to the state-machine(s); are there any places we can get "stuck" in the machine?


Regardless of that, part of this seems to be small enough to be its own proposal: that orig-less error messages be allowed; that Welcome[error] be deprecated; that an orig-less error message become the "new way" of doing a more-general error (like "server under maintenance"). Are there any other cases where an orig-less error might be required? (Partly I'm thinking maybe this points to making a special message for this case? like Unwelcome or something? .. might be cleaner than having to specially-handle every error depending on orig or not...)

piegamesde commented 3 years ago

Given this, it might make more sense to spec particular responses to any message-types that currently lack an explicit response (where the client needs to make some sort of decision).

I thought of this too recently, and it is indeed a viable solution if messing with acks is not acceptable.

Are there any messages like this?

Yes, the recently added submit-permissions message. That was an oversight on our part. And add and open kind of too (their response is a message, which is a bit annoying to correlate with the request. I'm not sure about open, since it returns a series of messages but AFAICT that series could also be empty and thus no response message).

The best way to analyze this is likely by tying it back to the state-machine(s); are there any places we can get "stuck" in the machine?

Not sure what you mean exactly, and which machine of which implementation you are referring to, but in either case the answer is a pretty confident yes :)

Are there any other cases where an orig-less error might be required?

The only one that I can think of is https://github.com/magic-wormhole/magic-wormhole-mailbox-server/issues/19

might be cleaner than having to specially-handle every error depending on orig or not

We can also make this a separate error type, FWIW


What are your thoughts of bumping the WebSockets end point version instead of bothering with backwards compatibility? I wrote down some ideas earlier today inspired by all of this that simplify a lot of other things. I still need to check how easy they would be to add to the current relay server implementation (gut feeling says it should be okay).