Improve "welcome" message

piegamesde commented 3 years ago

It promotes a list of known relay servers to use
It gives a proof of work challenge to solve

The list of known relay servers means we don't always use the same one, which distributes both cost and availability. It also allows individual relay servers to be rebooted for maintenance without bringing the whole service down. Each client should pick one or two; this results in up to four relays being pinged during relay initialization (remember, there are two clients who both pick at random). Four connection attempts should be enough to give a sufficiently high probability of finding one that is up.
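
A minimal sketch of the selection described above, assuming the welcome message exposes the relays as a plain list of URLs (the hostnames and the helper names are hypothetical; the thread does not fix a format). It also shows the arithmetic behind the "four attempts" claim: if each relay is independently up with probability p, at least one of n candidates is reachable with probability 1 - (1 - p)^n.

```python
import random


def pick_relays(known_relays, count=2, rng=random):
    """Pick up to `count` distinct relays at random from the advertised list."""
    return rng.sample(known_relays, min(count, len(known_relays)))


def candidate_relays(own_picks, peer_picks):
    """Union of both clients' picks, order-preserving and de-duplicated."""
    return list(dict.fromkeys(own_picks + peer_picks))


def prob_at_least_one_up(p_up, attempts):
    """Probability that at least one of `attempts` independent relays is up."""
    return 1 - (1 - p_up) ** attempts


# Hypothetical relay list as it might be advertised in the welcome message.
relays = [f"tcp://relay{i}.example.org:4001" for i in range(1, 7)]
mine = pick_relays(relays)
theirs = pick_relays(relays)
candidates = candidate_relays(mine, theirs)  # between 2 and 4 entries
```

The independence assumption is optimistic if several relays share infrastructure, so the real probability may be lower than the formula suggests.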

The proof of work challenge is added in an attempt to protect the rendezvous server from DoS overload. It is a single point of failure, and we want to be prepared if some dumb-ass spams it with connection attempts for some reason. The challenge's difficulty is configurable by the server, so it can adapt dynamically to the current load. It is also designed to be as light on server resources as possible (also in order to not accidentally introduce a new DoS vector):

The server can use the first 8 bytes of the challenge as a running counter; only that needs to be stored. A binary heap gives
All other required information is stored and forwarded by the client; a MAC protects against clients that make up their own challenges.
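
One way such a challenge could look, as a rough sketch only: the byte layout (8-byte counter, 1-byte difficulty, HMAC-SHA256 tag), the "leading zero bits of SHA-256" puzzle, and all function names are assumptions for illustration, not the format proposed in this PR. The point is that the server keeps only a secret key and a counter, while the client carries the challenge around and the MAC stops it from inventing an easier one.

```python
import hashlib
import hmac
import itertools
import os
import struct

SERVER_KEY = os.urandom(32)    # server-side secret; never sent to clients
_counter = itertools.count(1)  # the only per-challenge state kept here


def leading_zero_bits(digest):
    """Number of leading zero bits in a byte string."""
    as_int = int.from_bytes(digest, "big")
    return len(digest) * 8 - as_int.bit_length()


def issue_challenge(difficulty):
    """Server: 8-byte counter + 1-byte difficulty, authenticated with a MAC."""
    body = struct.pack(">QB", next(_counter), difficulty)
    tag = hmac.new(SERVER_KEY, body, hashlib.sha256).digest()
    return body + tag


def solve(challenge):
    """Client: brute-force a nonce until the hash meets the difficulty."""
    difficulty = challenge[8]
    for nonce in itertools.count():
        candidate = struct.pack(">Q", nonce)
        digest = hashlib.sha256(challenge + candidate).digest()
        if leading_zero_bits(digest) >= difficulty:
            return candidate


def verify(challenge, nonce):
    """Server: check the MAC first, then the work; no stored challenge needed."""
    body, tag = challenge[:9], challenge[9:]
    expected = hmac.new(SERVER_KEY, body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        return False  # the client made up or altered the challenge
    digest = hashlib.sha256(challenge + nonce).digest()
    return leading_zero_bits(digest) >= body[8]
```

Raising `difficulty` by one bit roughly doubles the expected client work while verification stays at one MAC and one hash. The sketch does not handle replay of an already solved challenge.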

meejah commented 3 years ago

Note that there is already an attempt + implementation at a "nameplate allocation now requires a token" (in the python mailbox server) .. I'd have to (further) remind myself where exactly that is at, but the idea was an open-ended way to demand more from clients. A "proof of work" (I'm assuming "like hashcash" here) token would of course fit in there.

meejah commented 3 years ago

Ahh, I think it was https://github.com/magic-wormhole/magic-wormhole/issues/126 ... I thought I had a branch that implemented most of that too but I can't immediately find it.

piegamesde commented 3 years ago

That sounds interesting, if you can dig out some details about it I'd be interested!

My current proposal for proof of work is some primitive brute-force task on a hash function. There are a lot of variations on the concept, this one is mostly optimized to minimize the server load. I'll try to investigate Verifiable Delay Functions. They obviously provide a lot more than what we need, but it'd be cool to find a one-way function that doesn't trivially parallelize. Maybe time lock puzzles or "proof of sequential work"?

Another thing that comes to my mind right now is that "nameplate allocation requires a token" is not enough to protect us – Claiming a mailbox directly is a pattern that will become more common when we have seeds. Nevermind, forget that, it should work fine.

meejah commented 3 years ago

https://github.com/magic-wormhole/magic-wormhole/issues/126#issuecomment-304058229 has some notes from a #magic-wormhole discussion between (at least) myself and warner.

piegamesde commented 3 years ago

I have seen that comment, but lacking the surrounding discussion context I don't fully understand the motivation for some of its aspects.

Most importantly, why have a separate abilities/permission round-trip if we can simply add fields onto the existing welcome/bind messages in a backwards-compatible way? Are there any issues with my proposal that I haven't thought of?

If we can't find the code, this will have to be re-implemented. But it shouldn't be too hard, it's a rather simple feature.

Also, any opinions on the relay discovery feature? (Maybe it was a bad idea to have both in one commit). I don't fully know which kinds of attributes one may want to advertise next to the URL (I could only think of "server location" for now). Also, the harder part – how the rendezvous server knows about 3rd-party relays – is not really part of the spec because clients can't be bothered. It however is something that needs to be figured out nevertheless.

meejah commented 3 years ago

I think the protocol in that comment anticipated a more open-ended way to do things -- that is, it's not just for one style of proof-of-work. For example it could be used to do ZKAP "payments" or logins / proof-of-account or have different styles of proof-of-work.

So IIRC, it was a "change the overall protocol once" so that individual PoW etc schemes can be added to the "abilities" and "permission" messages more easily .. I definitely saw that code semi-recently. I will have some time this evening to dig around and find it for real.

piegamesde commented 3 years ago

Hm, what is the intended semantic of it, when multiple concurrent styles are supported? The client sends all those that it supports as abilities, and then the server picks one? If we let the client pick one that it supports I can make the scheme work with only one round-trip.

Furthermore, I'd like to call the "proof of account"/"proof of human" family out of scope, as it requires additional human interaction.

meejah commented 3 years ago

"proof-of-account" doesn't necessarily require additional human interaction. It certainly could require more action if it was e.g. "username + password"-based but a scheme could employ a keypair instead (for example). You're right that could be considered feature-creep on DoS .. but I do think it's worth considering (especially if it could fit in as a further, later enhancement to a DoS scheme).

After all, "in general" what we're talking about here is enhancing the protocol so that the server can ask for "something else" / more interaction from clients. Roughly speaking, an account system could be viewed as DoS / misuse prevention (e.g. for private deployments where any use outside your organization is unwanted).

As to relay-discovery I like the general idea .. but it's probably best expressed as its own enhancement, I think.

piegamesde commented 3 years ago

I see. I could make it that the server sends the challenge data for all of the possible types of POW/Captcha/Auth that it supports. The client then picks one and submits the answer. This does not add any new message types, and only half a roundtrip is added compared to previously.

The client can freely choose which challenge to do (or none at all, because backwards compatibility). The server can control which one it prefers by making the other ones more expensive (or not providing them).

meejah commented 3 years ago

I think I greatly prefer the "abilities" based interaction, for several reasons:

it accommodates old and new clients (at the same time)
it is open-ended, easily allowing future innovation around "permission to use this rendezvous"
several mitigation strategies can be supported / used at once
it is easier to change or add new methods

Apart from that, I think it would be better to start with a more well-known PoW like "hashcash" .. the scheme proposed here looks very similar to that. Perhaps "hashcash" isn't the right one to choose, but something with existing libraries / spec is what I'm thinking here :)

Thinking generally about the protocol and the "abilities"-based interaction, I think the biggest point is the "easier to add new ones" and "several supported at once". In general I'm thinking of this as "permission to use the server right now". Certainly one use-case is DoS mitigation but there are others, especially for non-public/free-to-use deployments.

Here's the idea:

clients tell the server what they support (including "nothing" by sending BIND before ABILITIES)
the server chooses the permission strategy (if any) it wishes to use for this client
the server tells the client which one it chose in WELCOME (that is, WELCOME would always have 0 or 1 permission strategies)
the client responds to the challenge (if any) in PERMISSION
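
The server-side half of these steps could be sketched roughly as follows. This is an illustration under assumptions: the "hashcash" and "account" method names are used as placeholders, the challenge fields other than "type" and "nonce" are invented, and the account store is a hypothetical in-memory set.

```python
import secrets

KNOWN_ACCOUNTS = {"a" * 32}  # hypothetical account store (public keys)


def account_challenge(params):
    """Offer a signing challenge only if the announced key is a known account."""
    if params.get("public_key") not in KNOWN_ACCOUNTS:
        return None
    return {"type": params["type"], "nonce": secrets.token_hex(16)}


def hashcash_challenge(params):
    """Hypothetical hashcash parameters; these field names are made up."""
    return {"bits": 20, "resource": secrets.token_hex(8)}


# Server preference order: the first usable method wins.
SERVER_METHODS = {"account": account_challenge, "hashcash": hashcash_challenge}


def build_welcome(abilities):
    """Return a WELCOME fragment with zero or one permission strategy."""
    welcome = {}
    for method, make_challenge in SERVER_METHODS.items():
        if method not in abilities:
            continue  # the client did not announce support for this method
        challenge = make_challenge(abilities[method])
        if challenge is not None:
            welcome["permission"] = {method: challenge}
            break
    return welcome
```

A client that sent BIND without ABILITIES corresponds to `build_welcome({})`, which yields no permission entry; what the server does when it insists on a permission and there is no overlap is left open here.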

Let's consider a case where the server supports three permissions models: hashcash, proving existence of an account or spending ZKAPs.

A client that supports just "accounts" or hashcash connects and sends ABILITIES:

{
    "hashcash": {},
    "account": {
        "type": "cryptosign",
        "public_key": "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
    }
}

The server chooses to use the "account" strategy, because that account does exist. It could instead have chosen "hashcash" but can't choose ZKAPs (because the client doesn't support that). It sends a "challenge" that the client must sign to prove it controls the corresponding private key. (Other similar methods could even use SCRAM or other password-methods that include human UX interaction). So it sends back WELCOME, like:

{
    ...
    "permission": {
        "account": {
            "type": "cryptosign",
            "nonce": "bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb",
        }
    }
    ...
}

The client would then use its private key to sign the nonce with the NaCl cryptosign signature scheme and send it up in the PERMISSION message. I've used somewhat generic names here; maybe calling the scheme account-cryptosign or similar would make more sense.

This gives server operators lots of flexibility. For example, maybe they are happy to offer a cost-free service for most users but have a "premium" offering for account-holders. Under high load or DoS or similar, they could demand hashcash from the free users but continue to allow account-holders immediate access. Or perhaps they decide to offer zero free service while under DoS.
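
As an illustration of that kind of operator policy, here is a small sketch. The decision names, the `free_tier_during_load` switch, and the idea that `is_account_holder` means "already passed the account check" are all assumptions made for the example.

```python
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"        # no challenge, immediate access
    HASHCASH = "hashcash"  # demand proof of work first
    REFUSE = "refuse"      # no service for this client right now


def decide(is_account_holder, under_load, supports_hashcash,
           free_tier_during_load=True):
    """Hypothetical operator policy for one connecting client."""
    if is_account_holder:
        return Decision.ALLOW      # account holders are never slowed down
    if not under_load:
        return Decision.ALLOW      # normal operation: free service for all
    if free_tier_during_load and supports_hashcash:
        return Decision.HASHCASH   # free users pay with CPU time under load
    return Decision.REFUSE         # free tier switched off during the DoS
```

Because the policy only picks among methods the client announced, an operator can change it at any time without a protocol change.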

Another scheme I could imagine .. especially for Web-based clients .. could be reCAPTCHA or similar.

Even more esoteric, an operator may decide to accept anonymous payment for service with a system like ZKAPs (https://leastauthority.com/product-development/zkaps/) or something else.

I think allowing multiple schemes wrapped inside fairly generic protocol messages like this makes it a LOT easier for implementations to experiment with different permissions strategies. Since "DoS mitigation" is essentially an arms-race I think this is very important for this use-case. By "arms race" here I mean that a motivated "bad actor" can usually get around various mitigation strategies (depending on their motivation and funding). As a nice addition it also captures use-cases that aren't strictly "DoS related" but generally relate to "permission to use this service right now" -- I kind of see DoS as a particular special case of that.

Perhaps if @warner has time / motivation he would be interested in chiming in ..?

meejah commented 3 years ago

Note also that the proposed "abilities and then welcome" dance could be used to note support for Dilation or Seeds and other future protocol enhancements.

piegamesde commented 3 years ago

> Note also that the proposed "abilities and then welcome" dance could be used to note support for Dilation or Seeds and other future protocol enhancements.

Uhm, I think you are confusing things here. The abilities negotiation with the server and with the other client are two distinct ones, for two different protocols with different features.

Regarding your other comment, I must admit that I'm not a huge fan of your use cases*, but I'll have a more in-depth look some time later.

* The problem with hosting custom rendezvous servers is that both sides need to agree on a rendezvous server in order to find each other. And I still haven't found a solution with sufficiently good UX that self-hosting one would be worth it.

meejah commented 3 years ago

Yes, you're right we already have a way to do seeds etc stuff. So, ignore that :)

"Self-hosting", maybe not?

But I'm getting at larger deployments or commercial offerings etc. I'm not necessarily strongly committed to any of those particular use-cases, but I do think that if there's a need to do DoS-mitigation then there's going to be a need to change the DoS mitigation strategies as the people doing DoS change tactics.

Basically anything that currently has its own AppID could instead use a whole separate deployment. Obviously, such a deployment would need to "burn in" or otherwise communicate the URL of the rendezvous service -- like is already done with the wormhole CLI.

There are certain advantages to having "one" such server .. but also disadvantages (such as "what if warner gets bored of maintaining it").

piegamesde commented 3 years ago

Thanks, this is convincing.

meejah commented 3 years ago

https://github.com/magic-wormhole/magic-wormhole-protocols/pull/12 covers the proof-of-work parts of this proposal .. but I think the "list transit relays" piece is still interesting and useful; perhaps this could be trimmed down to just that?

piegamesde commented 3 years ago

Yes, I have not forgotten about this. My plan is to wait for #12 to be merged, and then rebase on top of that with the PoW changes taken out.

piegamesde commented 3 years ago

@meejah I've rebased and adapted to the latest changes. There are still a few open questions to resolve, but please have a look at it first.

piegamesde commented 3 years ago

One other question that just came up: What's the purpose of the error field? Where is it actually used, and what for? I think its main purpose got superseded by the permission-required field. Thus, I propose to deprecate it, and let clients ignore it. If the server wants to signal an error, it should use the error inbound message instead.

What do you think?

meejah commented 3 years ago

I believe the error field is for things like "This server is under maintenance, please try again". But it also has some "speculation" about CAPTCHAs etc in the text, so at least that part is superseded by `permission-required` .. so I think it still has a purpose ("client should exit after displaying the message") but more narrow than previously anticipated..?

piegamesde commented 3 years ago

In order to not derail this thread too much, I opened https://github.com/magic-wormhole/magic-wormhole-protocols/issues/15 about error handling instead.

magic-wormhole / magic-wormhole-protocols

Improve "welcome" message #6