RFC: Lightweight Self-Signup

phritz commented 3 years ago

Proposal below: https://github.com/rocicorp/repc/issues/269#issuecomment-731446716

phritz commented 3 years ago

issue on the diff-server side: https://github.com/rocicorp/diff-server/issues/64

phritz commented 3 years ago

Background

Problem

Since we don't yet have an accounts service, a customer "signs up" for replicache manually: they contact us and we create a PR adding a new id and client view url to the accounts list in the diffserver. This is less than ideal because:

customer can't get started without one of us taking an action, and latency of this step is a barrier to entry
customer can't kick the tires without involving a human, another barrier to entry: they may have good reasons to not want to be talking to a human at this stage (don't want to talk about what they're working on, afraid of a sales pitch, etc)

So the problem to solve is this: enable a customer to start using the diffserver without involving a human, in as few and as easy steps as possible. We need to do this in a way that is secure enough wrt the threat model below.

Non-goals

explicitly not a long-term or large-scale solution
does not include agreeing to an EULA, providing billing info, etc.
no notion of multiple users, orgs, etc.

Threat model

There is additional info in the auth link above but the threat model for now is:

if we create an open proxy in the diffserver, bad guys can use it to spam (and at reasonable scale, vercel will happily scale us up)
data layer user auth credentials are sent along with batch push and client view requests so if a bad guy can set either url for an account they can steal user auth credentials by pointing the url to a server they control. We do not want to enable the privilege escalation of 'redirect a request from or overwrite a url string in webapp' => 'steal credentials'.
a bad actor with an account could spam us (increasing costs at various levels, clogging logs, DOSing other users, etc), so we need a way to turn accounts off if need be and ideally identify in some way who is behind it
a bad actor might try to fetch a user client view from an account they don't own (our approach to prevent this is that clientids are unguessable and should be treated like secrets)
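Treating clientids as secrets only holds up if they really are unguessable. As a minimal sketch of what that means, assuming a clientid is simply 128 random bits from a CSPRNG encoded as hex (newClientID is a hypothetical helper, not the actual Replicache client ID scheme):

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// newClientID is a hypothetical generator for an unguessable client ID:
// 16 bytes (128 bits) from the OS CSPRNG, hex-encoded to 32 characters.
// Brute-forcing 2^128 possibilities is infeasible, so the ID can double
// as a secret. math/rand would NOT be suitable here.
func newClientID() (string, error) {
	b := make([]byte, 16)
	if _, err := rand.Read(b); err != nil {
		return "", err
	}
	return hex.EncodeToString(b), nil
}

func main() {
	id, err := newClientID()
	if err != nil {
		panic(err)
	}
	fmt.Println(id)
}
```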

Out of scope, at least for now:

customers posing as other customers
preventing bad actors getting an account in the first place
protecting the number of or identity of other account holders (? should id be in scope?)
mitigating damage from a leaked clientid

Proposal: Open auto-signup with guardrails

Let's err on the side of permissiveness for now but have some backstops, or easy potential backstops, in case of problems. Our strategy for now is to be able to easily mitigate problems if they happen, as opposed to preventing them from happening up front. Also, let's not introduce new dependencies or infrastructure for the moment. And importantly, let's allow new customers to make different security tradeoffs in the name of convenience than established customers in production.

Let's allocate a range of Auto-Signup account IDs (aka "asid"s), say ids >= 1,000,000. We keep track of the next available auto-signup id in noms on the diffserver. There's a URL served by the diffserver, say /auto-signup, that when requested returns the next available asid and maybe some short instructions, and then increments the next available id since this asid is now used. It should probably mention something about limitations, eg "This account can be used to get up and running with or to evaluate Replicache. Before deploying Replicache to end users in production for non-evaluation purposes you will want to contact us at xxx so we can upgrade the account for production use. Otherwise, it will be subject to certain restrictions, eg a limit on the number of clients." We should try to get an optional email address with auto signup so we can contact them if we notice them exceeding reasonable limits or having other kinds of problems.
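A rough sketch of the /auto-signup flow in Go, to make the counter semantics concrete. Assumptions: asidAllocator and its methods are hypothetical names, an in-memory counter guarded by a mutex stands in for the nextid we'd keep in noms, and the optional email arrives as an "email" form/query parameter.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync"
)

// firstASID is the start of the auto-signup range proposed above.
const firstASID = 1_000_000

// asidAllocator hands out auto-signup account IDs. Hypothetical: the
// proposal keeps nextID in noms; an in-memory counter stands in here.
type asidAllocator struct {
	mu     sync.Mutex
	nextID uint64
	emails map[uint64]string // optional contact email per asid
}

func newASIDAllocator() *asidAllocator {
	return &asidAllocator{nextID: firstASID, emails: map[uint64]string{}}
}

// allocate returns the next available asid and advances the counter so
// the same id is never handed out twice.
func (a *asidAllocator) allocate(email string) uint64 {
	a.mu.Lock()
	defer a.mu.Unlock()
	id := a.nextID
	a.nextID++
	if email != "" {
		a.emails[id] = email
	}
	return id
}

// ServeHTTP implements the hypothetical /auto-signup endpoint: it returns
// the new asid plus short instructions about production use.
func (a *asidAllocator) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	id := a.allocate(r.FormValue("email"))
	fmt.Fprintf(w, "Your account ID is %d.\n", id)
	fmt.Fprintln(w, "Contact us before deploying to end users in production.")
}

func main() {
	mux := http.NewServeMux()
	mux.Handle("/auto-signup", newASIDAllocator())
	rec := httptest.NewRecorder()
	mux.ServeHTTP(rec, httptest.NewRequest("GET", "/auto-signup?email=a@example.com", nil))
	fmt.Print(rec.Body.String())
}
```

Because the counter only ever increments, losing it is recoverable as long as we can find the highest asid handed out so far.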

Asids are necessarily less trustworthy than regular account ids that we add manually to accounts.go. They are allocated in a separate range so that, if need be, we can easily apply different policies to them than to regular accounts. Externally, to customers, they are just accounts: we probably don't make a distinction, we just tell them not to go into production for real without contacting us. When they do want to go into production for real we manually add a regular account for them to accounts.go and say "start sending this new id instead of the other one". We can set their clientview urls up for them in the new account so everything works just like before, except they are sending a new account id. There is no loss or reset of client data required.

Limits on accounts

Like today, there are no limits on what regular accounts can do. We require that they whitelist their client view url primarily so as to eliminate the risk of leaking user auth to random urls, and also to not create an open proxy. (We should let them specify multiple urls, btw.)

We can start with only some gross limits on asids:

rate limit issuance of asids (eg, issue at most 1 per minute). This limits the breadth of spamming by limiting the rate at which asids can be created.
auto-add but limit client view urls: don't require that they specify clientview urls up front. Instead, enable the customer to specify the clientview url in the pullrequest via the beginsyncrequest. Keep track in noms of which clientview urls the asid has fetched from and allow them to fetch from up to N (say 4). If they try to fetch from an (N+1)th url, error out and tell them to contact us. This balances ease of setup with limiting the breadth of potential spamming. Asids are less protected against leaking user auth than regular accounts because the diffserver will fetch previously unseen urls up to the limit. (Regular accounts would specify the clientview url in the pullrequest as well, and we'd check it not against a list in noms but against the list in accounts.go.)

I think that's probably all we start with. Should we need to, we can easily turn off asid diffserver access on an individual or complete basis by just checking account id sent in the pullrequest, without affecting regular users. We have a bunch of additional policies that we could implement should the need arise. These policies could just be alerts to us via logging an ERROR or could actually be enforced:

rate limit or cap number of pull requests
limit the number of clientids
require a preflight CORS check on new clientview urls to prevent spamming unowned domains
limit the lifetime of an asid
etc
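Because asids live in their own range, the kill switch only needs the account id already in the pullrequest. A hypothetical Go sketch (accessPolicy and allowPull are illustrative names; the disabled set would really live wherever we keep account info) that also shows the "just log an ERROR" mode mentioned above:

```go
package main

import (
	"fmt"
	"log"
)

const firstASID = 1_000_000

// isASID reports whether accountID falls in the auto-signup range.
func isASID(accountID uint64) bool { return accountID >= firstASID }

// accessPolicy is a hypothetical kill switch consulted on each pull request.
type accessPolicy struct {
	allASIDsDisabled bool            // turn off every asid at once
	disabled         map[uint64]bool // individually disabled asids
	enforce          bool            // if false, only log an ERROR and allow
}

// allowPull reports whether a pull request carrying accountID should be
// served. Regular (manually added) accounts are never affected.
func (p *accessPolicy) allowPull(accountID uint64) bool {
	if !isASID(accountID) {
		return true
	}
	blocked := p.allASIDsDisabled || p.disabled[accountID]
	if blocked && !p.enforce {
		log.Printf("ERROR: asid %d would be blocked by policy", accountID)
		return true
	}
	return !blocked
}

func main() {
	p := &accessPolicy{enforce: true, disabled: map[uint64]bool{1000042: true}}
	fmt.Println(p.allowPull(7), p.allowPull(1000001), p.allowPull(1000042))
}
```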

Bummers

we don't get regular customer management of clientview urls from this proposal. But then, that's not a primary goal. If we wanted to we could use one of the alternative strategies mentioned below to enable self-service for regular customers.
if we lose the noms db that stores asids, everyone using one is SOL. This is a departure from present in that we can safely lose noms and restart fresh with no loss of data for customers (of course, there is the downtime).
we don't really have a way to get them to agree to an EULA, if that's important

Rough TODOs

pass clientview url in the pullrequest
implement /auto-signup which keeps a nextid in noms
keep track in noms of clientview urls fetched by asid, limiting it to N
have a way to signal account errors back to the client, probably one that isn't great but works now, more grist for the https://github.com/rocicorp/repc/issues/119 mill
update getting started docs
update run-your-own-diffserver instructions to not use asids (i think)
expand accounts.go clientviewurl to allow more than one url

Alternatives

variation on the proposal above: provide a script they can run from the command line that fetches an asid and allows them to set clientview urls. I'm open to this but it's more moving parts and more work for the customer it seems.
i considered implementing more first-class account management in the diffserver by moving pieces over from the account service, eg github auth. However the account service assumes the existence of a mysql database so we'd either have to call one into existence or adapt it for noms. Looks like more work than we want to do.
i considered separating account info to a separate repo and having a little bot that you could submit your github handle to that would invite you as a write collaborator to that repo. Then they could fill out a template change which would be auto-submitted. We'd then fetch account info periodically from the diffserver, or push it to the diffserver. This seems like more work for the customer so I wasn't thrilled with it.
i considered enabling customers to auto-submit to accounts.go. Since we don't want to give these collaborators write permissions to diff-server the customer would need to fork diffserver, add an account, and submit a PR. We could use something like git-auto-commit to auto-submit, but all these autocommit actions seem to have the limitation that they don't work out of the box for forks, which seems like it might be a by-design constraint of github. The customer has to change a setting on their fork to enable the auto-submit. This seems like more work than we want.

aboodman commented 3 years ago

Thank you for putting this proposal together. I very much like the developer experience and scale of effort. Both make sense for where we are right now.

Comments:

implement /auto-signup which keeps a nextid in noms

If we store in Noms, then we need a special Noms db just to store this one integer. Because currently each account in diff-server has its own Noms database. This does not seem better than just storing this int directly in dynamodb.

if we lose the noms db that stores asids, everyone using one is SOL. This is a departure from present in that we can safely lose noms and restart fresh with no loss of data for customers (of course, there is the downtime).

We could turn on backups of the relevant s3 bucket. However that would also backup all the random diff server data that we don't want to backup.

Again, maybe better to put this in dynamodb. Then we can backup just the accounts.

we don't really have a way to get them to agree to an EULA, if that's important

Don't care about this, but it would be nice to get their name, email, and website. If we want to store this in Noms, we'd have to create a special "meta" dataset within each account db. I guess that's nbd, but this seems like another reason to just use dynamodb.

implement /auto-signup which keeps a nextid in noms

Nit: Can we call it just "signup". People don't need to know it's temporary or weird.

have a way to signal account errors back to the client, probably one that isn't great but works now, more grist for the #119 mill

Errors from /auto-signup? I'm imagining this is an HTML form. Were you imagining something you call programmatically?

phritz commented 3 years ago

implement /auto-signup which keeps a nextid in noms If we store in Noms, then we need a special Noms db just to store this one integer. Because currently each account in diff-server has its own Noms database. This does not seem better than just storing this int directly in dynamodb.

The advantage I was seeing is that the diffserver already directly uses noms, but does not directly use dynamo (the dependency is not visible to diffserver application code). So if we store it in noms we would introduce no new concepts into the diffserver, which has an appealing uniformity and simplicity to me.

if we lose the noms db that stores asids, everyone using one is SOL. This is a departure from present in that we can safely lose noms and restart fresh with no loss of data for customers (of course, there is the downtime). We could turn on backups of the relevant s3 bucket. However that would also backup all the random diff server data that we don't want to backup. Again, maybe better to put this in dynamodb. Then we can backup just the accounts.

Yeah that makes sense. If we can't specify a noms db for account info in a way that backs up only the relevant dynamo and s3 data w/o also backing up replicache client datasets, then yeah, seems better to put it into dynamo.

Nit: Can we call it just "signup"

Yes, from the user's perspective there should be no distinction, it should just be "signup" for an "account".

have a way to signal account errors back to the client, probably one that isn't great but works now, more grist for the #119 mill Errors from /auto-signup? I'm imagining this is an HTML form. Were you imagining something you call programmatically?

I meant signaling account errors from the diffserver back to the client. Like if we turn them off for some reason eg, they sign up 10,000 clients and don't respond to email, we want to signal that their replicache account is disabled and to contact us. We want to do this differently than 400 or 401. For now we're probably stuck with choosing a specific status code, dunno maybe 402, and having code that logs an ERROR when received.

aboodman commented 3 years ago

On Wed, Dec 2, 2020 at 8:16 AM Phritz notifications@github.com wrote:

implement /auto-signup which keeps a nextid in noms If we store in Noms, then we need a special Noms db just to store this one integer. Because currently each account in diff-server has its own Noms database. This does not seem better than just storing this int directly in dynamodb.

The advantage I was seeing is that the diffserver already directly uses noms, but does not directly use dynamo (the dependency is not visible to diffserver application code). So if we store it in noms we would introduce no new concepts into the diffserver, which has an appealing uniformity and simplicity to me.

Understood. That's appealing to me. It's kind of funny to prefer an unproven db that is used in only a handful of places to dynamodb, but hey ... it's our db.

if we lose the noms db that stores asids, everyone using one is SOL. This is a departure from present in that we can safely lose noms and restart fresh with no loss of data for customers (of course, there is the downtime). We could turn on backups of the relevant s3 bucket. However that would also backup all the random diff server data that we don't want to backup. Again, maybe better to put this in dynamodb. Then we can backup just the accounts.

Yeah that makes sense. If we can't specify a noms db for account info in a way that backs up only the relevant dynamo and s3 data w/o also backing up replicache client datasets, then yeah, seems better to put it into dynamo.

Wait a second, back up. I don't think this chain of logic makes sense.

In your original proposal the only thing we're storing is a single "next_asid" int, right?

That has to be in a separate noms db, because the only noms dbs we currently have in diff server are per-account. So (a) we can back that up, but (b) do we even care? If we lose it it seems like what happens is maybe we have to figure out what the last one we gave out is or something. Not fatal.

If we want to store extra bits of information, like email address, then obviously putting them in the per-account noms dbs would make sense, but that would mean backing those up which we don't want to do.

So if we want to use Noms what we can do is create a new "accounts" noms db and store the global next_asid counter there and also per-account metadata we collect.

I can't decide the tradeoff between using Noms vs just using dynamodb directly. It's fun to use Noms. I feel like you're going to start complaining about the Noms marshalling API, which admittedly has some sharp edges. On the other hand the dynamodb API is also kind of grotty.

I'll let you make this call.


have a way to signal account errors back to the client, probably one that isn't great but works now, more grist for the #119 mill Errors from /auto-signup? I'm imagining this is an HTML form. Were you imagining something you call programmatically?

I meant signaling account errors from the diffserver back to the client. Like if we turn them off for some reason eg, they sign up 10,000 clients and don't respond to email, we want to signal that their replicache account is disabled and to contact us. We want to do this differently than 400 or 401. For now we're probably stuck with choosing a specific status code, dunno maybe 402, and having code that logs an ERROR when received.

Ah, SG.


phritz commented 3 years ago

So if we want to use Noms what we can do is create a new "accounts" noms db and store the global next_asid counter there and also per-account metadata we collect.

Yes, this is what I had in mind if we use noms. Same page.

So (a) we can back that up, but (b) do we even care?

If we lose it, it's easy enough to recover the next asid from the logs, but we do lose all information about who has which account. But maybe we don't care for now.

I can't decide the tradeoff between using Noms vs just using dynamodb directly. It's fun to use Noms. I feel like you're going to start complaining about the Noms marshalling API, which admittedly has some sharp edges. On the other hand the dynamodb API is also kind of grotty.

I'm leaning towards the grotty I know vs the grotty I'd have to learn. Will follow up in slack.

phritz commented 3 years ago

Taking a look at this work again after the holiday break and looking to cut scope, so I propose that we break the remaining work down into DO / DO NOT. @aboodman

Propose we DO:

change account client view whitelist from specific URLs to hosts
add logging
ask arv/aaron to make the signup page and its result purty
improve signup page and result page copy
update docs and instructions to reference using signup
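Switching the whitelist from exact URLs to hosts could look roughly like this Go sketch. Assumptions: clientViewAllowed is a hypothetical helper, hosts are compared case-insensitively including any port, and both http and https are accepted (http mainly for localhost development). Exact host equality matters: a suffix or prefix match would let "app.example.com.evil.net" through.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// clientViewAllowed reports whether rawURL points at a whitelisted host.
// Whitelisting hosts rather than exact URLs lets a customer change paths
// or query strings freely, while still preventing the diffserver from
// sending user auth to arbitrary servers.
func clientViewAllowed(rawURL string, allowedHosts []string) bool {
	u, err := url.Parse(rawURL)
	if err != nil || (u.Scheme != "https" && u.Scheme != "http") {
		return false
	}
	host := strings.ToLower(u.Host) // includes the port, if any
	for _, h := range allowedHosts {
		if host == strings.ToLower(h) {
			return true
		}
	}
	return false
}

func main() {
	hosts := []string{"app.example.com", "localhost:3000"}
	for _, u := range []string{
		"https://app.example.com/replicache-client-view",
		"http://localhost:3000/api/client-view",
		"https://app.example.com.evil.net/steal",
	} {
		fmt.Println(u, clientViewAllowed(u, hosts))
	}
}
```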

Propose we DO NOT:

do any form validation and don't redirect back to the original signup form on error; if there is an error we show it to them and they can hit back and refill the form
automatically retry on noms collisions. Collisions can happen when submitting the signup form or auto-adding a new url. For now, usage is low enough that we can just not worry about it.
cache account records. let's just be inefficient (re-read all accounts every request) for now.
rate limit new account creation. if this is a problem it is easy to address then.

phritz commented 3 years ago

OK here's what's left. Items in each step can be done in pretty much any order. Two main changes that need to roll out are 1) updating documentation to refer to our new signup page and 2) sending clientViewURL in the PullRequest (JS -> repc -> diffs).

Step 1: finish up

[ ] @aboodman make signup pretty or tell me that the current interface (https://diff-server.rocicorp.now.sh/signup) is good enough or tell me to try to make it look like replicache.dev on my own. The three pages: get, post_success, and post_failure are here: https://github.com/rocicorp/diff-server/tree/main/serve/signup
[ ] @aboodman take a pass at the copy on the three pages above and make changes you see fit or give me thumbs up as is. eg Q: do we need to get them to agree to licensing ahead of time? Link to it more prominently?
[x] @aboodman set up support@replicache.dev
[x] @phritz add clientViewURL to repc PullRequest and send Version 2 to diffs if not present, Version 3 if present; then release repc; repc will be in a forwards and backwards compatible state at this point
[x] @arv make js changes to send clientViewURL to repc. Do not release js yet.
- [x] https://github.com/rocicorp/replicache-sdk-js/pull/258
- [x] @phritz ensure clientViewURL doc change gets included in js doc update
[x] @phritz prep changes to documentation to merge when we release signup changes to js
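On the diffs side, accepting both PullRequest versions during the transition might look like this Go sketch. Assumptions: the JSON field names and the trimmed-down pullRequest struct are hypothetical (only clientViewURL and the version numbers come from the checklist), and resolveClientViewURL is an illustrative name. The URL it returns would still go through the host whitelist or asid url limit before any fetch.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// pullRequest is a hypothetical, trimmed-down wire shape showing only
// the fields relevant to versioning.
type pullRequest struct {
	Version       int    `json:"version"`
	ClientID      string `json:"clientID"`
	ClientViewURL string `json:"clientViewURL,omitempty"`
}

// resolveClientViewURL picks the URL the diffserver should fetch from.
// Version 2 requests (older clients) carry no URL, so the one configured
// for the account is used; Version 3 requests must carry one.
func resolveClientViewURL(body []byte, configuredURL string) (string, error) {
	var req pullRequest
	if err := json.Unmarshal(body, &req); err != nil {
		return "", err
	}
	switch req.Version {
	case 2:
		if configuredURL == "" {
			return "", errors.New("version 2 request but no client view URL configured for account")
		}
		return configuredURL, nil
	case 3:
		if req.ClientViewURL == "" {
			return "", errors.New("version 3 request missing clientViewURL")
		}
		return req.ClientViewURL, nil
	default:
		return "", fmt.Errorf("unsupported pull request version %d", req.Version)
	}
}

func main() {
	u, err := resolveClientViewURL([]byte(`{"version":3,"clientID":"c1","clientViewURL":"https://app.example.com/cv"}`), "")
	fmt.Println(u, err)
}
```

Once customers have upgraded, deprecating Version 2 amounts to deleting the `case 2` branch.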

Step 2: release

[x] @phritz set the "real" name for account database in diffs, effectively wiping auto-signup accounts created to this point; then:
- [x] @phritz do a diffs release (any time after aboodman approves copy and appearance of signup pages)
[x] @arv do a js release
[x] @phritz merge the changes to documentation prepped above

Step 3: follow-up

[ ] @aboodman contact existing customers and ask them to upgrade to new js release at their convenience
[ ] @phritz once customers have upgraded, deprecate PullRequest Version 2 in repc and diffs
[x] @arv Update samples in JS repo to use new npm version

Things we are NOT doing:

retry on noms optimistic lock failure for POST and auto-add of clientview hosts
cache account.Records
rate limiting account signup

phritz commented 3 years ago

No longer relevant w/o the diffserver.

aboodman commented 3 years ago

🍺 Pouring one out.

On Mon, Mar 8, 2021 at 8:58 AM Phritz notifications@github.com wrote:

No longer relevant w/o the diffserver.

