rethinkdb / horizon

Horizon is a realtime, open-source backend for JavaScript apps.
MIT License
6.78k stars 350 forks source link

Yet Another Permissions Proposal #199

Closed mlucy closed 8 years ago

mlucy commented 8 years ago

Here's a proposal that combines the templating and raw JS proposals into something that I think is pretty good. I'm going to skip over some of the edge cases to make the bones of it clearer, but most of the edge cases are resolved the same way as in @deontologician's proposal.

The basic idea is that every security rule consists of a group, a template, and an optional validator function (which can be either a write validator or a read validator). If the validator function is left off, it's basically the same as Josh's proposal (queries are legal if they match at least one template). If the validator functions are included, then the rule is that a query is legal if it matches at least one template and passes all the validator functions for templates that it matches.

So in the presence of validator functions we'll sometimes have to fall back to doing the retrying-atomic-writes thing. (@encryptio pointed out in a conversation that if we do the backoff properly the worst case for the retrying-atomic-writes thing is actually O(nlog(n)) instead of O(n^2), which makes it more palatable.)


Some examples:

// The simplest type of template looks like this.
{
  // Who the template applies to.  Defaults to anyone if omitted.
  group: "users",
  // What queries the template matches.  Defaults to `collection(ANYTHING)` if omitted.
  template: collection('messages').insert({from: USER.id}),
}

// The same rule expressed inefficiently with a validator.
{
  group: "users",
  template: collection('messages'),
  writeValidator: function(oldRow, newRow) {
    return oldRow == null && newRow.from == USER.id;
  }
}

// A rule that lets two users share a game state and pass ownership
// back and forth depending on whose turn it is.
{
  group: "users",
  template: collection('games').find(ANYTHING).update(ANYTHING),
  writeValidator: function(oldRow, newRow) {
    otherPlayer = oldRow.player1 == USER.id ? oldRow.player2 : oldRow.player1;
    return (oldRow.userTurn == USER.id
            && newRow.userTurn == otherPlayer
            && oldRow.moves == newRow.moves.slice(0, -1));
  }
}

// Another rule that lets a user start a game with any other player.
// Note that it *doesn't conflict* with the above rule, because the above
// rule is templated only on `update` queries.
{
  group: "users",
  template: collection('games').insert({player1: USER.id}),
  // In the case of `insert`s, we can just run the validator once rather
  // than having to do the backoff thing.
  writeValidator: function(oldRow, newRow) {
    return newRow.player2 != USER.id && newRow.moves == [];
  }
}

// A simple read template.
{
  group: "users",
  template: collection('messages').findAll({to: USER.id}),
}

// The same rule expressed inefficiently with a validator.
{
  group: "users",
  template: collection('messages'),
  readValidator: function(row) {
    return row.to == USER.id;
  }
}

// Here's a more complicated snapchat-like feature you would need read
// validation for.
{
  group: "users",
  template: collection('messages').findAll({to: USER.id})
  readValidator: function(row) {
    return row.timeStamp > (new Date()) - 1000*60*60*24;
  }
}

Here's what I like about this model:


Ping @coffeemug, @deontologician, @Tryneus, @dalanmiller, @mglukhovsky.

danielmewes commented 8 years ago

Awesome, I like this!

I assume group memberships will be managed as if they were a normal collection (e.g. collection('groups'))?

One question: You say that

a query is legal if it matches at least one template and passes all the validator functions for templates that it matches.

So if a query matches multiple templates, it needs to pass the validator functions for all of those templates? I wonder if it should be enough to pass the validator function of one matching template.

One argument in favor of making a query legal once it passes one pair of template and associated validator function is that it allows implementing any template in terms of a validator function instead. Imagine a system that relies heavily on validator functions, because the logic is too complex to be expressed in templates.

Your example above

// The same rule expressed inefficiently with a validator.

is only equivalent to the template variant, if passing one template+validator pair is sufficient. It is not equivalent to the template variant if the validators of all matching templates have to pass. (assuming other matching rules are present)

marshall007 commented 8 years ago

I like most parts of this proposal, but I still find the query templates difficult to mentally parse and I don't really see it as a convenient way to express the rules. I would prefer to just specify the collection name and read/write validators, perhaps with a simpler object notation for when the validator does not require complex logic. For example:

{
  group: "users",
  collection: "messages",
  read: {
    to: USER.id
  },
  write: {
    from: USER.id
  }
}

// Same rules expressed with function validators.
{
  group: "users",
  collection: "messages",
  readValidator: function(row) {
    return row.to == USER.id;
  },
  writeValidator: function(oldRow, newRow) {
    return (oldRow == null || oldRow.from == USER.id) && newRow.from == USER.id;
  }
}

If the query templates are useful internally for quickly validating queries, such templates could easily be built from the specified object-based read / write rules.

coffeemug commented 8 years ago

One option is to define a shorthand notation for common template definitions (e.g. a string could denote a collection) and still support full templates under the hood. This way when users get started they don't have to worry about the templating language, but it will be available to them for advanced use cases when they need it.

coffeemug commented 8 years ago

Also, another question -- can users make Horizon/RethinkDB API calls from within the validators?

dalanmiller commented 8 years ago

One problem there @marshall007 is that your examples show less granularity, how would you specify only being able to do find vs findAll?

I think it would also be beneficial where if you don't specify validators (at least I don't see how they could work with multiple templates at once). One should be able to provide an array of templates, or else I could imagine it getting tedious to list and maintain one object after another to get a certain group all the permissions it needs.

{
  group: "users",
  templates: [
      collection('messages').store({ from: USER.id }),
      collection('profiles').upsert({ id: USER.id }),
      collection('posts').upsert({ author: USER.id }) 
  ]      
}
marshall007 commented 8 years ago

@dalanmiller right, I see that as a plus actually. Unless I'm just missing something, I don't see a lot of value in distinguishing between find and findAll. I think permissions should be based on the structure of the document, not the query. By having query templates you get a cascading effect where it's tricky to figure out how rules interact with each other.

mlucy commented 8 years ago

I assume group memberships will be managed as if they were a normal collection (e.g. collection('groups'))?

I think you want to handle groups separately internally because you need them available on the USER object. I'm agnostic on whether it's better to manipulate groups by having an artificial collection that maps onto this internal representation, or to manipulate groups by just having new toplevel commands. (@deontologician, any thoughts?)

So if a query matches multiple templates, it needs to pass the validator functions for all of those templates? I wonder if it should be enough to pass the validator function of one matching template.

One argument in favor of making a query legal once it passes one pair of template and associated validator function is that it allows implementing any template in terms of a validator function instead. Imagine a system that relies heavily on validator functions, because the logic is too complex to be expressed in templates.

I'm not sure which is better. Some thoughts off the top of my head:

In favor of "it only validates if it passes all validator functions":

After thinking about it, I think your semantics (where it passes if there exists a template that matches and validates it) are slightly better, but I'm not really sure.

mlucy commented 8 years ago

I like most parts of this proposal, but I still find the query templates difficult to mentally parse and I don't really see it as a convenient way to express the rules. I would prefer to just specify the collection name and read/write validators, perhaps with a simpler object notation for when the validator does not require complex logic.

This is probably a preference thing. Templates seem very intuitive to me because the thing you write in the template looks almost exactly like the query you're writing in your client code, which is really nice. But I could see it being unnecessarily confusing instead. What do other people think? Do templates make it easier or harder to map in your head from what a security rule says to what operations it allows?

I think permissions should be based on the structure of the document, not the query

I'm not sure I agree with that. I think there are cases where validating queries instead of documents makes sense.

One example is the case where you want different rules for inserts than for replaces. (For example, if you want email-like semantics where users can send messages but not edit them once they're sent.) That's hard to do by only evaluating permissions on the structure of the document. (You could obviously do it with validators, but that will be much slower.)

It certainly wouldn't be the end of the world if we only allowed validating on the structure of documents, though.

If the query templates are useful internally for quickly validating queries, such templates could easily be built from the specified object-based read / write rules.

Templates are way better internally, so if we decide to omit them I think we should definitely make sure to support object-based read/write rules, and to compile them into templates rather than into validators.

mlucy commented 8 years ago

I think it would also be beneficial where if you don't specify validators (at least I don't see how they could work with multiple templates at once).

I don't think that would be a problem. In fact, we should probably let any of group, template, and validator be arrays, with the semantics that {group: [a, b], template: [c, d], validator: [e, f]} is just the Cartesian product of those constraints (so it's the same as writing {group: a, template: c, validator: e}, {group: a, template: c, validator: f}, {group: a, template: d, validator: e}, ...). I think this interpretation works and is useful regardless of whether we decide queries should have to pass all validators or some validator.

mlucy commented 8 years ago

One option is to define a shorthand notation for common template definitions (e.g. a string could denote a collection) and still support full templates under the hood. This way when users get started they don't have to worry about the templating language, but it will be available to them for advanced use cases when they need it.

I wouldn't necessarily be opposed to that, but I think writing {template: collection('foo')} instead of {collection: "foo"} probably isn't that bad, and it might not be worth introducing two notations for it.

mlucy commented 8 years ago

Also, another question -- can users make Horizon/RethinkDB API calls from within the validators?

Maybe? I don't see any fundamental reason we would need to forbid it, as long as we don't validate the API calls made inside of validators. Do you think this would be particularly valuable?

My instinct would be to ship without that and add it if lots of people complain and say they need the extra power.

dalanmiller commented 8 years ago

I don't think that would be a problem. In fact, we should probably let any of group, template, and validator be arrays, with the semantics that {group: [a, b], template: [c, d], validator: [e, f]} is just the Cartesian product of those constraints (so it's the same as writing {group: a, template: c, validator: e}, {group: a, template: c, validator: f}, {group: a, template: d, validator: e}, ...). I think this interpretation works and is useful regardless of whether we decide queries should have to pass all validators or some validator.

Actually, now that you say that, it doesn't seem too confusing, just as long as our messaging is explicitly clear that this is what we do. Although, it wouldn't be a true cartesian because you would separate out the read and write queries under template and apply the appropriate validators to them. À la: the product of all groups, to all read templates, to all read validators and likewise for all write templates and write validators.

yocontra commented 8 years ago

@mlucy Being able to do async work (including making rethink calls) inside of the validator is extremely important - all but the most basic cases would demand it.

You might find it interesting: ACL system for rethinkdb that allows you to specify strings of allowed roles, or an arbitrary async function (info here http://shasta.tools/palisade/docs/Rules.html)

marshall007 commented 8 years ago

Templates seem very intuitive to me because the thing you write in the template looks almost exactly like the query you're writing in your client code, which is really nice.


I'm not sure I agree with that. I think there are cases where validating queries instead of documents makes sense.

One example is the case where you want different rules for inserts than for replaces.

@mlucy I'm also interested in hearing other people's thoughts on query templates. My arguments here would be:

  1. I don't think specifying different rules for different queries is the common case.
  2. Given that many queries in Horizon are abstractions (like upsert), I think it is unclear how multiple query template rules interact. For example, would my insert rule apply to an upsert for a document that doesn't exist?
  3. The convenience of being able to drop in query templates based on they types of queries your app is using is a red herring. Since it's a single-page app, we have to assume the client could do anything. I feel like query templates encourage writing overly specific rules that may not capture all cases.

I would very much prefer explicitly specifying that a group can insert knowing that my rules apply regardless of the term used in the query. To accommodate different rules for different write operations, I would suggest supporting read, write, insert, replace, remove validators where write applies to insert, replace, remove unless overridden. Example:

{
  group: "users",
  collection: "messages",
  read: {
    to: USER.id
  },
  write: {
    from: USER.id
  },
  remove: {
    to: USER.id
  }
}
coffeemug commented 8 years ago

I'm agnostic on whether it's better to manipulate groups by having an artificial collection that maps onto this internal representation, or to manipulate groups by just having new toplevel commands.

I would prefer an artificial collection. I think it would be a much nicer UI.

After thinking about it, I think your semantics (where it passes if there exists a template that matches and validates it) are slightly better, but I'm not really sure.

I think these semantics (passing if there exists a template that matches and validates) is better. In practice I think you'd want to secure huge chunks of data (say, on a collection level -- this will probably happen by default) and then start opening up smaller bits of data to legal users. If we did it the other way (everything has to match) I think that setting up permissions will be extremely unintuitive to most users.

I wouldn't necessarily be opposed to that [porcelain template notation], but I think writing {template: collection('foo')} instead of {collection: "foo"} probably isn't that bad, and it might not be worth introducing two notations for it.

We should talk about what that notation might look like. For example, collection: 'foo' may not be better than template: collection('foo'), but template: 'foo' might be.

Also, another question -- can users make Horizon/RethinkDB API calls from within the validators?

Maybe? I don't see any fundamental reason we would need to forbid it, as long as we don't validate the API calls made inside of validators. Do you think this would be particularly valuable?

My instinct would be to ship without that and add it if lots of people complain and say they need the extra power.

I also think that it's extremely important and that the security system should eventually integrate/work together with the system that enables users to define custom commands/end-points. I don't know if we can ship in version one, but I think we can follow up with this pretty quickly after launch.

deontologician commented 8 years ago

As far as artificial collections vs. new toplevel commands, I am in favor of artificial. I think it's ok to take up the app's collection namespace. If they want to create a collection called "groups", they can just use "appGroups" or something more specific/descriptive.

I think allowing templates to be an array is a good idea, I think validators as an array is probably unnecessary since js is already more flexible than that. If we were specifying validating expressions (like firebase) rather than an entire function, it might make sense. I think groups as an array might be confusing, does the group apply when any group matches, or when all groups match?

My opinion is we should implement this as soon as possible for a few reasons:

  1. It has the benefit of a few rounds of "what about X?" style thinking, and it stands up pretty well. I think that's about as much as you can expect from a design space with so many possible answers.
  2. I don't have a good intuition for how cumbersome it will be in practice, so I think the best way to see that is to implement a few example apps with diverse permission requirements and get a feel for it. If it's totally unacceptable, we can rip it out and do it over, but my guess is that the work won't be massively wasted, we can just make tweaks. For example, if we implement this, and end up deciding we like @marshall007 's suggestions better, much of the code can be re-used.
  3. We should probably start wrapping up bike-shedding. There will be another release after this
niieani commented 8 years ago

Since these permission templates are so related to actual queries, it would be really cool if we could simply specify them as comments or decorators in the code of the client, at least in the dev-mode (since the developer is likely to be running both the server and the client on one machine, the horizon server could access the files of the client). For any developer who appreciates rapid iteration, it's simply much easier to have query-code and the permission-code side-by-side, instead of switching back and forth between the server project to the client project.

The Horizon server could then simply parse the client code in search of those special comments specifying permissions to generate the actual permissions code. I'd imagine server live-reload would make this a bliss: a dev simply works on the client, annotating the permissions as he writes the queries. For the production code there could be a way to manually output the permissions part of the server to a file for review.


On another note, @marshall007:

I see that as a plus actually. Unless I'm just missing something, I don't see a lot of value in distinguishing between find and findAll.

The value is that one can be abused more easily then the other. If a malicious group of users wanted to DDoS a service by flooding the DB with queries, it would be much easier for them to do so if they had permissions to findAll vs find.

However both of these points get mitigated if the permission templates could have an optional extra "rateLimit" property, that is, how many times a given template-based query can be executed in a specified amount of time, e.g.:

{
  group: "users",
  template: collection('messages').insert({from: USER.id}),
  // Allow only 3 inserts per minute,
  rateLimit: { limit: 3, interval: 60 }
}
mlucy commented 8 years ago

Being able to parse the permissions out of a comment above the actual query being permitted sounds pretty cool, actually. It would probably also reduce the burden of maintaining the permissions.

marshall007 commented 8 years ago

@niieani 👍 to specifying permission templates as ES7/TypeScript decorators or comments when your environment doesn't support them. I think decorators would play really nicely with the "models" concept discussed in #105.

niieani commented 8 years ago

@marshall007 I like decorators too, however the problem with them is that with the current spec they can only be applied inside classes, so they wouldn't work for codebases such as ones based on Cycle.js. Maybe the spec will still change though, but I wouldn't count on it for now.

Decorators are easier to retrieve, if all they did was added data to one static array - all the reader would need to do is require() all the files to get the decorator data.

On the other hand, comments are also pretty safe, cause you could always use one of the AST-generating parsers to extract the comment data, and find Horizon's permission comments, perhaps recognized by a specific signature, say a "horizon-permission" tag in a new line comment:

// <horizon-permission>
// {
//  group: "users",
//  template: collection('messages').insert({from: USER.id})
// }
// </horizon-permission>

vs decorators:

class AwesomeViewModel {
  @horizonServerPermission({
    group: "users",
    template: collection('messages').insert({from: USER.id})
  })
  addMessage() {
    // message adding logic...
  }
}

Another plus of comments is that they can be easily stripped for production code (so we don't accidentally reveal the actual permissions to the users). Stripping decorators would require another build step.


Any thoughts on the rate limiting addition?

marshall007 commented 8 years ago

@niieani I've only used decorators in TypeScript, but as far as I can tell decorators/annotations are supported on classes using babel v5 and can be re-enabled under v6 with babel-plugin-transform-decorators-legacy.

niieani commented 8 years ago

@marshall007 Indeed they are supported on classes, functions and properties, however it sill requires the dev to use prototypical JS, i.e. classes. Some people simply prefer pure JS / TS. :)

deontologician commented 8 years ago

I like rate limiting idea, I opened a topic on it in the discussion board here: https://discuss.horizon.io/t/adding-rate-limiting-to-permissions/77

niieani commented 8 years ago

Continuing the permissions-in-comments idea, Horizon could simply use custom jsdoc via one of the ready-to-use parsers like: https://www.npmjs.com/package/jsdoc-parse

pkamenarsky commented 8 years ago

Forgive me if this has already been proposed, but I'm just gonna leave a note here just in case.

I think the most robust solution would be incorporating permission checking into the database itself. Something like:

permissions: {
    users:
      { name:
        { read: ['others'],
          write: ['self']
        },
      , role:
        { read: ['others'],
          write: ['admin']
        }
      , balance:
        { read: ['self']
        , write: ['system']
        }
      }
    }

promotions: {
  admin: ['others'],
  system: ['admin'],
  others: [],
  self: ['others']
}

admin: ['others'] means that admin can be substituted for others in the permissions table.

ReQL would then have to be extended with the ability to provide a user id, like:

table('users').insert({..}).withUser(myId)

which would reach into users[user_id].role. If no user id is provided and the query attempts to read from or write to a protected field, the query fails.