Closed mlucy closed 8 years ago
Awesome, I like this!
I assume group
memberships will be managed as if they were a normal collection (e.g. collection('groups')
)?
One question: You say that
a query is legal if it matches at least one template and passes all the validator functions for templates that it matches.
So if a query matches multiple templates, it needs to pass the validator functions for all of those templates? I wonder if it should be enough to pass the validator function of one matching template.
One argument in favor of making a query legal once it passes one pair of template and associated validator function is that it allows implementing any template in terms of a validator function instead. Imagine a system that relies heavily on validator functions, because the logic is too complex to be expressed in templates.
Your example above
// The same rule expressed inefficiently with a validator.
is only equivalent to the template variant, if passing one template+validator pair is sufficient. It is not equivalent to the template variant if the validators of all matching templates have to pass. (assuming other matching rules are present)
I like most parts of this proposal, but I still find the query templates difficult to mentally parse and I don't really see it as a convenient way to express the rules. I would prefer to just specify the collection name and read/write validators, perhaps with a simpler object notation for when the validator does not require complex logic. For example:
{
group: "users",
collection: "messages",
read: {
to: USER.id
},
write: {
from: USER.id
}
}
// Same rules expressed with function validators.
{
group: "users",
collection: "messages",
readValidator: function(row) {
return row.to == USER.id;
},
writeValidator: function(oldRow, newRow) {
return (oldRow == null || oldRow.from == USER.id) && newRow.from == USER.id;
}
}
If the query templates are useful internally for quickly validating queries, such templates could easily be built from the specified object-based read
/ write
rules.
One option is to define a shorthand notation for common template definitions (e.g. a string could denote a collection) and still support full templates under the hood. This way when users get started they don't have to worry about the templating language, but it will be available to them for advanced use cases when they need it.
Also, another question -- can users make Horizon/RethinkDB API calls from within the validators?
One problem there @marshall007 is that your examples show less granularity, how would you specify only being able to do find
vs findAll
?
I think it would also be beneficial where if you don't specify validators (at least I don't see how they could work with multiple templates at once). One should be able to provide an array of templates, or else I could imagine it getting tedious to list and maintain one object after another to get a certain group all the permissions it needs.
{
group: "users",
templates: [
collection('messages').store({ from: USER.id }),
collection('profiles').upsert({ id: USER.id }),
collection('posts').upsert({ author: USER.id })
]
}
@dalanmiller right, I see that as a plus actually. Unless I'm just missing something, I don't see a lot of value in distinguishing between find
and findAll
. I think permissions should be based on the structure of the document, not the query. By having query templates you get a cascading effect where it's tricky to figure out how rules interact with each other.
I assume group memberships will be managed as if they were a normal collection (e.g.
collection('groups')
)?
I think you want to handle groups separately internally because you need them available on the USER
object. I'm agnostic on whether it's better to manipulate groups by having an artificial collection that maps onto this internal representation, or to manipulate groups by just having new toplevel commands. (@deontologician, any thoughts?)
So if a query matches multiple templates, it needs to pass the validator functions for all of those templates? I wonder if it should be enough to pass the validator function of one matching template.
One argument in favor of making a query legal once it passes one pair of template and associated validator function is that it allows implementing any template in terms of a validator function instead. Imagine a system that relies heavily on validator functions, because the logic is too complex to be expressed in templates.
I'm not sure which is better. Some thoughts off the top of my head:
In favor of "it only validates if it passes all validator functions":
{readValidator: function(row) { return !row.hidden; }}
that essentially restrict all templates at once.
After thinking about it, I think your semantics (where it passes if there exists a template that matches and validates it) are slightly better, but I'm not really sure.
I like most parts of this proposal, but I still find the query templates difficult to mentally parse and I don't really see it as a convenient way to express the rules. I would prefer to just specify the collection name and read/write validators, perhaps with a simpler object notation for when the validator does not require complex logic.
This is probably a preference thing. Templates seem very intuitive to me because the thing you write in the template looks almost exactly like the query you're writing in your client code, which is really nice. But I could see it being unnecessarily confusing instead. What do other people think? Do templates make it easier or harder to map in your head from what a security rule says to what operations it allows?
I think permissions should be based on the structure of the document, not the query
I'm not sure I agree with that. I think there are cases where validating queries instead of documents makes sense.
One example is the case where you want different rules for inserts than for replaces. (For example, if you want email-like semantics where users can send messages but not edit them once they're sent.) That's hard to do by only evaluating permissions on the structure of the document. (You could obviously do it with validators, but that will be much slower.)
It certainly wouldn't be the end of the world if we only allowed validating on the structure of documents, though.
If the query templates are useful internally for quickly validating queries, such templates could easily be built from the specified object-based read / write rules.
Templates are way better internally, so if we decide to omit them I think we should definitely make sure to support object-based read/write rules, and to compile them into templates rather than into validators.
I think it would also be beneficial where if you don't specify validators (at least I don't see how they could work with multiple templates at once).
I don't think that would be a problem. In fact, we should probably let any of group
, template
, and validator
be arrays, with the semantics that {group: [a, b], template: [c, d], validator: [e, f]}
is just the Cartesian product of those constraints (so it's the same as writing {group: a, template: c, validator: e}, {group: a, template: c, validator: f}, {group: a, template: d, validator: e}, ...
). I think this interpretation works and is useful regardless of whether we decide queries should have to pass all validators or some validator.
One option is to define a shorthand notation for common template definitions (e.g. a string could denote a collection) and still support full templates under the hood. This way when users get started they don't have to worry about the templating language, but it will be available to them for advanced use cases when they need it.
I wouldn't necessarily be opposed to that, but I think writing {template: collection('foo')}
instead of {collection: "foo"}
probably isn't that bad, and it might not be worth introducing two notations for it.
Also, another question -- can users make Horizon/RethinkDB API calls from within the validators?
Maybe? I don't see any fundamental reason we would need to forbid it, as long as we don't validate the API calls made inside of validators. Do you think this would be particularly valuable?
My instinct would be to ship without that and add it if lots of people complain and say they need the extra power.
I don't think that would be a problem. In fact, we should probably let any of group, template, and validator be arrays, with the semantics that {group: [a, b], template: [c, d], validator: [e, f]} is just the Cartesian product of those constraints (so it's the same as writing {group: a, template: c, validator: e}, {group: a, template: c, validator: f}, {group: a, template: d, validator: e}, ...). I think this interpretation works and is useful regardless of whether we decide queries should have to pass all validators or some validator.
Actually, now that you say that, it doesn't seem too confusing, just as long as our messaging is explicitly clear that this is what we do. Although, it wouldn't be a true cartesian because you would separate out the read and write queries under template
and apply the appropriate validators to them. À la: the product of all groups, to all read templates, to all read validators and likewise for all write templates and write validators.
@mlucy Being able to do async work (including making rethink calls) inside of the validator is extremely important - all but the most basic cases would demand it.
You might find it interesting: ACL system for rethinkdb that allows you to specify strings of allowed roles, or an arbitrary async function (info here http://shasta.tools/palisade/docs/Rules.html)
Templates seem very intuitive to me because the thing you write in the template looks almost exactly like the query you're writing in your client code, which is really nice.
I'm not sure I agree with that. I think there are cases where validating queries instead of documents makes sense.
One example is the case where you want different rules for inserts than for replaces.
@mlucy I'm also interested in hearing other people's thoughts on query templates. My arguments here would be:
upsert
), I think it is unclear how multiple query template rules interact. For example, would my insert
rule apply to an upsert
for a document that doesn't exist?I would very much prefer explicitly specifying that a group can insert
knowing that my rules apply regardless of the term used in the query. To accommodate different rules for different write operations, I would suggest supporting read, write, insert, replace, remove
validators where write
applies to insert, replace, remove
unless overridden. Example:
{
group: "users",
collection: "messages",
read: {
to: USER.id
},
write: {
from: USER.id
},
remove: {
to: USER.id
}
}
I'm agnostic on whether it's better to manipulate groups by having an artificial collection that maps onto this internal representation, or to manipulate groups by just having new toplevel commands.
I would prefer an artificial collection. I think it would be a much nicer UI.
After thinking about it, I think your semantics (where it passes if there exists a template that matches and validates it) are slightly better, but I'm not really sure.
I think these semantics (passing if there exists a template that matches and validates) is better. In practice I think you'd want to secure huge chunks of data (say, on a collection level -- this will probably happen by default) and then start opening up smaller bits of data to legal users. If we did it the other way (everything has to match) I think that setting up permissions will be extremely unintuitive to most users.
I wouldn't necessarily be opposed to that [porcelain template notation], but I think writing
{template: collection('foo')}
instead of{collection: "foo"}
probably isn't that bad, and it might not be worth introducing two notations for it.
We should talk about what that notation might look like. For example, collection: 'foo'
may not be better than template: collection('foo')
, but template: 'foo'
might be.
Also, another question -- can users make Horizon/RethinkDB API calls from within the validators?
Maybe? I don't see any fundamental reason we would need to forbid it, as long as we don't validate the API calls made inside of validators. Do you think this would be particularly valuable?
My instinct would be to ship without that and add it if lots of people complain and say they need the extra power.
I also think that it's extremely important and that the security system should eventually integrate/work together with the system that enables users to define custom commands/end-points. I don't know if we can ship in version one, but I think we can follow up with this pretty quickly after launch.
As far as artificial collections vs. new toplevel commands, I am in favor of artificial. I think it's ok to take up the app's collection namespace. If they want to create a collection called "groups", they can just use "appGroups" or something more specific/descriptive.
I think allowing templates to be an array is a good idea, I think validators as an array is probably unnecessary since js is already more flexible than that. If we were specifying validating expressions (like firebase) rather than an entire function, it might make sense. I think groups as an array might be confusing, does the group apply when any group matches, or when all groups match?
My opinion is we should implement this as soon as possible for a few reasons:
Since these permission templates are so related to actual queries, it would be really cool if we could simply specify them as comments or decorators in the code of the client, at least in the dev-mode (since the developer is likely to be running both the server and the client on one machine, the horizon server could access the files of the client). For any developer who appreciates rapid iteration, it's simply much easier to have query-code and the permission-code side-by-side, instead of switching back and forth between the server project to the client project.
The Horizon server could then simply parse the client code in search of those special comments specifying permissions to generate the actual permissions code. I'd imagine server live-reload would make this a bliss: a dev simply works on the client, annotating the permissions as he writes the queries. For the production code there could be a way to manually output the permissions part of the server to a file for review.
On another note, @marshall007:
I see that as a plus actually. Unless I'm just missing something, I don't see a lot of value in distinguishing between
find
andfindAll
.
The value is that one can be abused more easily then the other. If a malicious group of users wanted to DDoS a service by flooding the DB with queries, it would be much easier for them to do so if they had permissions to findAll
vs find
.
However both of these points get mitigated if the permission templates could have an optional extra "rateLimit" property, that is, how many times a given template-based query can be executed in a specified amount of time, e.g.:
{
group: "users",
template: collection('messages').insert({from: USER.id}),
// Allow only 3 inserts per minute,
rateLimit: { limit: 3, interval: 60 }
}
Being able to parse the permissions out of a comment above the actual query being permitted sounds pretty cool, actually. It would probably also reduce the burden of maintaining the permissions.
@niieani 👍 to specifying permission templates as ES7/TypeScript decorators or comments when your environment doesn't support them. I think decorators would play really nicely with the "models" concept discussed in #105.
@marshall007 I like decorators too, however the problem with them is that with the current spec they can only be applied inside classes, so they wouldn't work for codebases such as ones based on Cycle.js. Maybe the spec will still change though, but I wouldn't count on it for now.
Decorators are easier to retrieve, if all they did was added data to one static array - all the reader would need to do is require() all the files to get the decorator data.
On the other hand, comments are also pretty safe, cause you could always use one of the AST-generating parsers to extract the comment data, and find Horizon's permission comments, perhaps recognized by a specific signature, say a "horizon-permission" tag in a new line comment:
// <horizon-permission>
// {
// group: "users",
// template: collection('messages').insert({from: USER.id})
// }
// </horizon-permission>
vs decorators:
class AwesomeViewModel {
@horizonServerPermission({
group: "users",
template: collection('messages').insert({from: USER.id})
})
addMessage() {
// message adding logic...
}
}
Another plus of comments is that they can be easily stripped for production code (so we don't accidentally reveal the actual permissions to the users). Stripping decorators would require another build step.
Any thoughts on the rate limiting addition?
@niieani I've only used decorators in TypeScript, but as far as I can tell decorators/annotations are supported on classes using babel v5 and can be re-enabled under v6 with babel-plugin-transform-decorators-legacy
.
@marshall007 Indeed they are supported on classes, functions and properties, however it sill requires the dev to use prototypical JS, i.e. classes. Some people simply prefer pure JS / TS. :)
I like rate limiting idea, I opened a topic on it in the discussion board here: https://discuss.horizon.io/t/adding-rate-limiting-to-permissions/77
Continuing the permissions-in-comments idea, Horizon could simply use custom jsdoc via one of the ready-to-use parsers like: https://www.npmjs.com/package/jsdoc-parse
Forgive me if this has already been proposed, but I'm just gonna leave a note here just in case.
I think the most robust solution would be incorporating permission checking into the database itself. Something like:
permissions: {
users:
{ name:
{ read: ['others'],
write: ['self']
},
, role:
{ read: ['others'],
write: ['admin']
}
, balance:
{ read: ['self']
, write: ['system']
}
}
}
promotions: {
admin: ['others'],
system: ['admin'],
others: [],
self: ['others']
}
admin: ['others']
means that admin
can be substituted for others
in the permissions
table.
ReQL would then have to be extended with the ability to provide a user id, like:
table('users').insert({..}).withUser(myId)
which would reach into users[user_id].role
. If no user id is provided and the query attempts to read from or write to a protected field, the query fails.
Here's a proposal that combines the templating and raw JS proposals into something that I think is pretty good. I'm going to skip over some of the edge cases to make the bones of it clearer, but most of the edge cases are resolved the same way as in @deontologician's proposal.
The basic idea is that every security rule consists of a
group
, atemplate
, and an optionalvalidator
function (which can be either a write validator or a read validator). If thevalidator
function is left off, it's basically the same as Josh's proposal (queries are legal if they match at least one template). If the validator functions are included, then the rule is that a query is legal if it matches at least one template and passes all the validator functions for templates that it matches.So in the presence of validator functions we'll sometimes have to fall back to doing the retrying-atomic-writes thing. (@encryptio pointed out in a conversation that if we do the backoff properly the worst case for the retrying-atomic-writes thing is actually
O(nlog(n))
instead ofO(n^2)
, which makes it more palatable.)Some examples:
Here's what I like about this model:
{writeValidator: ..., readValidator: ...}
with no template to get exactly that behavior.USER
if people want new features.Ping @coffeemug, @deontologician, @Tryneus, @dalanmiller, @mglukhovsky.