mikemintz / rethinkdb-websocket-server

Node.js WebSocket server that proxies to RethinkDB. Supports query validation.
MIT License

Row-level security / automatic filters #12

Open mividtim opened 8 years ago

mividtim commented 8 years ago

It would be great if a client could do something like r.table('turtles').filter({name: 'Ralph'}), and the server would automatically inject the herdId filter (from your example) in between the .table and the .filter provided by the client, so that from a client's perspective, the only data in the DB is what is available to them. This could ease the burden not only on user-level security, but help to support multi-tenancy (e.g. filtering every table on tenantId).
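A minimal sketch of the rewrite described above, operating on the JSON form in which a ReQL query travels over the wire (`[termType, args, optargs]`). The numeric term types are quoted from memory of RethinkDB's `ql2.proto` and should be verified; `injectRootFilter` and the `session` object are hypothetical names, not part of rethinkdb-websocket-server.

```javascript
// Term type numbers as recalled from RethinkDB's ql2.proto (assumption; verify).
const TABLE = 15;
const FILTER = 39;

// A serialized ReQL term is [type, args, optargs?]; anything else is a plain datum.
const isTerm = (x) => Array.isArray(x) && typeof x[0] === 'number';

// Hypothetical helper: follow the chain down through each term's first argument
// and wrap the root TABLE term in FILTER(table, predicate).
function injectRootFilter(term, predicate) {
  if (!isTerm(term)) return term;
  const [type, args = [], optargs] = term;
  if (type === TABLE) return [FILTER, [term, predicate]];
  if (args.length === 0) return term;
  const newArgs = [injectRootFilter(args[0], predicate), ...args.slice(1)];
  return optargs === undefined ? [type, newArgs] : [type, newArgs, optargs];
}

// r.table('turtles').filter({name: 'Ralph'}) as the client would send it.
const clientQuery = [FILTER, [[TABLE, ['turtles']], { name: 'Ralph' }]];
const session = { herdId: 7 }; // hypothetical session data

// Equivalent to r.table('turtles').filter({herdId: 7}).filter({name: 'Ralph'}).
const rewritten = injectRootFilter(clientQuery, { herdId: session.herdId });
console.log(JSON.stringify(rewritten));
```

This only handles the root table of a single chain; subqueries and joins are ignored here.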

mikemintz commented 8 years ago

@mividtim that would be really cool to have, although I have a hard time picturing how to implement it in a way that generalizes to different schemas. For example, I might have an online bank app, and I'd generally want to filter Account, Transaction, Statement, and Message with .filter({userId: session.userId}). But maybe I'd like to display r.table('transaction').filter({date: ...}).count() on the front page for everyone to see the total volume, and I wouldn't want the automatic filter for that query. And it could also be tricky for things like ChatMessage, which has two user IDs.

Do you have ideas on what you think the API could look like?

khoerling commented 8 years ago

There's got to be, at least, an additional flag as part of the query, e.g. {validateAuthToken: true}, to designate which QueryRequests to filter, yes?

mividtim commented 8 years ago

@mikemintz I guess I was thinking something along the lines of an object with table names as keys and filters as values. Perhaps, to support the multi-tenancy use-case, accept '*' as a key for a catch-all filter across all tables. Perhaps also, the keys could support a comma-separated list of table names, so one filter can be applied across a group of tables, but not all tables.
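One way the proposed configuration object could be resolved, as a sketch only. The shape (`'*'` catch-all, comma-separated keys) follows the comment above; `filtersForTable`, `filterConfig`, and the use of functions taking a session as values are assumptions made for illustration, not an existing API.

```javascript
// Hypothetical config shape: each key is '*', a table name, or a
// comma-separated list of table names; each value builds a filter predicate.
function filtersForTable(config, table, session) {
  const matched = [];
  for (const [key, makeFilter] of Object.entries(config)) {
    const names = key.split(',').map((name) => name.trim());
    if (names.includes('*') || names.includes(table)) {
      matched.push(makeFilter(session));
    }
  }
  return matched;
}

const filterConfig = {
  '*': (session) => ({ tenantId: session.tenantId }),
  'account,transaction': (session) => ({ userId: session.userId }),
};
const exampleSession = { tenantId: 't1', userId: 'u42' };

// 'account' matches both the catch-all and the grouped key.
console.log(filtersForTable(filterConfig, 'account', exampleSession));
// 'turtles' matches only the catch-all.
console.log(filtersForTable(filterConfig, 'turtles', exampleSession));
```

Every matching entry applies, so a table listed in a group still receives the catch-all tenancy filter.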

mividtim commented 8 years ago

@khoerling I'm not sure I understand this question. There is an existing piece of functionality for authenticating a session, and for adding user-specific information to the session, which is made available to the whitelists. Does this not cover "validateAuthToken" in your question?

mividtim commented 8 years ago

@mikemintz I think joins are what would make default filters for tables complicated to implement. I haven't dug into the internals of Rethink enough (yet) to understand this. But what I'm trying to accomplish on this thread is to see whether or not we can describe a system of row- and column-level security in the middle layer, such that not every filter would need to be whitelisted. Throttling is another use case that could be implemented to support a really robust system. My goal here is to minimize the amount of duplicated work in describing queries. Anything short of that wouldn't quite transcend the status quo of implementing detailed queries at the server-side-app layer and providing an API to the client to wrap those queries. Having to duplicate the query on the client in full actually adds code to the whole, rather than culling it down. With a good way to describe column- and row-level filters at the middle layer, this system would allow less code overall, and thus higher productivity.

mividtim commented 8 years ago

@mikemintz To use your example, and answer your question more directly, the filter description would boil down to this object:

{ 'account,transaction,statement,message': RP.filter({userId: session.userId}) }

Your use case of providing a summary across all users is interesting, and perhaps this is what the whitelist specifically describes: a way to break out of the bounds of the automatic filters.

Perhaps we need yet another layer involved here, one that declares roles in the system and which filters apply to which roles. For instance, a system administrator would have no filters (query anything you like), a tenant admin would have only the tenancy filter ('*': RP.filter({tenantId: session.tenantId})), and end-users would also have filters on their own user ID. Other roles could exist between tenant admin and end-user as well, like branch managers. These would have to be described somehow in configuration, or by the server-app author providing a closure to this library that is passed a session object and returns a list of filters that apply to that role.
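The closure idea at the end of this comment might look roughly like the sketch below. The role names, the `role` field on the session, and the function name `filtersForSession` are all hypothetical; plain objects stand in for the filter predicates.

```javascript
// Hypothetical closure supplied by the server-app author: given a session,
// return the filter config that applies to that session's role.
function filtersForSession(session) {
  switch (session.role) {
    case 'sysadmin':
      // No automatic filters: query anything.
      return {};
    case 'tenantAdmin':
      // Only the tenancy filter, applied to every table.
      return { '*': { tenantId: session.tenantId } };
    default:
      // End users: tenancy filter plus a per-user filter on user-owned tables.
      return {
        '*': { tenantId: session.tenantId },
        'account,transaction,statement,message': { userId: session.userId },
      };
  }
}

console.log(filtersForSession({ role: 'tenantAdmin', tenantId: 't1' }));
console.log(filtersForSession({ role: 'user', tenantId: 't1', userId: 'u42' }));
```

Defaulting unknown roles to the most restrictive set is a deliberate choice in this sketch, so a misspelled role never widens access.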

mikemintz commented 8 years ago

@mividtim I agree this would be complicated to implement with joins. Are you thinking maybe we'd only do this on queries that follow a formulaic syntax like r.table(...).filter(...).orderBy(...).limit(...) and require any other query to be in the whitelist? I'd also be concerned that I don't fully understand what ReQL allows, and that someone malicious might embed something like r.table('users').filter({name: r.table('transactions').insert(...)})

Either way, doing something like this will first require support for modifying client queries before sending them off to the server, which would be nice to support in general. But that can have unintended consequences that we'll have to figure out. For example, a ReQL error might send the rewritten query back to the browser in the error message, potentially with sensitive information the client wasn't supposed to see.
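The "formulaic syntax" check and the embedded-write concern raised in this comment can be sketched together: accept only a table / filter / orderBy / limit chain, and reject any query whose arguments contain a term outside a small allowlist. Term type numbers are recalled from `ql2.proto` and are assumptions to verify; all function names are hypothetical.

```javascript
// Term type numbers as recalled from ql2.proto (assumption; verify).
const T = { DB: 14, TABLE: 15, FILTER: 39, ORDER_BY: 41, INSERT: 56, LIMIT: 71 };
const isTerm = (x) => Array.isArray(x) && typeof x[0] === 'number';

// Term types from the outermost call down to the root TABLE.
function chainTypes(term) {
  const types = [];
  while (isTerm(term)) {
    types.push(term[0]);
    if (term[0] === T.TABLE) break;
    term = (term[1] || [])[0];
  }
  return types;
}

// True for TABLE optionally followed by FILTER, ORDER_BY, LIMIT, in that order.
function isFormulaic(term) {
  const [root, ...rest] = chainTypes(term).reverse();
  if (root !== T.TABLE) return false;
  const order = [T.FILTER, T.ORDER_BY, T.LIMIT];
  let pos = 0;
  for (const type of rest) {
    const idx = order.indexOf(type, pos);
    if (idx === -1) return false;
    pos = idx + 1;
  }
  return true;
}

// Every term type appearing anywhere in the query, including inside datum objects.
function allTermTypes(value, found = new Set()) {
  if (isTerm(value)) {
    found.add(value[0]);
    (value[1] || []).forEach((arg) => allTermTypes(arg, found));
    if (value[2]) allTermTypes(value[2], found);
  } else if (value !== null && typeof value === 'object') {
    Object.values(value).forEach((v) => allTermTypes(v, found));
  }
  return found;
}

const ALLOWED = new Set([T.DB, T.TABLE, T.FILTER, T.ORDER_BY, T.LIMIT]);
function isSafeFormulaic(term) {
  return isFormulaic(term) && [...allTermTypes(term)].every((t) => ALLOWED.has(t));
}

// r.table('users').filter({name: 'Ralph'}).limit(10)
const safeQuery = [T.LIMIT, [[T.FILTER, [[T.TABLE, ['users']], { name: 'Ralph' }]], 10]];
// r.table('users').filter({name: r.table('transactions').insert({amount: 1})})
const sneakyQuery = [T.FILTER, [[T.TABLE, ['users']],
  { name: [T.INSERT, [[T.TABLE, ['transactions']], { amount: 1 }]] }]];

console.log(isSafeFormulaic(safeQuery), isSafeFormulaic(sneakyQuery));
```

An allowlist is used rather than a list of forbidden write terms, since it fails closed for any term the author has not thought about. A real allowlist would need to grow to cover function predicates and comparison terms.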

mividtim commented 8 years ago

I think I'd like to study a bit more about ReQL. If we're going to go down the road of altering the query, then I believe it would make sense to go all the way down the rabbit hole, parsing and understanding the query fully, and introducing automated filters as needed at any point in the query (at any table reference), and also whitelisting subqueries. If we fully parse the entire query (much as RethinkDB itself must do) on the way through, I believe we could accomplish this. It would not, certainly, be trivial.
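A rough sketch of the "all the way down" approach: walk the whole serialized query and wrap every table reference, including those in subqueries nested inside datum objects. This is illustrative only. The term type numbers are assumptions recalled from `ql2.proto`, `injectEverywhere` and `predicateFor` are hypothetical names, and the sketch takes the table name to be the last argument of a TABLE term.

```javascript
// Term type numbers as recalled from ql2.proto (assumption; verify).
const TABLE = 15;
const FILTER = 39;
const COUNT = 43;
const isTerm = (x) => Array.isArray(x) && typeof x[0] === 'number';

// Rebuild the query bottom-up, wrapping each TABLE term in FILTER(table, predicate)
// whenever predicateFor(tableName) returns a predicate.
function injectEverywhere(value, predicateFor) {
  if (isTerm(value)) {
    const [type, args = [], optargs] = value;
    const newArgs = args.map((arg) => injectEverywhere(arg, predicateFor));
    const rebuilt = optargs === undefined
      ? [type, newArgs]
      : [type, newArgs, injectEverywhere(optargs, predicateFor)];
    if (type !== TABLE) return rebuilt;
    const tableName = args[args.length - 1];
    const predicate = predicateFor(tableName);
    return predicate ? [FILTER, [rebuilt, predicate]] : rebuilt;
  }
  if (value !== null && typeof value === 'object') {
    // Datum objects may carry subqueries in their values.
    return Object.fromEntries(
      Object.entries(value).map(([k, v]) => [k, injectEverywhere(v, predicateFor)])
    );
  }
  return value;
}

// r.table('account').filter({total: r.table('transaction').count()})
const nestedQuery = [FILTER, [[TABLE, ['account']],
  { total: [COUNT, [[TABLE, ['transaction']]]] }]];
const perUser = (table) =>
  ['account', 'transaction'].includes(table) ? { userId: 'u42' } : null;

console.log(JSON.stringify(injectEverywhere(nestedQuery, perUser)));
```

Wrapping every table this way is only meaningful where the table is read as a sequence. Terms that need the table itself, such as get or insert, would have to be handled separately or left to the whitelist.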
