rafael commented 7 years ago

Description

In order to support a vitess deployment containing customer sensitive data, we need the ability for engineers to perform ad-hoc queries, without being able to access certain columns from sensitive tables. We implement this in our current system with a mysql user grant with limited permissions including column-specific access control that is only used when debugging.

In order to maintain this support in a vitess-enabled system, we need the ability for some queries that come through VTGate to be executed against mysql using a limited privilege user.

The existing Table ACL feature is insufficient because of:

We need controls on a per-column not a per-table basis
We want to have the same restrictions when directly connecting to mysql as we do when querying through VTGate.

Thus this proposal is to optionally add a new connection pool to the VTTablet using a limited-permission debug user, and to enable a mapping from the GRPC caller ID that tells the query executor to use this limited privilege user for queries.

Proposal

The objective is to implement this feature by leveraging most of the concepts and constructs that Vitess already provides.

One approach that seems feasible is to add one more connection pool to VTTablet. Let's call it AppDebugConfig. Users of vitess can use this pool to have a special MySQL user for debugging purposes. This pool can be configured in the same way as any other pool.

With this change, it is possible to have a snowflake user that gets propagated all the way from the VTGate to the VTTablet and eventually MySQL uses this special connection pool. VTTablet can decide which connection pool to use depending on the CallerID available in the context.

Most of the changes will be confined to the VTTablet shown in the following graph, where getConn will use a different pool if the CallerID is appDebug user.

Implementation Idea

Changes will be mostly in the following components:

DBConfigs

Add another db user to Vitess (db-appdebug). This can be done in dbconfigs.

TxPool

Modify tabletserver TxPool to add an extra connection pool to support db-appdebug.
Modify TxPool.Begin to use db-appdebug if CallerID matches it. This could be done a very similar way as existent code for useFoundRows.

QueryEngine:

Similarly to the changes made in the TxPool, an extra pool will be added to the QueryEngine.

QueryExecutor:

It will be modified so a different pool can be passed to the QueryEngine depending on the CallerID.

Pros

Changes to the project are confined to mostly one component.
Reuses concepts already provided by Vitess: CallerID.

Cons

Introduces snowflake user (db-appdebug) that will have a behavior that differs from the rest of Vitess users. We will have to be very explicit in Vitess docs about this fact and be very clear that the only user that doesn’t use the application pool is db-appdebug.
You can't use FoundRows and Debug user simultaneously.

sougou commented 7 years ago

This is a good start. We need a few more alternatives to be presented. Then we can brainstorm pros & cons.

derekperkins commented 7 years ago

I don't have any need for separate MySQL users for the time being. My main thought on this proposal is that if you're going to go through the trouble of adding multiple connection pools, I wouldn't hardcode it to a single new pool with a specific user. It'd be more globally usable to be able to define N pools, each defined to use a specific user.

What I'm more interested in is the potential revamp of Vitess level ACL with #2974 / #2975. I haven't looked at the internals yet, whether or not column level restrictions are supported, but extending that would seem to have larger dividends than this one specific use case.

demmer commented 7 years ago

@derekperkins That casbin work definitely looks interesting.

For our use case (Slack) we are pretty much constrained to solutions that rely on the underlying mysql grants as the final enforcement mechanism.

We considered whether or not to make this more generalized (N pools instead of 1) but it would make the configuration more complicated and we didn't see the use case to justify the extra connections.

sougou commented 7 years ago

Here are some alternate proposals:

The experimental tablet category is unused. One could build out some experimental replicas, and make vttablet use the restricted user's credentials to connect to MySQL.
Build a serving-only version of vttablet. It can point to any existing mysql instance, but with the restricted user's credentials. We have to find a way to make VTGate route to it. One obvious way is to create a new tablet type. But we can brainstorm other ways.

The downside of option 1 is that you need to provision additional instances.

Both the above options won't let you write to the master. They are only for reads. Do we need to allow writes by such users? I presume not.

Option 2 gives you some additional flexibility and also solves some additional problems:

You can deploy such 'readers' outside of the main instance.
You can have multiple of them and scale them up and down as needed.
You can configure them to have different tolerances, with more (or less) aggressive timeouts, and different pool sizes.
They have the opportunity to address high QPS demands by taking load away from the main vttablet, like @tirsen is likely to face. The caveat, of course, is that it's currently only for reads.
The vttablet code is structured such that such a binary can be trivially built. In fact, we used to support such a binary for the longest time (vtocc). Or we could have it as just vttablet that behaves differently if it's the 'reader' type.
Can be extended to handle master writes in the future.

The fact that this approach solves more than one problem, and is architecturally clean, makes it worth investigating.

Other option suggested by @michael-berlin: create an on-demand connection with the assigned credentials if the username matches a pattern. This is similar to someone else's suggestion of ExecuteFetchAsDebug (I think @demmer proposed it).

sougou commented 7 years ago

Now that UPSERT is done, I was able to spend more time thinking about this problem.

TL;DR:

The needs spelled out in the issue are legitimate, and we have to find a way to satisfy them.
The proposal of a 'reader' vttablet will (barely) meet the above requirements. Even so, it will be an awkward solution for the problem at hand. So, I'd understand if it's not considered worthy.
Going with a snowflake user that maps to a mysql user would be the best short-term solution that I can think of. But I have a proposed twist that can greatly minimize code change, explained at the end.

Design problem

Wanting to rely on MySQL's privilege is a desire that stands at odds against a system that relies on connection pools. Sooner or later, the two requirements will clash causing an explosion in complexity:

Every such special user will add another set of dbconfig parameters to vttablet, and will weigh down on the already long list we have.
Every user will eventually attempt to multiply with every pool available in vttablet causing an inordinate number of pools.

These will then raise other questions about per-pool timeouts, pool sizes, and every pool will export its own /debug/vars variable, which will cause monitoring overload, etc.

The clean way to address this problem was suggested by @bbeaudreault on Slack: read the permissions from MySQL and enforce them above the connection pools. This is something we should pursue as a long term solution. This will also give us the opportunity to clean-up the badly implemented ACL system. I spent some time reading up on mysql grants and how they get stored in the information_schema. I couldn't see anything there that could not be done by vttablet.

How do we make this work in the short term?

I suggest modifying dbconnpool: https://github.com/youtube/vitess/blob/master/go/vt/vttablet/tabletserver/connpool/pool.go. The Get function takes the context as input already. If it's the 'special' user, then it opens a new connection for that user (instead of getting one from the pool) and returns a DBConn with its pool set to nil. The Recycle of DBConn can then be modified to just close the connection if pool was nil.

There are some advantages to this approach:

All changes are localized to this package. You can even define the additional flags here. When time comes to rip it off, it will be easy work.
No other part of vttablet needs to change.
This pool is used by 'all other pools'. So, the special user will have access to the full functionality of vitess, and the only restrictions will be those enforced by MySQL.
Since it's a localized change, it's easy to test it.

alainjobart commented 7 years ago

After having read all this, I have a few comments / opinions I'd like to share.

First, I feel like reading the user map from MySQL (as suggested by @bbeaudreault on Slack and recorded by @sougou earlier) is a bit convoluted: MySQL would need the vttablet user, and then we'd use all the other users in there for vttablet ACLs. I don't feel like managing users in MySQL is easy currently with Vitess. You have to send the GRANT commands to each shard master in a keyspace. Now I understand that for a migration from MySQL to Vitess, it might be easier to do this: use your current MySQL grants, add certificates or LDAP credentials for the user names (so they can log in), and go from there.

I feel like a better Vitess-based system would be to store the mappings between user groups and the tables / columns they can access in the topology. Similar to vschema. Then we can have either vtgate or vttablet or both enforce the rules. We can still use a casbin config file, and feed it to a casbin plugin to achieve what we want. Or just use the current config format, but read it and watch it from topology. Then MySQL's permissions just contain the local user for vttablet, and that's it. Everything else is at the Vitess layer.

demmer commented 7 years ago

Adding my .02 to the discussion, I feel like the on-demand connection for the snowflake user is a great idea -- thanks @sougou for the suggestion and I'm glad (though unsurprised) to see that @rafael got the implementation done quickly.

I also share @alainjobart's opinion that this other proposal isn't actually a better way to do it:

The clean way to address this problem was suggested by @bbeaudreault on Slack: read the permissions from MySQL and enforce them above the connection pools. This is something we should pursue as a long term solution. This will also give us the opportunity to clean-up the badly implemented ACL system. I spent some time reading up on mysql grants and how they get stored in the information_schema. I couldn't see anything there that could not be done by vttablet.

Although I can certainly see some benefit in the long run of extending vitess-level ACLs to enforce column-level restrictions like mysql grants do, in my opinion a big benefit of that feature would be to allow users to avoid managing multiple mysql grants for different access control levels altogether.

That said, the on-demand connection is more than satisfactory for Slack's needs, so we don't have any immediate (or even long-term) projected need for said enhancement.

bbeaudreault commented 7 years ago

My suggestion actually was closer to @alainjobart's but I can see how it was misinterpreted:

not sure why the mysql grants couldnt be translated into acl rules, if the acl could operate on columns

I meant more that if Vitess' ACL could understand column-based access, then one could create equivalent ACL rules in vitess that do the same as what the mysql grants currently enforce. The only reason I could see one not wanting this was if they have an external auditing system that understands the mysql grant schema. But we could probably create an API on the vttablet which allows programmatic access to the defined grants.

So I would love to see this implemented with ACL rather than a new snowflake user, but @demmer made valid points on slack about the effort involved in that in terms of translating them, getting security team buy-in, etc. not to mention implementation.

rafael commented 7 years ago

With #3078, I think we can consider this close for now.

vitessio / vitess

Vitess MySQL Credentials Propagation #3044

Description

Proposal

Implementation Idea

DBConfigs

TxPool

QueryEngine:

QueryExecutor:

Pros

Cons

TL;DR:

Design problem

How do we make this work in the short term?