vitessio / vitess

Vitess is a database clustering system for horizontal scaling of MySQL.
http://vitess.io
Apache License 2.0
18.56k stars 2.09k forks source link

Vitess MySQL Credentials Propagation #3044

Closed rafael closed 7 years ago

rafael commented 7 years ago

Description

In order to support a vitess deployment containing customer sensitive data, we need the ability for engineers to perform ad-hoc queries, without being able to access certain columns from sensitive tables. We implement this in our current system with a mysql user grant with limited permissions including column-specific access control that is only used when debugging.

In order to maintain this support in a vitess-enabled system, we need the ability for some queries that come through VTGate to be executed against mysql using a limited privilege user.

The existing Table ACL feature is insufficient because of:

  1. We need controls on a per-column not a per-table basis
  2. We want to have the same restrictions when directly connecting to mysql as we do when querying through VTGate.

Thus this proposal is to optionally add a new connection pool to the VTTablet using a limited-permission debug user, and to enable a mapping from the GRPC caller ID that tells the query executor to use this limited privilege user for queries.

Proposal

The objective is to implement this feature by leveraging most of the concepts and constructs that Vitess already provides.

One approach that seems feasible is to add one more connection pool to VTTablet. Let's call it AppDebugConfig. Users of vitess can use this pool to have a special MySQL user for debugging purposes. This pool can be configured in the same way as any other pool.

With this change, it is possible to have a snowflake user that gets propagated all the way from the VTGate to the VTTablet and eventually MySQL uses this special connection pool. VTTablet can decide which connection pool to use depending on the CallerID available in the context.

Most of the changes will be confined to the VTTablet shown in the following graph, where getConn will use a different pool if the CallerID is appDebug user.

Implementation Idea

Changes will be mostly in the following components:

DBConfigs

TxPool

QueryEngine:

QueryExecutor:

Pros

Cons

sougou commented 7 years ago

This is a good start. We need a few more alternatives to be presented. Then we can brainstorm pros & cons.

derekperkins commented 7 years ago

I don't have any need for separate MySQL users for the time being. My main thought on this proposal is that if you're going to go through the trouble of adding multiple connection pools, I wouldn't hardcode it to a single new pool with a specific user. It'd be more globally usable to be able to define N pools, each defined to use a specific user.

What I'm more interested in is the potential revamp of Vitess level ACL with #2974 / #2975. I haven't looked at the internals yet, whether or not column level restrictions are supported, but extending that would seem to have larger dividends than this one specific use case.

demmer commented 7 years ago

@derekperkins That casbin work definitely looks interesting.

For our use case (Slack) we are pretty much constrained to solutions that rely on the underlying mysql grants as the final enforcement mechanism.

We considered whether or not to make this more generalized (N pools instead of 1) but it would make the configuration more complicated and we didn't see the use case to justify the extra connections.

sougou commented 7 years ago

Here are some alternate proposals:

  1. The experimental tablet category is unused. One could build out some experimental replicas, and make vttablet use the restricted user's credentials to connect to MySQL.
  2. Build a serving-only version of vttablet. It can point to any existing mysql instance, but with the restricted user's credentials. We have to find a way to make VTGate route to it. One obvious way is to create a new tablet type. But we can brainstorm other ways.

The downside of option 1 is that you need to provision additional instances.

Both the above options won't let you write to the master. They are only for reads. Do we need to allow writes by such users? I presume not.

Option 2 gives you some additional flexibility and also solves some additional problems:

The fact that this approach solves more than one problem, and is architecturally clean, makes it worth investigating.

Other option suggested by @michael-berlin: create an on-demand connection with the assigned credentials if the username matches a pattern. This is similar to someone else's suggestion of ExecuteFetchAsDebug (I think @demmer proposed it).

sougou commented 7 years ago

Now that UPSERT is done, I was able to spend more time thinking about this problem.

TL;DR:

Design problem

Wanting to rely on MySQL's privilege is a desire that stands at odds against a system that relies on connection pools. Sooner or later, the two requirements will clash causing an explosion in complexity:

  1. Every such special user will add another set of dbconfig parameters to vttablet, and will weigh down on the already long list we have.
  2. Every user will eventually attempt to multiply with every pool available in vttablet causing an inordinate number of pools.

These will then raise other questions about per-pool timeouts, pool sizes, and every pool will export its own /debug/vars variable, which will cause monitoring overload, etc.

The clean way to address this problem was suggested by @bbeaudreault on Slack: read the permissions from MySQL and enforce them above the connection pools. This is something we should pursue as a long term solution. This will also give us the opportunity to clean-up the badly implemented ACL system. I spent some time reading up on mysql grants and how they get stored in the information_schema. I couldn't see anything there that could not be done by vttablet.

How do we make this work in the short term?

I suggest modifying dbconnpool: https://github.com/youtube/vitess/blob/master/go/vt/vttablet/tabletserver/connpool/pool.go. The Get function takes the context as input already. If it's the 'special' user, then it opens a new connection for that user (instead of getting one from the pool) and returns a DBConn with its pool set to nil. The Recycle of DBConn can then be modified to just close the connection if pool was nil.

There are some advantages to this approach:

alainjobart commented 7 years ago

After having read all this, I have a few comments / opinions I'd like to share.

First, I feel like reading the user map from MySQL (as suggested by @bbeaudreault on Slack and recorded by @sougou earlier) is a bit convoluted: MySQL would need the vttablet user, and then we'd use all the other users in there for vttablet ACLs. I don't feel like managing users in MySQL is easy currently with Vitess. You have to send the GRANT commands to each shard master in a keyspace. Now I understand that for a migration from MySQL to Vitess, it might be easier to do this: use your current MySQL grants, add certificates or LDAP credentials for the user names (so they can log in), and go from there.

I feel like a better Vitess-based system would be to store the mappings between user groups and the tables / columns they can access in the topology. Similar to vschema. Then we can have either vtgate or vttablet or both enforce the rules. We can still use a casbin config file, and feed it to a casbin plugin to achieve what we want. Or just use the current config format, but read it and watch it from topology. Then MySQL's permissions just contain the local user for vttablet, and that's it. Everything else is at the Vitess layer.

demmer commented 7 years ago

Adding my .02 to the discussion, I feel like the on-demand connection for the snowflake user is a great idea -- thanks @sougou for the suggestion and I'm glad (though unsurprised) to see that @rafael got the implementation done quickly.

I also share @alainjobart's opinion that this other proposal isn't actually a better way to do it:

The clean way to address this problem was suggested by @bbeaudreault on Slack: read the permissions from MySQL and enforce them above the connection pools. This is something we should pursue as a long term solution. This will also give us the opportunity to clean-up the badly implemented ACL system. I spent some time reading up on mysql grants and how they get stored in the information_schema. I couldn't see anything there that could not be done by vttablet.

Although I can certainly see some benefit in the long run of extending vitess-level ACLs to enforce column-level restrictions like mysql grants do, in my opinion a big benefit of that feature would be to allow users to avoid managing multiple mysql grants for different access control levels altogether.

That said, the on-demand connection is more than satisfactory for Slack's needs, so we don't have any immediate (or even long-term) projected need for said enhancement.

bbeaudreault commented 7 years ago

My suggestion actually was closer to @alainjobart's but I can see how it was misinterpreted:

not sure why the mysql grants couldnt be translated into acl rules, if the acl could operate on columns

I meant more that if Vitess' ACL could understand column-based access, then one could create equivalent ACL rules in vitess that do the same as what the mysql grants currently enforce. The only reason I could see one not wanting this was if they have an external auditing system that understands the mysql grant schema. But we could probably create an API on the vttablet which allows programmatic access to the defined grants.

So I would love to see this implemented with ACL rather than a new snowflake user, but @demmer made valid points on slack about the effort involved in that in terms of translating them, getting security team buy-in, etc. not to mention implementation.

rafael commented 7 years ago

With #3078, I think we can consider this close for now.