grpc tls client configuration should be per process not per service

demmer commented 6 years ago

It seems like the configuration for client-side GRPC TLS settings should be reworked to be per-process and not per destination service.

In the current model, the client configuration for cert, key, and ca are prefixed with the destination service and are passed as part of the client api initialization. This means that a process which communicates with multiple grpc services has multiple separate flags for grpc_cert, grpc_key, and grpc_ca, prefixed by the service name to which it communicates.

This is cumbersome, confusing, and is not symmetric with the configuration of the server-side TLS settings for grpc which is shared among all services in a given process. It also requires specific handling in all client-side call sites to expose flags and call grpc.SecureDialOption, as was evidenced in #3433.

It seems to me that there is little benefit to enabling a single process to use different client-side GRPC credentials based on the destination service, and that the more likely case is that the calling process uses the same credentials for all callees, making the current approach cumbersome.

As such it seems better to consolidate this configuration into a shared set of flags so that it is shared among all GRPC clients from a given process.

This is the approach that we followed in #3418 for the static auth, as after some discussion with @sougou, we concluded there was little benefit to presenting different authentication credentials for different callee services.

alainjobart commented 6 years ago

I agree for the binlog and vtgate -> vttablet services, as their destinations are always vttablets, which will have the same server-side TLS config. Then, for the other RPCs we have (client -> vtgate, automation client -> vtctld, vtctld -> vttablet), usually a single process only does one kind of interaction, not much more than that. So specifying different TLS configs doesn't make much sense either.

So yes, I agree too. 👍

tirsen commented 6 years ago

I agree, it's cumbersome. We automatically generate the flags so it's not a major issue for us though.

alainjobart commented 6 years ago

I thought about this a bit more, and I think the real requirement here is to separate client RPCs (the queries to vtgate using vtgateservice) from the management RPCs (the reparent queries, but also vtgate to vttablet, or vttablet to vttablet like binlog). And a client would only do one of these, not both, so I think we're fine.

sougou commented 6 years ago

I think the model @demmer and I discussed (which he's recommending), is that a client certificate identifies the person (or entity) making the request. So, it could be alain, demmer or vtctld making the request. No matter where the request is sent, the identity of the requestor (client certificate) does not change.

The confusion arises because we currently overload authentication and authorization for grpc calls. In reality, TLS should only be used for authentication: identifying that the caller. Authorization needs to be more fine-grained. Just like we have table ACLs, we need to extend that model to various calls. This granularity may be as coarse as queryservice vs tabletmanager. Or, we may have to go to the function level also.

alainjobart commented 6 years ago

Ah I see. Then my feedback is that it's not the way the chains of certs we recommend is meant to be used at the moment. Note that it's not a judgement of which way it should be, just explaining how it was meant to be.

The original idea is for each server, you create a signing certificate (under a toplevel chain of authority, the simplest form of that chain of authority is a single root certificate). Then you sign children certificates that are allowed to access that server, and use the certificate common name as the 'name' of the client. In that model, the client certificates are tied to the server they talk to, they do not express just a client identity, but also the server it's used to talk to.

What you're saying is we switch model, where the chain of authority of the client cert is not enough to authorize that client to connect to a service, but is just merely a validation of the client identity. And then you use some other mechanism to authorize the access (table ACL is one, but we'd extend this to other services).

Maybe there is a way for us to support both use cases, and be really clear in the documentation on how each works? My comments here were about using your changes to achieve the original design. Your comments are about enabling the second design. Maybe we need some more research on which one is the usual way to do things, to make sure we don't prevent it?

vitessio / vitess

grpc tls client configuration should be per process not per service #3435