uber / cadence

Cadence is a distributed, scalable, durable, and highly available orchestration engine to execute asynchronous long-running business logic in a scalable and resilient way.
https://cadenceworkflow.io
MIT License
7.99k stars 776 forks

OAuth public/private key concerns #4520

Open vytautas-karpavicius opened 2 years ago

vytautas-karpavicius commented 2 years ago

Recent PR #4364 made a change where it is possible to specify private key per cluster.

This got my attention because it implies that different clusters may have different key pairs. https://github.com/uber/cadence/blob/5191468f2297dc7666c574fe74492fae4e85b58a/common/config/cluster.go#L70

Another thing I noticed is that for verification we have only one public key per cluster. https://github.com/uber/cadence/blob/7aca8298408de8ef3b2b4035f31f63b27f3821ed/common/authorization/oauthAuthorizer.go#L65

Here are my concerns:

The current setup implies that a cluster has a key pair, with the public key on its side to validate JWT tokens. Since there is a single public key to validate tokens against, the private key counterpart has to be distributed to the signing parties. If everything falls under a single owner, this may not be a problem.

But consider several parties. In this case you do not want to share your private key with them, as it is private by definition. Now consider the opposite scenario: each party generates its own key pair, keeps the private key for signing, and gives the Cadence server maintainer the public counterpart. The maintainer could then add the public key, but only one key can be configured, while there are several parties.

Let's review a similar pattern: an SSH connection. The host has a file ~/.ssh/authorized_keys where multiple public keys that are allowed to log in can be added. Another example is GitHub itself: you can add multiple public keys that are allowed to authenticate.

I suggest the following:

- we should accept and validate against the list of public keys (not one).
- cluster should contain only one private key for himself (not a list of them per each replication cluster).

@noiarek @longquanzheng

longquanzheng commented 2 years ago

Hi @vytautas-karpavicius, thanks for the callout.

we should accept and validate against the list of public keys (not one).

Agreed. I have it in the to-do list in the design doc, but it's not the highest priority at the moment, so we didn't implement it. The assumption is that the private key will be managed by a centralized service which returns only the JWT, so we shouldn't need to distribute private keys. For example, Uber may have its own way of doing service auth: you build a centralized service that owns the private key and returns JWTs to other services.

Btw, the only implementation in the worker using a private key is just an example. It's not meant for actual production use. Since each company will have its own service auth mechanism, they should implement the interface themselves (mostly they won't use private/public keys; more often they would use IAM or even API keys).

However, it still makes sense to allow a list of public keys for rotation purposes, so I have it as a to-do for this design.

longquanzheng commented 2 years ago

cluster should contain only one private key for himself (not a list of them per each replication cluster).

Yes and no. It's going to be difficult to guarantee that all clusters have the same key pairs, and it's not necessary to do so. Since each cluster has its own config, they can certainly be different. Just from a structural point of view, since we allow each cluster to have its own config, the public key should be allowed to differ as well.

But it doesn't mean they have to be different. In your case, or some cases, if you want them all to be the same, that's okay and allowed.

Just from the perspective of what we can and cannot provide, I think allowing different key pairs makes more sense than requiring every cluster to use the same one.

For example, a company may have different teams that started their own Cadence clusters and later want to connect them together in order to use the XDC/cross-cluster feature. This would open up that possibility. Even a single team owning different clusters may want to use different keys for security purposes.

vytautas-karpavicius commented 2 years ago

Thank you @longquanzheng for clarifications and detailed answers.

Ok, I see that supporting multiple key pairs is indeed required and may be useful in some cases. I'm not against it, and it does make sense.

The issue I see is not the fact that clusters will have different key pairs, but rather the way they are currently configured. Right now we have a clusterGroup config section where private keys are specified. It has a private key for the current cluster (all good) as well as for remote clusters, and this is the part I'm not comfortable with. It means that the private key of a remote cluster needs to be distributed here.

If we had a list of public keys to validate against, listing private keys for remote clusters would not be needed. The current cluster would sign with its own private key only, no matter the destination. The remote cluster would have a list of public keys of the corresponding callers that are allowed.

If some centralized service or another implementation that manages the keys is used, that may not be an issue, true. But for this particular implementation it can be.

longquanzheng commented 2 years ago

It has a private key for current cluster (all good) as well as remote clusters - this is is part that I'm not comfortable with. It means that private key of remote cluster need to be distributed here.

Right, I agree with the concern, and it does look weird. There is a little more background here: TChannel doesn't support TLS, so we had to apply authorization to cross-cluster requests. However, once authentication is done over gRPC, this part can be removed/skipped. Since they are all Cadence clusters, we should trust them once requests are authenticated. (This is also what Temporal has done.)

longquanzheng commented 2 years ago

https://github.com/uber/cadence/pull/4492 suggests that internal traffic is using gRPC today. Does that include cross-cluster traffic? If so, we can add authentication to it and then remove the authorization provider from the per-cluster config. For that cross-cluster traffic, as long as a request passes authentication, there is no need to perform authorization checks.

vytautas-karpavicius commented 2 years ago

Yes, internal traffic is on gRPC by default already. For cross-DC traffic, it needs to be enabled like this: https://github.com/uber/cadence/blob/master/config/development_xdc_cluster0.yaml#L82

Yeah, that makes sense. We will need to expose config for that.

longquanzheng commented 2 years ago

@vytautas-karpavicius Just realized that the sys worker is broken after that PR, because sysWorker uses publicClient.hostPort, which defaults to the same value as the current cluster's RPC address. This is a fix: https://github.com/uber/cadence/pull/4560