oauth-wg / oauth-transaction-tokens

Key rotation guidance #109

Open ashayraut opened 4 months ago

ashayraut commented 4 months ago

Key rotation is interesting. If you rotate the key at time T1 and the Tx token service starts issuing tokens with the new key at that same moment, some validators may not have the new public key yet. We should call out that the service should start using the new key only at T1+X, where X is the SLA for ensuring that all services that validate signatures have received the new public key. Keys can be shared out of band. One idea is to generate two key pairs up front, so that both public keys are available to the validating services at all times and the Tx token service holds both private keys. Let's say the issuer holds PrvtKey-1 and PrvtKey-2. PrvtKey-1 is used from T1 to T1+24hrs and PrvtKey-2 from T1+24hrs to T1+48hrs. When the Tx token service switches from Key-1 to Key-2, it doesn't have to worry about some service lacking the public key needed to validate tokens signed with Key-2. This way key synchronization happens out of band, and key rotation happens frequently, which keeps the rotation machinery well tested. Generally there is no need to rotate keys every 24 hours, so we can choose to relax that. But even if we have to force-rotate a key, we have to make sure the force-rotated key (i.e. the new key pair) is used to mint tokens only once we can guarantee that those tokens can be validated.
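The T1+X rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `SigningKey`, `published_at`, `propagation_sla`, and `select_signing_key` are hypothetical names, not anything defined by the draft, and the sketch assumes the issuer records when each public key was made available to validators.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass(frozen=True)
class SigningKey:
    """Hypothetical record of a key pair held by the Tx token service."""
    kid: str
    published_at: datetime  # when the public key was made available to validators


def select_signing_key(keys: List[SigningKey], now: datetime,
                       propagation_sla: timedelta) -> SigningKey:
    """Return the newest key whose public half has had time X to propagate.

    A key published at T1 becomes eligible for signing only at T1 + X,
    where X is the propagation SLA.
    """
    eligible = [k for k in keys if k.published_at + propagation_sla <= now]
    if not eligible:
        raise RuntimeError("no signing key has finished propagating")
    return max(eligible, key=lambda k: k.published_at)
```

With Key-1 published a day before T1, Key-2 published at T1, and X set to one hour, a call at T1+30min would still select Key-1, and a call at T1+2hrs would select Key-2. A forced rotation fits the same rule: the new key is published immediately but is not used for minting until X has elapsed.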

arndt-s commented 4 months ago

Transaction tokens are JSON Web Tokens (JWTs), and they leverage JSON Web Signature (JWS) for validation. This gives implementors and deployments a vast range of options, from the choice between symmetric and asymmetric keys to various key delivery mechanisms. If I understand the above correctly, it is already perfectly possible to achieve this today, for instance by using the kid header parameter in the JWT and publishing multiple asymmetric keys ahead of usage via JWKS. Please correct me if I'm wrong.
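As a rough sketch of that approach: the `kid` travels in the JWT's JOSE header, and the validator matches it against a JWK Set in which both the current and the next key are already published. The key IDs and the `x`/`y` values below are placeholders, and `kid_from_jwt` and `find_jwk` are hypothetical helper names. The sketch only shows key lookup; signature verification itself is left to a JOSE library.

```python
import base64
import json
from typing import Optional

# Example JWK Set with two keys published ahead of use (placeholder key material).
EXAMPLE_JWKS = {
    "keys": [
        {"kty": "EC", "crv": "P-256", "kid": "key-1", "x": "placeholder", "y": "placeholder"},
        {"kty": "EC", "crv": "P-256", "kid": "key-2", "x": "placeholder", "y": "placeholder"},
    ]
}


def b64url_decode(segment: str) -> bytes:
    """Decode a base64url segment, restoring the padding that JWS strips."""
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def kid_from_jwt(token: str) -> str:
    """Read the kid from the (still unverified) JOSE header of a compact JWT."""
    header = json.loads(b64url_decode(token.split(".")[0]))
    return header["kid"]


def find_jwk(jwks: dict, kid: str) -> Optional[dict]:
    """Return the JWK whose kid matches, or None if it is not published."""
    for jwk in jwks.get("keys", []):
        if jwk.get("kid") == kid:
            return jwk
    return None
```

Because both keys are in the set before either is used, the issuer can switch from `key-1` to `key-2` without the validator having to fetch anything at the moment of the switch.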


but even if we have to force rotate key then we have to make sure force rotated key (i.e new key pair) should be used to mint tokens only when we can guarantee that those tokens can be validated.

What sort of guarantees are you referring to? Is it enough to know that the validating party can look up an asymmetric key, or is it something more sophisticated, such as "the symmetric key is delivered and available on the host"?

ashayraut commented 4 months ago

Is it enough to know that validating party can look up a asymmetric key

I think so.

ashayraut commented 4 months ago

By the way, yes, JWKS provides a host of options. But in this TraTs model, to limit the cost and performance impact, it is good to choose an algorithm that can afford to be CPU-expensive when issuing a token rather than when validating one, because the validating parties will far outnumber the issuer. So algorithms that are cheaper in signature verification (ECDSA P256) could be better.

But we can recommend choosing something that is cheap for the many validators and expensive only for the Tx Token service.

tulshi commented 4 months ago

Best practices around choice of algorithms, key rotation, etc. for TraTs could be a separate draft, IMO. I'd like to know what others in the group think.

obfuscoder commented 4 months ago

I don't think a separate draft should be necessary. I suggest giving some recommendations within the current draft (Security Considerations section).

Time spent on token validation by workloads should be considered, but I don't think we should deviate too much from RFC 7518, for example. Otherwise we make things more difficult for implementers and for interop.

What every deployment needs to consider is that there will be a lot more TraTs being issued than Access Tokens (ATs). One AT, which is valid for maybe an hour, can be used in thousands of requests to resource servers, each of which will create new TraTs. This can have an impact on performance as well as on available key space and collision probabilities.

Re key rotation and availability of signature key material: This aspect is already addressed by the standards we rely upon. I don't think there is a need for any specifics. Naturally, "kid" should be used, and validation keys should be made available before they are used. Workloads should fetch key sets on demand if they encounter a kid they don't know.
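A minimal sketch of that on-demand behavior, assuming the workload already has some way to retrieve the current JWK Set: `JwksCache`, `fetch_jwks`, and `min_refresh_interval` are hypothetical names. The refresh rate limit is an addition of this sketch, not something stated above; it is there so that tokens carrying bogus kid values cannot trigger a fetch on every request.

```python
import time
from typing import Callable, Dict, Optional


class JwksCache:
    """Caches validation keys by kid and refetches when an unknown kid appears."""

    def __init__(self, fetch_jwks: Callable[[], dict],
                 min_refresh_interval: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch_jwks            # caller-supplied; returns a JWK Set dict
        self._min_interval = min_refresh_interval
        self._clock = clock
        self._keys: Dict[str, dict] = {}
        self._last_refresh: Optional[float] = None

    def _refresh(self) -> None:
        jwks = self._fetch()
        self._keys = {k["kid"]: k for k in jwks.get("keys", []) if "kid" in k}
        self._last_refresh = self._clock()

    def get(self, kid: str) -> Optional[dict]:
        """Return the JWK for kid, refetching the key set if kid is unknown."""
        if kid in self._keys:
            return self._keys[kid]
        now = self._clock()
        # Refetch only if we never fetched or the last fetch is old enough.
        if self._last_refresh is None or now - self._last_refresh >= self._min_interval:
            self._refresh()
        return self._keys.get(kid)
```

If keys are published before they are used, as suggested above, the refetch path should be hit rarely; a `None` result means the kid is still unknown after a refresh (or a refresh was skipped by the rate limit), and the token should be rejected.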