pypi / warehouse

The Python Package Index
https://pypi.org
Apache License 2.0

Lifetime management for OIDC JWKS #10620

Closed woodruffw closed 2 years ago

woodruffw commented 2 years ago

The first step in verifying the OIDC JWTs is verifying their signatures, which means checking them against the OIDC provider's signing key set.

There are a few approaches we could take to acquiring those keys and keeping them up to date:

  1. Bake them into the Warehouse codebase. We could pull the current JWKS blob from the URI referenced in the provider's openid-configuration.

    • Pros: Simple.
    • Cons: More tedious to update in the event of key updates/rotations; requires a PR each time any provider does so. Gaps between a rotation and the corresponding update mean that Warehouse would probably reject authentic JWTs.
  2. Fetch the JWKS for each provider on Warehouse startup/initialization. Restarts always re-fetch the JWKS.

    • Pros: Simple.
    • Cons: Most of the same cons as (1).
  3. All of (2), but we also include a periodic job that checks each OIDC provider's JWKS on a schedule.

    • Pros: No rotation downtime.
    • Cons: Requires the most code.
woodruffw commented 2 years ago

(3) is probably the right approach. At minimum, we're going to need:

  1. A registry/configuration list of supported OIDC providers (just GitHub, for the time being), tied to their openid-configuration URLs
  2. A scheduled periodic task (similar to the ones already in Warehouse) for checking the openid-configuration + JWKS URL
  3. A verification service for incoming JWTs
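The registry + periodic fetch described above could be sketched roughly as follows. This is a minimal sketch, not Warehouse's actual implementation: the registry dict, function name, and injectable `opener` (so a scheduled task or a test can stub out real HTTP) are all assumptions.

```python
import json
from urllib.request import urlopen

# Hypothetical registry of supported OIDC providers (only GitHub for
# now), keyed to their openid-configuration discovery URLs.
OIDC_PROVIDERS = {
    "github": "https://token.actions.githubusercontent.com"
              "/.well-known/openid-configuration",
}


def fetch_jwks(discovery_url, opener=urlopen, timeout=5):
    """Follow a provider's openid-configuration to its JWKS.

    `opener` is injectable so a periodic task (or a test) can swap in
    a stub instead of making real HTTP requests.
    """
    with opener(discovery_url, timeout=timeout) as resp:
        config = json.load(resp)
    # The discovery document's jwks_uri points at the live key set.
    with opener(config["jwks_uri"], timeout=timeout) as resp:
        jwks = json.load(resp)
    # Index the key set by kid for quick lookup during verification.
    return {key["kid"]: key for key in jwks["keys"]}
```

A scheduled task would simply loop over `OIDC_PROVIDERS` and call `fetch_jwks` for each entry on its cadence.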
woodruffw commented 2 years ago

WIP branch is here: https://github.com/trailofbits/warehouse/tree/tob-jwk-management

Some outstanding design questions:

  1. How should we store "live" JWKs? They're JSON blobs and they're not sensitive and they rotate, so it doesn't make sense to cram them into the DB. Perhaps Redis?

  2. What should we do for our development environment? Does it make sense to have a DevelopmentJWKService that uses a local RSA keypair + JWK that we generate as part of make build, or should we just run the "full" JWKService?

woodruffw commented 2 years ago
  • How should we store "live" JWKs? They're JSON blobs and they're not sensitive and they rotate, so it doesn't make sense to cram them into the DB. Perhaps Redis?

I went ahead and did this on the branch -- we cache JWKs under the /warehouse/oidc/jwks/{provider} key for each provider.

You can smoke-test this with:

```shell
docker-compose run --rm web python -m warehouse oidc update-oidc-jwks
docker-compose run --rm web python -m warehouse oidc list-jwks --provider github
```
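Under that key scheme, the cache interaction could look like the following sketch (hypothetical helper names; assumes a redis-py-style client exposing `set`/`get`):

```python
import json


def cache_jwks(redis, provider, jwks):
    """Store a provider's JWKS under the per-provider key scheme."""
    # Key layout mirrors the /warehouse/oidc/jwks/{provider} scheme
    # described above; the JWKS itself is just a JSON blob.
    redis.set(f"/warehouse/oidc/jwks/{provider}", json.dumps(jwks))


def cached_jwks(redis, provider):
    """Return the cached JWKS for a provider, or None if absent."""
    raw = redis.get(f"/warehouse/oidc/jwks/{provider}")
    return json.loads(raw) if raw is not None else None
```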
di commented 2 years ago
  1. All of (2), but we also include a periodic job that checks each OIDC provider's JWKS on a schedule.

There's the potential for a brief window here where a new public key has been published to the JWKS but we haven't fetched it yet, and as a result signature verification will fail for legitimate signatures. Perhaps it would be better to fetch the JWKS whenever we attempt to verify a JWT with a kid (key ID) that we haven't encountered before?

This behavior works with cold starts (the first verification with no JWK in the cache causes us to reach out and fetch the entire JWKS), and it ensures we always have the latest keys before failing. To avoid these requests becoming abusive, we could repeat a request for a previously nonexistent kid only after some cooldown window has elapsed.

woodruffw commented 2 years ago

There's the potential for a brief window here where a new public key has been published to the JWKS but we haven't fetched it yet, and as a result signature verification will fail for legitimate signatures. Perhaps it would be better to fetch the JWKS whenever we attempt to verify a JWT with a kid (key ID) that we haven't encountered before?

That sounds good! And yeah, that provides nice self-starting behavior (much better than having to jumpstart things with a CLI command on each startup).

We'll want to make sure that we preserve provider-first identification with these changes -- a previously unknown kid shouldn't enable a cache poisoning attack where an attacker tricks us into trusting a different provider for a matching kid.

To avoid these requests becoming abusive, we could repeat a request for a previously nonexistent kid only after some cooldown window has elapsed.

Sounds good. In that case we'll probably want an additional self-expiring KV in Redis for the cooldown window. A cooldown of even just 60s is probably more than sufficient.
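Putting the two ideas together, refetch on an unknown kid, rate-limited by a cooldown, might look like this sketch. All names here are hypothetical, and a plain in-process dict stands in for the Redis cache and the self-expiring cooldown key:

```python
import time

COOLDOWN = 60  # seconds; per the discussion, 60s is likely sufficient


def key_for(kid, cache, fetch, now=time.monotonic):
    """Look up a kid, refetching the JWKS at most once per cooldown.

    `cache` is {"jwks": {kid: jwk}, "last_fetch": float}; `fetch()`
    returns a fresh {kid: jwk} mapping for the provider.
    """
    key = cache["jwks"].get(kid)
    if key is not None:
        return key
    # Unknown kid: refetch unless we already did so within the window,
    # which keeps a flood of bogus kids from hammering the provider.
    if now() - cache["last_fetch"] >= COOLDOWN:
        cache["jwks"] = fetch()
        cache["last_fetch"] = now()
        return cache["jwks"].get(kid)
    return None
```

Note that the cold-start case falls out for free: an empty cache with `last_fetch` in the distant past triggers a fetch on the first verification.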