ramosbugs / openidconnect-rs

OpenID Connect Library for Rust
MIT License
427 stars 103 forks source link

NoMatchingKeys when there is clearly a matching key #152

Closed timas130 closed 9 months ago

timas130 commented 9 months ago

The situation I have is extremely weird, and has been haunting me for a few months at this point.

I use openidconnect to authenticate my users through Google. When the server starts, CoreProviderMetadata::discover_async is called, it downloads everything needed, and then the CoreClient is created using that. It is then stored immutably.

In my application the ID token is checked on the backend. This is mainly for compatibility reasons. All goes well for a while, but after some time passes (around a week), many users start to receive errors when trying to log in with Google. Looking at the logs, the error is SignatureVerification(NoMatchingKey). After I restart the server, the error goes away for everyone who had it.

So, I used tcpdump to record the login requests before and after the restart. What I found is that they are absolutely, byte-for-byte, identical, including the ID token. I also decoded them and checked the kid with the ones found on Google's jwks_url, and they match.

Here's a sample token (I replaced the data with {}):

eyJhbGciOiJSUzI1NiIsImtpZCI6IjU1YzE4OGE4MzU0NmZjMTg4ZTUxNTc2YmE3MjgzNmUwNjAwZThiNzMiLCJ0eXAiOiJKV1QifQ.e30.lpZ_ppAHoR1NKQb7GzOKAf9sXy4h6LFi2HW4BBAvUBBkJCVyfAxQofekgClIwpdQTBrP650A-izz84KiSGtWTqFm-9jVNx7_gJH3mCpwKR4xWCdVNlcD72KM62bm2hSTznMO9-jo7xjhh9kLBrN0dbFln9yc4t4TH_7SujyBfcKR2cdcaxWsFG7kIilrv45QWcoPebkoteLJPWRDkthIw-NWJQxrYjPpfG8zj_F6wNXSS_WoB5hWwJFyXUQLPsiDe_H3tIMBbNuFc_5yFyRcFBX4WeLGerKshgGptOn8CJSp-OR1rK5y4MBzCUS3D7Ag6brUs5ifuRfGne28tlGSOQ

Why does this happen? Does the JWK set expire after some time? Please help.

ramosbugs commented 9 months ago

It sounds like Google is rotating their signing keys regularly, and this crate's Client never re-fetches the JWK set once instantiated. The intent when I designed this API was not to have Clients be long-lived, which the docs should definitely be clearer about.

There's a relevant discussion in #25. My recommendation is to instantiate a new Client for each request, but to use an HTTP client that implements caching (and respects Google's Cache-Control header) to avoid making HTTP requests for discovery each time. It looks like Google returns cache-control: public, max-age=3600 for its openid-configuration and a longer TTL for the JWK set, so this approach shouldn't add noticeable latency to most requests.

I see there's an http-cache-reqwest crate that might work for this purpose. I'm in the process of changing the HTTP request/response interface for the next major release of oauth2 (see ramosbugs/oauth2-rs#253 and ramosbugs/oauth2-rs#236), which should allow a pre-instantiated reqwest::Client to be used directly (including one with the http-cache-reqwest middleware, I expect). Once those changes are released, I'll update this crate to follow the same interface.

Until then, you should be able to pass your own custom HTTP client to request/request_async to use a cache-aware HTTP client.

timas130 commented 9 months ago

Thank you so much for the insight. That makes sense, and it really looks like Google creates a new key every week, and older keys expire in two weeks. Or at least that's the theory. I'll keep an eye on them.