jpadilla / pyjwt

JSON Web Token implementation in Python
https://pyjwt.readthedocs.io
MIT License
4.99k stars 676 forks source link

The JWKS endpoint did not return a JSON object when keys are found in the cache #914

Open jaferrando opened 10 months ago

jaferrando commented 10 months ago

When the jwks_client.get_jwk_set function finds the keys in the cache, it gets a PyJWKSet object, if not, it performs a fech and gets a JSON object. In both cases it checks if the data is a Dictionary, causing the check to fail when the data was retrieved from the cache.

https://github.com/jpadilla/pyjwt/blob/95638cf04f83b3e2c289b28810501e53195ff938/jwt/jwks_client.py#L68

The right code would return the object found in the cache.

auvipy commented 10 months ago

would you mind contribute the fix please?

jaferrando commented 10 months ago

I've further investigated why the current tests do not raise this error. Put the conclusions in the PR, but I'm adding them here for easier access to anyone falling into the issue.

I've investigated why it does not, and the root cause seems to be that in fetch_data, the cache is not fed with a PyJWKSet but with the dict directly read from the URL. As the cache is not the expected type but a dict, the get_jwk_set function obtains a dict from the cache, and the test passes.

In my case I was feeding the cache in the client directly from an externally stored JWKS, to avoid the fetch from the URL. To do so, I manually created the PyJWKSetCache object and fed it with a PyJWKSet. That's why in my case the cache contained the expected data type and my code failed with the error.

Heres's the result of my code inspection:

First time the get_jwk_set is called, it will get None from the cache and call fetch_data. This is what in my opinion does not honor the declared data types in the type hints:

    def fetch_data(self) -> Any:
        jwk_set: Any = None
        try:
            r = urllib.request.Request(url=self.uri, headers=self.headers)
            with urllib.request.urlopen(r, timeout=self.timeout) as response:
                jwk_set = json.load(response)
        except (URLError, TimeoutError) as e:
            raise PyJWKClientConnectionError(
                f'Fail to fetch data from the url, err: "{e}"'
            )
        else:
            return jwk_set
        finally:
            if self.jwk_set_cache is not None:
                self.jwk_set_cache.put(jwk_set) --> This puts a JSON Dict in jwk_set

Then the JWKSetCache put method that should receive a PyJWKSet as per the type hint will receive a dictionary:

class JWKSetCache:
    def __init__(self, lifespan: int) -> None:
        self.jwk_set_with_timestamp: Optional[PyJWTSetWithTimestamp] = None
        self.lifespan = lifespan

    def put(self, jwk_set: PyJWKSet) -> None:  --> BUT this expects not a Dict but a PyJWKSet (not a class of Dict)
        if jwk_set is not None:
            self.jwk_set_with_timestamp = PyJWTSetWithTimestamp(jwk_set) --> The Dict goes in constructing the PyJWTSetWithTimestamp
        else:
            # clear cache
            self.jwk_set_with_timestamp = None

And that dictionary will find its way through to the final object stored in the cache:

class PyJWTSetWithTimestamp:
    def __init__(self, jwk_set: PyJWKSet):
        self.jwk_set = jwk_set                 ---> The Dict ends as the value in the cached element
        self.timestamp = time.monotonic()

    def get_jwk_set(self) -> PyJWKSet:
        return self.jwk_set          --> This will NOT return a PyJWKSet but a Dict

    def get_timestamp(self) -> float:
        return self.timestamp

Then, the JWKSetCache class will blindly return the dict instead of the PyJWKSet.

class JWKSetCache:
    def __init__(self, lifespan: int) -> None:
        self.jwk_set_with_timestamp: Optional[PyJWTSetWithTimestamp] = None
        self.lifespan = lifespan

    def put(self, jwk_set: PyJWKSet) -> None:
        if jwk_set is not None:
            self.jwk_set_with_timestamp = PyJWTSetWithTimestamp(jwk_set)
        else:
            # clear cache
            self.jwk_set_with_timestamp = None

    def get(self) -> Optional[PyJWKSet]:
        if self.jwk_set_with_timestamp is None or self.is_expired():
            return None

        return self.jwk_set_with_timestamp.get_jwk_set() ---> The cache so, returns a Dict not a PyJWKSet

And finally the get_jwk_set always gets a dict wether it takes the data from the URL or from the cache

    def get_jwk_set(self, refresh: bool = False) -> PyJWKSet:
        data = None
        if self.jwk_set_cache is not None and not refresh:
            data = self.jwk_set_cache.get()  ---> So here we are getting a Dict, despite this function code is wrong, it works

        if data is None:
            data = self.fetch_data()

        if not isinstance(data, dict):
            raise PyJWKClientError("The JWKS endpoint did not return a JSON object")

        return PyJWKSet.from_dict(data)

Any code expecting to receive the PyJWKSet type from the cache, or any code which as in my case feeds the cache in other way than through a fetch_data call will fail.

The code in the PR fixed the problem declared in this issue without impacting any other current use of the module, as both the received and returned types of the function are preserved and no other parts of the module are changed. However, it does no fix use cases that use fetch_data and access the cache directly instead of through the fixed get_jwk_set.

jaferrando commented 10 months ago

Additionally to the tests passing because of the casual coherence between fetch_data and get_jwk_set in putting a dictionary in the cache, I've found because of how test https://github.com/jpadilla/pyjwt/blob/95638cf04f83b3e2c289b28810501e53195ff938/tests/test_jwks_client.py#L222 is coded and the use of mock, the second call to get_signing_key never really executes the method, but directly returns the same PyJWT object. The assertion being that the mocked urlopen method is not called succeeds but without really having run the code a second time.

This is most probably happening, though I did not test them all, with all the other test functions that involve two calls to the same method of the same object with the same parameter values.

Adding a new test like this, produces the expected result of reproducing this issue:

RESPONSE_DATA_WITH_MATCHING_KID = {
    "keys": [
        {
            "alg": "RS256",
            "kty": "RSA",
            "use": "sig",
            "n": "0wtlJRY9-ru61LmOgieeI7_rD1oIna9QpBMAOWw8wTuoIhFQFwcIi7MFB7IEfelCPj08vkfLsuFtR8cG07EE4uvJ78bAqRjMsCvprWp4e2p7hqPnWcpRpDEyHjzirEJle1LPpjLLVaSWgkbrVaOD0lkWkP1T1TkrOset_Obh8BwtO-Ww-UfrEwxTyz1646AGkbT2nL8PX0trXrmira8GnrCkFUgTUS61GoTdb9bCJ19PLX9Gnxw7J0BtR0GubopXq8KlI0ThVql6ZtVGN2dvmrCPAVAZleM5TVB61m0VSXvGWaF6_GeOhbFoyWcyUmFvzWhBm8Q38vWgsSI7oHTkEw",
            "e": "AQAB",
            "kid": "NEE1QURBOTM4MzI5RkFDNTYxOTU1MDg2ODgwQ0UzMTk1QjYyRkRFQw",
            "x5t": "NEE1QURBOTM4MzI5RkFDNTYxOTU1MDg2ODgwQ0UzMTk1QjYyRkRFQw",
            "x5c": [
                "MIIDBzCCAe+gAwIBAgIJNtD9Ozi6j2jJMA0GCSqGSIb3DQEBCwUAMCExHzAdBgNVBAMTFmRldi04N2V2eDlydS5hdXRoMC5jb20wHhcNMTkwNjIwMTU0NDU4WhcNMzMwMjI2MTU0NDU4WjAhMR8wHQYDVQQDExZkZXYtODdldng5cnUuYXV0aDAuY29tMIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEA0wtlJRY9+ru61LmOgieeI7/rD1oIna9QpBMAOWw8wTuoIhFQFwcIi7MFB7IEfelCPj08vkfLsuFtR8cG07EE4uvJ78bAqRjMsCvprWp4e2p7hqPnWcpRpDEyHjzirEJle1LPpjLLVaSWgkbrVaOD0lkWkP1T1TkrOset/Obh8BwtO+Ww+UfrEwxTyz1646AGkbT2nL8PX0trXrmira8GnrCkFUgTUS61GoTdb9bCJ19PLX9Gnxw7J0BtR0GubopXq8KlI0ThVql6ZtVGN2dvmrCPAVAZleM5TVB61m0VSXvGWaF6/GeOhbFoyWcyUmFvzWhBm8Q38vWgsSI7oHTkEwIDAQABo0IwQDAPBgNVHRMBAf8EBTADAQH/MB0GA1UdDgQWBBQlGXpmYaXFB7Q3eG69Uhjd4cFp/jAOBgNVHQ8BAf8EBAMCAoQwDQYJKoZIhvcNAQELBQADggEBAIzQOF/h4T5WWAdjhcIwdNS7hS2Deq+UxxkRv+uavj6O9mHLuRG1q5onvSFShjECXaYT6OGibn7Ufw/JSm3+86ZouMYjBEqGh4OvWRkwARy1YTWUVDGpT2HAwtIq3lfYvhe8P4VfZByp1N4lfn6X2NcJflG+Q+mfXNmRFyyft3Oq51PCZyyAkU7bTun9FmMOyBtmJvQjZ8RXgBLvu9nUcZB8yTVoeUEg4cLczQlli/OkiFXhWgrhVr8uF0/9klslMFXtm78iYSgR8/oC+k1pSNd1+ESSt7n6+JiAQ2Co+ZNKta7LTDGAjGjNDymyoCrZpeuYQwwnHYEHu/0khjAxhXo="
            ],
        },
        {
            "alg": "RS256",
            "kty": "RSA",
            "use": "sig",
            "n": "0wtlJRY9-ru61LmOgieeI7_rD1oIna9QpBMAOWw8wTuoIhFQFwcIi7MFB7IEfelCPj08vkfLsuFtR8cG07EE4uvJ78bAqRjMsCvprWp4e2p7hqPnWcpRpDEyHjzirEJle1LPpjLLVaSWgkbrVaOD0lkWkP1T1TkrOset_Obh8BwtO-Ww-UfrEwxTyz1646AGkbT2nL8PX0trXrmira8GnrCkFUgTUS61GoTdb9bCJ19PLX9Gnxw7J0BtR0GubopXq8KlI0ThVql6ZtVGN2dvmrCPAVAZleM5TVB61m0VSXvGWaF6_GeOhbFoyWcyUmFvzWhBm8Q38vWgsSI7oHTkEw",
            "e": "AQAB",
            "kid": "NEE1QURBOTM4MzI5RkFDNTYxOTU1MDg2ODgwQ0UzMTk1QjYyRkRFQwKKK",
            "x5t": "NEE1QURBOTM4MzI5RkFDNTYxOTU1MDg2ODgwQ0UzMTk1QjYyRkRFQw",
            "x5c": [
                "MIIDBzCCAe+gAwIBAgIJNtD9Ozi6j2jJMA0GCSqGSIb3DQEBCwUAMCExHzAdBgNVBAMTFmRldi04N2V2eDlydS5hdXRoMC5jb20wHhcNMTkwNjIwMTU0NDU4WhcNMzMwMjI2MTU0NDU4WjAhMR8wHQYDVQQDExZkZXYtODdldng5cnUuYXV0aDAuY29tMIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEA0wtlJRY9+ru61LmOgieeI7/rD1oIna9QpBMAOWw8wTuoIhFQFwcIi7MFB7IEfelCPj08vkfLsuFtR8cG07EE4uvJ78bAqRjMsCvprWp4e2p7hqPnWcpRpDEyHjzirEJle1LPpjLLVaSWgkbrVaOD0lkWkP1T1TkrOset/Obh8BwtO+Ww+UfrEwxTyz1646AGkbT2nL8PX0trXrmira8GnrCkFUgTUS61GoTdb9bCJ19PLX9Gnxw7J0BtR0GubopXq8KlI0ThVql6ZtVGN2dvmrCPAVAZleM5TVB61m0VSXvGWaF6/GeOhbFoyWcyUmFvzWhBm8Q38vWgsSI7oHTkEwIDAQABo0IwQDAPBgNVHRMBAf8EBTADAQH/MB0GA1UdDgQWBBQlGXpmYaXFB7Q3eG69Uhjd4cFp/jAOBgNVHQ8BAf8EBAMCAoQwDQYJKoZIhvcNAQELBQADggEBAIzQOF/h4T5WWAdjhcIwdNS7hS2Deq+UxxkRv+uavj6O9mHLuRG1q5onvSFShjECXaYT6OGibn7Ufw/JSm3+86ZouMYjBEqGh4OvWRkwARy1YTWUVDGpT2HAwtIq3lfYvhe8P4VfZByp1N4lfn6X2NcJflG+Q+mfXNmRFyyft3Oq51PCZyyAkU7bTun9FmMOyBtmJvQjZ8RXgBLvu9nUcZB8yTVoeUEg4cLczQlli/OkiFXhWgrhVr8uF0/9klslMFXtm78iYSgR8/oC+k1pSNd1+ESSt7n6+JiAQ2Co+ZNKta7LTDGAjGjNDymyoCrZpeuYQwwnHYEHu/0khjAxhXo="
            ],
        }
    ]
}
[...]
    def test_get_signing_key_caches_result(self):
        url = "https://dev-87evx9ru.auth0.com/.well-known/jwks.json"
        kid = "NEE1QURBOTM4MzI5RkFDNTYxOTU1MDg2ODgwQ0UzMTk1QjYyRkRFQw"

        jwks_client = PyJWKClient(url, cache_keys=True)
        with mocked_success_response(RESPONSE_DATA_WITH_MATCHING_KID) as first_call:
            mresult = jwks_client.get_signing_key(kid)
        assert mresult.key_id == kid
        assert first_call.call_count == 1
        # mocked_response does not allow urllib.request.urlopen to be called twice
        # so a second mock is needed
        with mocked_success_response(RESPONSE_DATA_WITH_MATCHING_KID) as repeated_call:
            result2 = jwks_client.get_signing_key(kid+"KKK")
        assert result2.key_id == kid+"KKK"

        assert repeated_call.call_count == 0

This test fails with the error mentioned in this issue using the existing code, and works with the code in the PR. Will add this change to the PR.

github-actions[bot] commented 8 months ago

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 7 days

github-actions[bot] commented 6 months ago

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 7 days