urbanplatform / django-keycloak-auth

Middleware to allow authorization using Keycloak and Django for django-rest-framework (DRF). This package should only be used in projects starting from scratch, since it overrides the users' management.
MIT License
32 stars 14 forks source link

v2: Complete app refactor #39

Closed diogosilva30 closed 2 years ago

diogosilva30 commented 2 years ago

Hi. I was exploring the repository, and I very much agree with @moritz89 regarding #12 . If there is a library python-keycloak that is better maintained we should delegate the keycloak API interactions to it. This move heavily simplifies our code, leads to fewer possible debugs such as #38 ( which I believe it has been fixed in this version), and mainly allows us to focus our short time in integrating the package with django. From what I have seen, python-keycloak seems far more stable in regards to API interaction. Our team was facing a jwks error, this should be fixed aswell.

I have divided the code into 2 core and separate concepts:

Token --> this class is the core object of this app. If serves as a container for access and refresh tokens. Additionally, it contains several properties to retrieve information about a specific token, as well as validation mechanisms. This class is what should be used to transfer data between the different parts of the app.

KeycloakAdminConnector --> this object replaces the old Connector, and is used to interface with Keycloak in ADMIN operations: "list users", "get user by id", etc...

I also found a lot of code that was not being used/called anywhere in the codebase, so I deleted it. A new config.py module with a settings attribute was also created to provide better parsing and single-source-of-truth settings.

Looking forward for a detailed review and opinions.

moritz89 commented 2 years ago

The general direction looks good. I have a few points, but they should all be addressable. Good points:

Some behaviors that I would change or at least make configurable:

My main concern is the behavior of the with_active_token_property. It checks if the token is active, even though the default access token lifespan is only 30 minutes, which makes it largely superfluous. I'd suggest either making the check optional or using the revoked token callback functionality.

As a reference I've added the JWTs of an admin and normal user:

Admin token ```json { "exp": 1666537275, "iat": 1666536975, "auth_time": 1666536314, "jti": "60c114dd-f0da-4639-b345-ac17f3adac1d", "iss": "http://localhost:8080/auth/realms/my-realm", "aud": [ "core-service", "account" ], "sub": "3a768e53-9a74-4504-b6e8-1570809f60b9", "typ": "Bearer", "azp": "web-app", "nonce": "58e0218d-75c0-43d2-804a-6340a3aa7227", "session_state": "19cd8ecd-f183-4738-861e-d559665f604c", "acr": "0", "allowed-origins": [ "http://localhost:4200" ], "realm_access": { "roles": [ "offline_access", "admin", "uma_authorization", "default-roles-my-realm" ] }, "resource_access": { "core-service": { "roles": [ "admin" ] }, "account": { "roles": [ "manage-account", "manage-account-links", "view-profile" ] } }, "scope": "openid profile email", "sid": "19cd8ecd-f183-4738-861e-d559665f604c", "email_verified": true, "name": "My Realm Admin", "preferred_username": "my-realm-admin", "given_name": "My Realm", "family_name": "Admin", "email": "admin@example.com" } ```
Normal user token ```json { "exp": 1666537366, "iat": 1666537066, "auth_time": 1666537065, "jti": "bb3864a5-9985-456b-9709-12a6c6faf6e8", "iss": "http://localhost:8080/auth/realms/my-realm", "aud": "account", "sub": "be148f39-ae1b-4f80-8ee1-5d6a9fdc749c", "typ": "Bearer", "azp": "web-app", "nonce": "3b4f1bc8-cbc1-4293-8149-86873b5c58a5", "session_state": "7a1efb7d-849f-4351-be22-2038050dad76", "acr": "1", "allowed-origins": [ "http://localhost:4200" ], "realm_access": { "roles": [ "offline_access", "uma_authorization", "default-roles-my-realm" ] }, "resource_access": { "account": { "roles": [ "manage-account", "manage-account-links", "view-profile" ] } }, "scope": "openid profile email", "sid": "7a1efb7d-849f-4351-be22-2038050dad76", "email_verified": true, "name": "Normal User", "preferred_username": "my-realm-user", "given_name": "Normal", "family_name": "User", "email": "user@example.com" } ```
diogosilva30 commented 2 years ago

Thanks for feedback @moritz89 .

Deprecated code

Why this block is no longer used? Is it safe to delete it?

def pass_auth(self, request):
    """
    Check if the current URI path needs to skip authorization
    """
    logging.warning("Not used by middleware", DeprecationWarning, 2)
    path = request.path_info.lstrip("/")
    return any(re.match(m, path) for m in self.keycloak.exempt_uris)

If we push this as a major new release "2.0", what do you think of removing deprecated stuff, such as the above code and graphene middleware?

Async middleware

Check commit b56ee03 and f806de7 regarding the async middleware.

Role/scope checks only using introspection

I didn't notice the decode token method, since it was not documented. Please check the commit 00946e3. I believe active property should continue using the API to check if the token is valid, right?

Authentication without having to call Keycloak server (introspection depending on whether data is available in token)

Not sure how to implement this. Are you referring to from_credentials class method?

with_active_token_property concern

Please provide more information

moritz89 commented 2 years ago

@diogosilva30 Regarding the active token check. I'd recap the OAuth flow to make sure we're on the same page.

image

The refresh token has a long lifetime (several weeks) while the access token only has a short lifetime (~3 to 120 minutes). Therefore, as the expiration time is part of the token, it is signed and short lived, it should not be necessary to check with Keycloak whether the access token is still active. By decoding the token (cryptographically signed) and checking the system time, it can be checked if the access token is valid.

The only use cases for querying Keycloak whether the access token is active is if it has been configured with a long lifetime (not recommended) or if a short (< 30 min) time to block access is required. In the second case, setting the access token life time to a few minutes is the easiest option. Otherwise it would require handling backchannel logout callbacks from Keycloak (Backchannel Logout URL option) to populate a local dict to invalidate sessions which is checked when decoding JWTs.

diogosilva30 commented 2 years ago

@moritz89 I have implemented your suggestions. Please see 2bef36d. I was not aware of exp and iat properties. Token active property now uses these to validate instead of introspection, as in the current codebase. I've added the TOKEN_TIMEOUT_FACTOR to settings aswell.

I have made a few additional changes that I would like to get feedback on (@moritz89 @uw-rvitorino ).

Basic auth support

In the KeycloakMiddleware I added support for Basic (username+password) authorization in header. If the user passes keycloak username + password, these credentials are used to issue an access_token, and the HTTP_AUTHORIZATION is changed to Bearer <access_token>.

Why this change?

Use case: When working with drf-spetacular a custom auth schema has to be created for Keycloak. With this change a user can define the Keycloak credentials once in the swagger view as bellow:

Screen Shot 2022-10-24 at 15 11 34 PM

These credentials are then passed in each request, and our middleware takes care of converting from "Basic auth" to "Bearer token".

If you find that this behaviour should be optional, we could create a setting to enable/disable it.

moritz89 commented 2 years ago

Just a note regarding the token timeout. In the previous implementation I only used a token timeout check to verify if the Django client's access token should be renewed to avoid access errors when performing admin calls to Keycloak. This was more of a hack as the proper method would have been to check the error codes, as done by python-keycloak with auto_refresh_token in their raw_get/raw_post mathods.

Therefore, in this aspect the token timeout functionality is not needed any more.

In your MR you've added it for a different purpose, to check if the access token has expired. To check if a user's access token is valid (not expired, not tampered with) calling KEYCLOAK.decode_token() should be enough. If it has expired, it'll raise InvalidTokenError from jwt.exceptions. You can easily test this with short (e.g., 1 min lifetime access tokens).

Therefore, we can remove the explicit has expired and is active check from the MR since it is handled in decode_token().

moritz89 commented 2 years ago

In a different point, the audience check, this should not be disabled by default in decode_token(). Given two clients (web-app/public access type and core-service/confidential access type), to properly handle this case:

  1. Go to clients --> web-app --> Mappers tab --> Create
  2. Name --> Core service audience for web-app access tokens
  3. Mapper type --> Audience (not audience resolve)
  4. Included client audience --> core-service
  5. Add to ID token --> Off
  6. Add to access token --> On

For advanced use cases (using roles and client scopes), the audience resolve mapper type can be used to add the audience to the token.

An idea would be to add it as an option in settings if this should be verified.

moritz89 commented 2 years ago

Regarding basic auth support. I have nothing against it. We're all adults here and if someone were to use it in prod (not recommended) that is their own prerogative :smile:

It should also not interfere with the session / default auth middleware as it uses the cookie header.

diogosilva30 commented 2 years ago

@moritz89 Can you work on the mentioned changes? I'm kinda busy.

moritz89 commented 2 years ago

@diogosilva30 Sorry, I had a larger commit that collided with your settings changes. I placed your 3 commits on a separate v2-complete-redo-settings-update branch. I only had a quick look, but you seemed to have added errors for importing settings?

moritz89 commented 2 years ago

Also, I would not merge the changes until we have a CI with some test coverage in the changed portions as it is a significant change.

As the functionality should generally stay the same, I would propose merging #33 and then implementing some basic tests. This would help us find behaviors and APIs that changed.

@uw-rvitorino This whole v2 is a big change and I wanted to ask if the changes are generally going into a direction that also aligns with your needs?

diogosilva30 commented 2 years ago

@moritz89 I was having circular imports error between config.py and errors.py. I believe those are fixed now. I looked at your changes. They seemed fine, only the type annotation of Token constructor were misplaced. Additionally, I made some improvements on settings parsing inside config.py.

diogosilva30 commented 2 years ago

The main aspect that is open is exception handling and getting input from the UrbanPlatform maintainers

I will reach out to @uw-rvitorino. He's rather busy tho

moritz89 commented 2 years ago

@diogosilva30 Do you have any changes in the pipeline? I'd push the rebased add CI tests branch once you've pushed any local changes

diogosilva30 commented 2 years ago

@diogosilva30 Do you have any changes in the pipeline? I'd push the rebased add CI tests branch once you've pushed any local changes

Tests are failing because create_keycloak_user. This method was removed from object manager since it was not used in the code. The sync was 1 way not 2 ways. If we need this method for tests, maybe we can create it there?

moritz89 commented 2 years ago

@diogosilva30 Do you have any changes in the pipeline? I'd push the rebased add CI tests branch once you've pushed any local changes

Tests are failing because create_keycloak_user. This method was removed from object manager since it was not used in the code. The sync was 1 way not 2 ways. If we need this method for tests, maybe we can create it there?

Jupp, my app only uses it for tests. I guess placing it in the KeycloakTestMixin isn't a bad idea

diogosilva30 commented 2 years ago

@diogosilva30 Do you have any changes in the pipeline? I'd push the rebased add CI tests branch once you've pushed any local changes

Tests are failing because create_keycloak_user. This method was removed from object manager since it was not used in the code. The sync was 1 way not 2 ways. If we need this method for tests, maybe we can create it there?

Jupp, my app only uses it for tests. I guess placing it in the KeycloakTestMixin isn't a bad idea

After CI is green, are we ready for MR?

moritz89 commented 2 years ago

@diogosilva30 Do you have any changes in the pipeline? I'd push the rebased add CI tests branch once you've pushed any local changes

Tests are failing because create_keycloak_user. This method was removed from object manager since it was not used in the code. The sync was 1 way not 2 ways. If we need this method for tests, maybe we can create it there?

Jupp, my app only uses it for tests. I guess placing it in the KeycloakTestMixin isn't a bad idea

After CI is green, are we ready for MR?

I'll check what happens when a user tries to use an expired token. As long as that doesn't return a 500 error, from my side, it should be good enough for an initial 2.0 release. Since users should use version ranges in the requirements.txt/Pipfiles, this shouldn't break anyone's setup.

moritz89 commented 2 years ago

I tested v2 with my app and the admin site, GraphQL queries/mutations as well as GraphQL subscriptions work. In a few minutes I'll be able to test the behavior of expired tokens.

diogosilva30 commented 2 years ago

Remove GraphQL references

Done in e72f410

moritz89 commented 2 years ago

Tested a few things and looks good from my side. Server returns a nice 403 on expired tokens. If you haven't already, I can write the create_keycloak_user method tomorrow. After that, I have no other comments for the MR.

diogosilva30 commented 2 years ago

Tested a few things and looks good from my side. Server returns a nice 403 on expired tokens. If you haven't already, I can write the create_keycloak_user method tomorrow. After that, I have no other comments for the MR.

Shouldn't you get a 401? According to djangorest docs, if you specify authenticate_header it returns a 401, if this method returns None, we get a 403. We did not specify it directly, but we override Django Rest TokenAuthentication which has:

def authenticate_header(self, request):
        return self.keyword

and our keyword is Bearer. So we should get a 401 Unauthorized.

moritz89 commented 2 years ago

@simao-silva You set the max-parallel jobs to 2. Is there a reason for that? Also should we add Python 3.11?

moritz89 commented 2 years ago

Tested a few things and looks good from my side. Server returns a nice 403 on expired tokens. If you haven't already, I can write the create_keycloak_user method tomorrow. After that, I have no other comments for the MR.

Shouldn't you get a 401? According to djangorest docs, if you specify authenticate_header it returns a 401, if this method returns None, we get a 403. We did not specify it directly, but we override Django Rest TokenAuthentication which has:

def authenticate_header(self, request):
        return self.keyword

and our keyword is Bearer. So we should get a 401 Unauthorized.

The 403 comes from my permission checking. Probably sees that the anonymous user has no permissions for the GraphQL query. Anyhow, Django itself will never raise 401 errors, only 403s. Reason is this: https://code.djangoproject.com/ticket/4354

simao-silva commented 2 years ago

@simao-silva You set the max-parallel jobs to 2. Is there a reason for that? Also should we add Python 3.11?

The reason was to debug why old versions of Keycloak were failing when the realm configuration file was added. We can remove that now. We should definitely add Python 3.11. I will do that.

diogosilva30 commented 2 years ago

Tested a few things and looks good from my side. Server returns a nice 403 on expired tokens. If you haven't already, I can write the create_keycloak_user method tomorrow. After that, I have no other comments for the MR.

Shouldn't you get a 401? According to djangorest docs, if you specify authenticate_header it returns a 401, if this method returns None, we get a 403. We did not specify it directly, but we override Django Rest TokenAuthentication which has:

def authenticate_header(self, request):
        return self.keyword

and our keyword is Bearer. So we should get a 401 Unauthorized.

The 403 comes from my permission checking. Probably sees that the anonymous user has no permissions for the GraphQL query. Anyhow, Django itself will never raise 401 errors, only 403s. Reason is this: https://code.djangoproject.com/ticket/4354

Rest framework returns 401 if you set authenticate_header. In your case our authenticate header is Bearer, so by default we must be returning 401.

moritz89 commented 2 years ago

Tested a few things and looks good from my side. Server returns a nice 403 on expired tokens. If you haven't already, I can write the create_keycloak_user method tomorrow. After that, I have no other comments for the MR.

Shouldn't you get a 401? According to djangorest docs, if you specify authenticate_header it returns a 401, if this method returns None, we get a 403. We did not specify it directly, but we override Django Rest TokenAuthentication which has:

def authenticate_header(self, request):
        return self.keyword

and our keyword is Bearer. So we should get a 401 Unauthorized.

The 403 comes from my permission checking. Probably sees that the anonymous user has no permissions for the GraphQL query. Anyhow, Django itself will never raise 401 errors, only 403s. Reason is this: https://code.djangoproject.com/ticket/4354

Rest framework returns 401 if you set authenticate_header. In your case our authenticate header is Bearer, so by default we must be returning 401.

Just to make sure we're on the same page, if the middleware can't authenticate the user, it does not do anything to the request (only does a debug log on errors). The 4XX error is solely dependent on the Django view handling the request, so a 401 will be sent if you're using DRF and a 403 if you raise a PermissionError using Django's permission framework. So both responses are valid.

diogosilva30 commented 2 years ago

Tested a few things and looks good from my side. Server returns a nice 403 on expired tokens. If you haven't already, I can write the create_keycloak_user method tomorrow. After that, I have no other comments for the MR.

Shouldn't you get a 401? According to djangorest docs, if you specify authenticate_header it returns a 401, if this method returns None, we get a 403. We did not specify it directly, but we override Django Rest TokenAuthentication which has:

def authenticate_header(self, request):
        return self.keyword

and our keyword is Bearer. So we should get a 401 Unauthorized.

The 403 comes from my permission checking. Probably sees that the anonymous user has no permissions for the GraphQL query. Anyhow, Django itself will never raise 401 errors, only 403s. Reason is this: https://code.djangoproject.com/ticket/4354

Rest framework returns 401 if you set authenticate_header. In your case our authenticate header is Bearer, so by default we must be returning 401.

Just to make sure we're on the same page, if the middleware can't authenticate the user, it does not do anything to the request (only does a debug log on errors). The 4XX error is solely dependent on the Django view handling the request, so a 401 will be sent if you're using DRF and a 403 if you raise a PermissionError using Django's permission framework. So both responses are valid.

Wasn't implying your response was invalid. Sorry for misunderstand. I was just wondering if 401 was returned when using restframework, as their docs state. I have no problem with Django's 403.

EDIT: @moritz89 Have you looked at the changes in 54ee49c? I had to switch the way we initialise the admin connector. It can't be instantiated by default when import a script. When using a project that has the app installed, but it is not currently using it (no keycloak server running), this would make the project crash trying to connect. By switching to lazy instance, we only initialize it when really need it (As we do currently before v2). I didn't think about the ramifications before.

moritz89 commented 2 years ago

@diogosilva30 Jupp, lazy loading is possible, but one side-effect of the current implementation is that it'll create quite a bit of overhead since a new admin connector has to be created with each KeycloakUser object instantiation. AFAIK, the KeycloakAdmin setup is somewhat expensive with the get_token() setup.

Would an alternative be a lazy loaded global object? A bit like here: https://stackoverflow.com/a/52359211/6783666

diogosilva30 commented 2 years ago

@diogosilva30 Jupp, lazy loading is possible, but one side-effect of the current implementation is that it'll create quite a bit of overhead since a new admin connector has to be created with each KeycloakUser object instantiation. AFAIK, the KeycloakAdmin setup is somewhat expensive with the get_token() setup.

Would an alternative be a lazy loaded global object? A bit like here: https://stackoverflow.com/a/52359211/6783666

For now I would ship v2 as it is. We know it is working. But we should explore your option for optimization in further patches.

moritz89 commented 2 years ago

@diogosilva30 Jupp, lazy loading is possible, but one side-effect of the current implementation is that it'll create quite a bit of overhead since a new admin connector has to be created with each KeycloakUser object instantiation. AFAIK, the KeycloakAdmin setup is somewhat expensive with the get_token() setup. Would an alternative be a lazy loaded global object? A bit like here: https://stackoverflow.com/a/52359211/6783666

For now I would ship v2 as it is. We know it is working. But we should explore your option for optimization in further patches.

Wasn't straight forward, but I added an implementation with lazy loading. Test with poetry run test_site/manage.py runserver without a Keycloak server and it started without complaining. Tests also work with the Keycloak server running.