mitodl / ol-infrastructure

Infrastructure automation code for use by MIT Open Learning
BSD 3-Clause "New" or "Revised" License

Superset API token through Keycloak #2432

Closed: shaidar closed this issue 4 months ago

shaidar commented 5 months ago

Description/Context

We need to access the Superset API and thus require a way to generate a token with appropriate levels of permission.

Plan/Design

Given our Superset integration with Keycloak, we need to find a way to generate a token using the Keycloak endpoint that can be used to auth against the Superset API.

shaidar commented 5 months ago

I started out by using the existing integration to obtain a token. To do that, I had to enable service account roles on the superset client:

curl -X POST \
  'https://sso-ci.ol.mit.edu/realms/ol-data-platform/protocol/openid-connect/token' \
  -H 'Content-Type: application/x-www-form-urlencoded' \
  -d 'grant_type=client_credentials' \
  -d 'client_id=<client_id>' \
  -d 'client_secret=<client_secret>'
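
For scripting, the same client_credentials request can be made from Python. This is a minimal sketch using only the standard library and the form fields from the curl command above; `build_client_credentials_request` and `fetch_access_token` are hypothetical helper names, and nothing is sent over the network until `fetch_access_token` is called.

```python
import json
import urllib.parse
import urllib.request

# Token endpoint from the curl example (CI realm); adjust per environment.
TOKEN_URL = (
    "https://sso-ci.ol.mit.edu/realms/ol-data-platform"
    "/protocol/openid-connect/token"
)


def build_client_credentials_request(
    token_url: str, client_id: str, client_secret: str
) -> urllib.request.Request:
    """Build (but do not send) a POST for the OAuth2 client_credentials grant."""
    body = urllib.parse.urlencode(
        {
            "grant_type": "client_credentials",
            "client_id": client_id,
            "client_secret": client_secret,
        }
    ).encode("ascii")
    return urllib.request.Request(
        token_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def fetch_access_token(request: urllib.request.Request) -> str:
    """Send the request and return the access_token field of the JSON reply."""
    with urllib.request.urlopen(request) as response:  # network call
        return json.load(response)["access_token"]
```

Usage: `token = fetch_access_token(build_client_credentials_request(TOKEN_URL, "<client_id>", "<client_secret>"))`.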

Using the newly obtained token, I tried a GET request against the Superset API:

curl -X GET \
  'https://bi-ci.ol.mit.edu/api/v1/dashboard/' \
  -H 'Authorization: Bearer <token>'

This generated the following error: {"msg":"The specified alg value is not allowed"}
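
A matching Python sketch of the GET call, standard library only. The helper names are hypothetical. `call_superset` also decodes the body of HTTP error responses, so a failure like the one above comes back as a dict with a `msg` key instead of only a status code; this assumes the error body is JSON, as it was here.

```python
import json
import urllib.error
import urllib.request

SUPERSET_URL = "https://bi-ci.ol.mit.edu"


def build_superset_get(base_url: str, path: str, token: str) -> urllib.request.Request:
    """Build a GET request against the Superset REST API with a bearer token."""
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


def call_superset(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON body, including error bodies."""
    try:
        with urllib.request.urlopen(request) as response:
            return json.load(response)
    except urllib.error.HTTPError as err:
        # HTTPError is file-like; auth failures here returned {"msg": "..."}.
        return json.load(err)
```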

Decoding the token shows that Keycloak signs it with RS256 by default. Going through the Flask-AppBuilder docs (Superset is built on Flask-AppBuilder), it relies on the Python Authlib library to decode the token. I didn't come across anything suggesting that Authlib disallows RS256, and from a Python interpreter inside the Superset Docker container I was able to decode the Keycloak-generated token. Despite that, I tried switching the signing algorithm in the Keycloak realm settings to HS256 to test things out. This time the error was: {"msg":"Signature verification failed"}
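
Checking which algorithm a token was signed with only requires its header segment. A minimal sketch that decodes the header without verifying the signature, so it is for inspection only; the function names are hypothetical. As background rather than a confirmed diagnosis: flask-jwt-extended, which appears in the traceback further down, accepts only HS256 unless JWT_ALGORITHM or JWT_DECODE_ALGORITHMS is changed, which is consistent with the "alg value is not allowed" error.

```python
import base64
import json


def b64url_decode(segment: str) -> bytes:
    """Decode a base64url segment, restoring the '=' padding JWTs strip."""
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def jwt_header(token: str) -> dict:
    """Return the unverified JOSE header (e.g. alg, kid) of a compact JWT."""
    header_segment = token.split(".")[0]
    return json.loads(b64url_decode(header_segment))
```

Usage: `jwt_header(token)["alg"]` returns the signing algorithm named in the token, for example "RS256" for a default Keycloak token.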

After some additional digging, I came across a discussion suggesting that the following keys be added to superset_config.py:

With those values in place, I tried accessing the API again, and this time the error was:

ERROR:superset.views.base:invalid literal for int() with base 10: '81236d7a-c521-4119-87b5-14da680e3c9c'
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/flask/app.py", line 1823, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/local/lib/python3.10/site-packages/flask/app.py", line 1799, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
  File "/usr/local/lib/python3.10/site-packages/flask_appbuilder/security/decorators.py", line 100, in wraps
    verify_jwt_in_request()
  File "/usr/local/lib/python3.10/site-packages/flask_jwt_extended/view_decorators.py", line 83, in verify_jwt_in_request
    _request_ctx_stack.top.jwt_user = _load_user(jwt_header, jwt_data)
  File "/usr/local/lib/python3.10/site-packages/flask_jwt_extended/view_decorators.py", line 141, in _load_user
    user = user_lookup(jwt_header, jwt_data)
  File "/usr/local/lib/python3.10/site-packages/flask_jwt_extended/internal_utils.py", line 25, in user_lookup
    return jwt_manager._user_lookup_callback(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/flask_appbuilder/security/manager.py", line 2163, in load_user_jwt
    user = self.load_user(identity)
  File "/usr/local/lib/python3.10/site-packages/flask_appbuilder/security/manager.py", line 2157, in load_user
    user = self.get_user_by_id(int(pk))
ValueError: invalid literal for int() with base 10: '81236d7a-c521-4119-87b5-14da680e3c9c'

It appears that Superset passes the value of the token's sub claim to the get_user_by_id function, which expects an integer user ID, whereas Keycloak sets sub to a UUID. So far I have been unable to either turn that value into an integer or make Superset use a different claim by adding a Keycloak mapping. I tried the following:
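
The failure path in the traceback can be reduced to a few lines. This is a simplified, hypothetical stand-in for the call chain (an identity cast to int before the lookup), not Flask-AppBuilder's actual code; the in-memory user tables and both function names are made up for illustration.

```python
# Hypothetical in-memory user store standing in for Superset's user table.
USERS_BY_ID = {1: "api-user"}
USERNAMES = {"api-user"}


def load_user_by_pk(identity: str):
    """Mimics load_user(): the identity is cast to int before the lookup,
    so a Keycloak UUID in `sub` raises ValueError."""
    return USERS_BY_ID.get(int(identity))


def load_user_by_claim(jwt_data: dict, claim: str = "preferred_username"):
    """Alternative: resolve the user from a username-style claim instead."""
    username = jwt_data.get(claim)
    return username if username in USERNAMES else None
```

With `claims = {"sub": "81236d7a-c521-4119-87b5-14da680e3c9c", "preferred_username": "api-user"}`, `load_user_by_pk(claims["sub"])` raises the same ValueError as the traceback, while `load_user_by_claim(claims)` finds the user.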

Links

blarghmatey commented 4 months ago

The current failure mode should be addressed by https://github.com/mitodl/ol-infrastructure/commit/0857304a26ea370d542a20e79f8e0b96dcd5b873 once it gets deployed.

shaidar commented 4 months ago

Tested this again and ran into the same error as before. I did decode the JWT returned from Keycloak and verified that it does contain the preferred_username field.
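
Checking that a token carries a given claim works the same way as reading its header, but on the second (payload) segment. A minimal, non-verifying sketch; `jwt_claims` and `missing_claims` are hypothetical helper names.

```python
import base64
import json


def jwt_claims(token: str) -> dict:
    """Return the unverified payload (claims) of a compact JWT."""
    payload = token.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore stripped base64url padding
    return json.loads(base64.urlsafe_b64decode(payload))


def missing_claims(token: str, required=("preferred_username",)) -> list:
    """List which of the required claims are absent from the token."""
    claims = jwt_claims(token)
    return [name for name in required if name not in claims]
```

An empty list from `missing_claims(token)` means the claim is present, as it was in this test.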

shaidar commented 4 months ago

We were able to resolve this for now by creating a Keycloak user and assigning it to a Superset role with the required permissions. In addition, we had to add the two Superset config keys mentioned above. Ideally, we'd be able to use the logged-in user's session to obtain a Keycloak token; however, that API call requires the user's password, and I don't think that will work for Touchstone users. For now, we're going to settle on creating the Keycloak account and generating the token that way.
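
The thread does not say which grant was used with the dedicated Keycloak user. Assuming it is the OAuth2 password grant (which Keycloak only accepts when "Direct access grants" is enabled on the client), the request body would look like this sketch; the helper name is hypothetical and the request is only built, not sent.

```python
import urllib.parse
import urllib.request


def build_password_grant_request(
    token_url: str, client_id: str, client_secret: str, username: str, password: str
) -> urllib.request.Request:
    """Build (but do not send) a POST for the OAuth2 password grant."""
    body = urllib.parse.urlencode(
        {
            "grant_type": "password",
            "client_id": client_id,
            "client_secret": client_secret,
            "username": username,  # the dedicated Keycloak API user
            "password": password,
        }
    ).encode("utf-8")
    return urllib.request.Request(
        token_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
```

The access_token in the JSON reply is then sent as a bearer token, as in the curl GET example above.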

shaidar commented 4 months ago

Tested this on both QA and prod and it worked as expected. Closing issue.

xapepama commented 1 month ago

@shaidar, hello! Could you post examples of working requests? Thanks.

FGrobelny commented 3 weeks ago

@shaidar Thanks to your description, I was able to configure API access to my Superset instance with Keycloak :) Here is my setup in case anyone needs to do the same:

import json
import urllib.request

from jwt.algorithms import RSAAlgorithm

JWT_ALGORITHM = "RS256"
JWT_DECODE_ALGORITHMS = ["RS256"]

# URL to the JWKS endpoint
jwks_url = "<keycloak>/protocol/openid-connect/certs"


def fetch_keycloak_rs256_public_cert():
    # Download the realm's JWKS and convert its first key into a public key
    with urllib.request.urlopen(jwks_url) as response:  # noqa: S310
        jwks = json.load(response)

    public_key = RSAAlgorithm.from_jwk(json.dumps(jwks["keys"][0]))
    return public_key


JWT_PUBLIC_KEY = fetch_keycloak_rs256_public_cert()

Then, in CUSTOM_SECURITY_MANAGER, I override the Flask-AppBuilder method load_user_jwt to work around the sub problem by using the id claim, which in my case contains the username. I then look up the user with the get_user_by_username function.

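from flask import g
from superset.security import SupersetSecurityManager


class CustomSsoSecurityManager(SupersetSecurityManager):
    # <...> other overrides elided

    def load_user_jwt(self, _jwt_header, jwt_data):
        # Use the "id" claim (the username in this setup) instead of "sub",
        # which Keycloak fills with a UUID that Superset cannot cast to int
        name = jwt_data.get("id", None)
        user = self.get_user_by_username(name)
        g.user = user
        return user
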
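
In the JWKS snippet earlier in this comment, `jwks["keys"][0]` takes whichever key Keycloak lists first. A realm can publish more than one key (for example a separate encryption key), so relying on order is fragile. A minimal sketch of selecting the signing key instead; `select_signing_jwk` is a hypothetical helper, and the optional `kid` filter only helps if you know which key ID to expect.

```python
from typing import Optional


def select_signing_jwk(jwks: dict, alg: str = "RS256", kid: Optional[str] = None) -> dict:
    """Return the first JWKS entry usable for verifying `alg` signatures.

    Skips keys published for encryption (use == "enc"), keys for other
    algorithms and, if `kid` is given, keys with a different key ID.
    """
    for key in jwks.get("keys", []):
        if key.get("use", "sig") != "sig" or key.get("alg", alg) != alg:
            continue
        if kid is not None and key.get("kid") != kid:
            continue
        return key
    raise LookupError(f"no {alg} signing key found in JWKS")
```

In `fetch_keycloak_rs256_public_cert`, the conversion would then read `RSAAlgorithm.from_jwk(json.dumps(select_signing_jwk(jwks)))`.
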
shaidar commented 3 weeks ago

@FGrobelny Does your config work for accessing the Superset security endpoint? For example, if you create role definitions through the API (which goes through the security endpoint) using a token generated by Keycloak, do you get a 403 error or not?

FGrobelny commented 3 weeks ago

I needed this to obtain a CSRF token from /api/v1/security/csrf_token and to call /api/v1/chart/warm_up_cache, and it works. With this setup, your access to API endpoints is determined by the Superset roles of the user returned from load_user_jwt.
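
A minimal sketch of the first step, fetching the CSRF token with a Keycloak bearer token. The path is the one given in the comment above; the helper names are hypothetical, and reading the token from a "result" key is an assumption about the response shape. Subsequent write calls would typically send it back in an X-CSRFToken header (Flask-WTF's default header name).

```python
import json
import urllib.request

# Path as given in the comment above.
CSRF_PATH = "/api/v1/security/csrf_token"


def build_csrf_request(base_url: str, access_token: str, path: str = CSRF_PATH) -> urllib.request.Request:
    """Build a GET for Superset's CSRF token endpoint using a bearer token."""
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        headers={"Authorization": f"Bearer {access_token}"},
        method="GET",
    )


def parse_csrf_token(body: bytes) -> str:
    """Extract the token; assumes the JSON reply looks like {"result": "..."}."""
    return json.loads(body)["result"]
```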

So let's say you have a superset-admin role in Keycloak that maps to the Admin role in Superset through AUTH_ROLES_MAPPING. If you query the Superset API with an access token obtained from Keycloak's openid-connect/token endpoint, Superset will see that you have the Admin role and grant access accordingly.
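
A sketch of that mapping as a superset_config.py fragment. AUTH_ROLES_MAPPING and AUTH_ROLES_SYNC_AT_LOGIN are Flask-AppBuilder settings; the superset-admin name comes from the example above. This assumes your custom security manager's oauth_user_info returns the user's Keycloak roles under a role_keys entry, which is where Flask-AppBuilder looks for them. The mapping is applied when the user logs in through OAuth, so the Superset user that load_user_jwt later returns carries whatever roles were synced at that point.

```python
# superset_config.py fragment (sketch)

# Keycloak role name -> list of Superset role names
AUTH_ROLES_MAPPING = {
    "superset-admin": ["Admin"],
}

# Re-apply the mapping on every OAuth login so role changes in Keycloak
# propagate to the Superset user record.
AUTH_ROLES_SYNC_AT_LOGIN = True
```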

You can also enable the Keycloak service account and use it to perform admin actions. In that case, you need to map the service account to an account that already exists in Superset:

def load_user_jwt(self, _jwt_header, jwt_data):
    ...
    name = jwt_data.get("id", None)
    # Treat the Keycloak service account as the existing Superset "admin" user
    if name == "service-account-superset":
        name = "admin"
    ...
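
If more than one service account needs mapping, the special case can be generalized into a lookup table. A minimal sketch; SERVICE_ACCOUNT_USERS and resolve_superset_username are hypothetical names, and the "id" claim follows the setup above (adjust `claim` to whatever your Keycloak mappers emit).

```python
from typing import Mapping, Optional

# Hypothetical table: Keycloak service-account username -> existing Superset username
SERVICE_ACCOUNT_USERS = {"service-account-superset": "admin"}


def resolve_superset_username(
    jwt_data: Mapping, claim: str = "id", aliases: Mapping = SERVICE_ACCOUNT_USERS
) -> Optional[str]:
    """Return the Superset username for a token's claims, or None if absent.

    Service accounts are translated via `aliases`; other names pass through.
    """
    name = jwt_data.get(claim)
    if name is None:
        return None
    return aliases.get(name, name)
```

Inside load_user_jwt, the lookup would become `name = resolve_superset_username(jwt_data)` followed by the existing get_user_by_username call.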