lepture / authlib

The ultimate Python library in building OAuth, OpenID Connect clients and servers. JWS,JWE,JWK,JWA,JWT included.
https://authlib.org/
BSD 3-Clause "New" or "Revised" License

CSRF Warning! State not equal in request and response (Airflow + Keycloak) #441

Closed: kurian-dm closed this issue 1 year ago

kurian-dm commented 2 years ago

Describe the bug

The error occurs when using authlib to configure Keycloak login for Airflow. Everything works until Keycloak redirects back to Airflow.

Error Stacks

Something bad has happened.
Please consider letting us know by creating a [bug report using GitHub](https://github.com/apache/airflow/issues/new/choose).

Python version: 3.6.15
Airflow version: 2.1.4
Node: airflow-webserver-66fbff449c-wc8ht
-------------------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.6/site-packages/flask/app.py", line 2447, in wsgi_app
    response = self.full_dispatch_request()
  File "/home/airflow/.local/lib/python3.6/site-packages/flask/app.py", line 1952, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/home/airflow/.local/lib/python3.6/site-packages/flask/app.py", line 1821, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/home/airflow/.local/lib/python3.6/site-packages/flask/_compat.py", line 39, in reraise
    raise value
  File "/home/airflow/.local/lib/python3.6/site-packages/flask/app.py", line 1950, in full_dispatch_request
    rv = self.dispatch_request()
  File "/home/airflow/.local/lib/python3.6/site-packages/flask/app.py", line 1936, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/home/airflow/.local/lib/python3.6/site-packages/flask_appbuilder/security/views.py", line 655, in oauth_authorized
    resp = self.appbuilder.sm.oauth_remotes[provider].authorize_access_token()
  File "/home/airflow/.local/lib/python3.6/site-packages/authlib/integrations/flask_client/apps.py", line 102, in authorize_access_token
    params = self._format_state_params(state_data, params)
  File "/home/airflow/.local/lib/python3.6/site-packages/authlib/integrations/base_client/sync_app.py", line 234, in _format_state_params
    raise MismatchingStateError()
authlib.integrations.base_client.errors.MismatchingStateError: mismatching_state: CSRF Warning! State not equal in request and response.

To Reproduce

A minimal example to reproduce the behavior. This is my code:

  import os
  import json
  import logging

  from flask import session 
  from airflow.www.security import AirflowSecurityManager
  from flask_appbuilder.security.manager import AUTH_OAUTH
  from flask import get_flashed_messages, request, redirect, flash
  from flask_appbuilder import expose
  from flask_appbuilder._compat import as_unicode
  from flask_appbuilder.security.views import AuthView
  from flask_login import login_user, logout_user
  from airflow import configuration as conf

  # The SQLAlchemy connection string.
  SQLALCHEMY_DATABASE_URI = conf.get('core', 'SQL_ALCHEMY_CONN')

  #log = logging.getLogger(__name__)

  MY_PROVIDER = 'keycloak'
  CLIENT_ID = 'airflow'
  CLIENT_SECRET = 'LDKlkdqwkdowkdokwodok2'
  KEYCLOAK_BASE_URL = 'https://keyclk.xxx.io/auth/realms/Tata'
  KEYCLOAK_TOKEN_URL = 'https://keyclk.xxx.io/auth/realms/Tata/protocol/openid-connect/token'
  KEYCLOAK_AUTH_URL = 'https://keyclk.xxx.io/auth/realms/Tata/protocol/openid-connect/auth'
  # KEYCLOAK_API_URL = 'https://keyclk.xxx.io/auth/realms/Tata'
  KEYCLOAK_API_URL = 'https://keyclk.xxx.io/auth/realms/Tata/protocol/openid-connect/'

  AUTH_TYPE = AUTH_OAUTH
  AUTH_USER_REGISTRATION = True
  AUTH_USER_REGISTRATION_ROLE = "Public"
  AUTH_ROLES_SYNC_AT_LOGIN = True
  CSRF_ENABLED = True
  #PERMANENT_SESSION_LIFETIME = 1800

  # a mapping from the values of `userinfo["role_keys"]` to a list of FAB roles
  AUTH_ROLES_MAPPING = {
      "airflow_admin": ["Admin"],
      "airflow_op": ["Op"],
      "airflow_user": ["User"],
      "airflow_viewer": ["Viewer"],
      "airflow_public": ["Public"],
  }

  OAUTH_PROVIDERS = [
    {
     'name': 'keycloak',
     'icon': 'fa-circle-o',
     'token_key': 'access_token',
     'remote_app': {
       'client_id': CLIENT_ID,
       'client_secret': CLIENT_SECRET,          
       'api_base_url': KEYCLOAK_BASE_URL,
       'response_type': 'code',
       'grant_type': 'authorization_code',
       'client_kwargs': {
         'scope': 'email profile openid roles'
       },
       'request_token_url': None,
       'access_token_url': KEYCLOAK_TOKEN_URL,
       'authorize_url': KEYCLOAK_AUTH_URL,
       'userinfo_endpoint': 'https://keyclk.xxx.io/auth/realms/Tata/protocol/openid-connect/userinfo',
       'logout_redirect_url': 'https://keyclk.xxx.io/auth/realms/Tata/protocol/openid-connect/logout'
      }
    }  
  ]

Expected behavior

Airflow redirects the user to the Keycloak authentication page as expected. However, after authenticating and being redirected back to Airflow, the error "CSRF Warning! State not equal in request and response" occurs.

Environment:

Airflow runs on a Kubernetes cluster, and Keycloak runs in an ECS Fargate container within the same AWS VPC.

Additional context

I tried different browsers and incognito mode, but it still does not work.

kurian-dm commented 2 years ago

Hi,

Any updates on this issue?

Thanks

lepture commented 2 years ago

Can you create a runnable example? I'm not familiar with airflow.

kurian-dm commented 2 years ago

Hi,

You will need an AWS account. In it, create a VPC and an EKS cluster, and run Keycloak in an ECS container. Airflow has to be installed on Kubernetes using Helm charts.

If this is not possible for you, we could have a screen-sharing session, or I can share additional logs. I have enabled additional FAB logging for Airflow; it shows that authorization happens and then authlib detects a state mismatch.

Thanks

lepture commented 2 years ago

Hi, I've seen this issue before; it was caused by the session not being set properly. Can you check your secure-cookie-based session? Check whether the server can read the session value and whether the browser holds that session data.
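To make the session check concrete: Flask's default session is a cookie signed with the app's secret key, and a cookie whose signature does not verify is loaded as an empty session. The stdlib sketch below is a simplified model of that behavior, not Flask's real cookie format (Flask uses itsdangerous); `sign_session`, `load_session`, the state value and both keys are hypothetical. It shows how a callback handled by a process with a different secret key would find no `_state_keycloak_...` entry, which is exactly the situation that raises MismatchingStateError. In Airflow 2.x the Flask secret key is read from the `[webserver] secret_key` option, so one thing worth ruling out on Kubernetes is webserver replicas that do not share the same value (an assumption about this deployment, not something the logs above confirm).

```python
import base64
import hashlib
import hmac
import json


def _signature(payload: bytes, secret_key: bytes) -> bytes:
    # HMAC-SHA256 over the encoded payload, base64-encoded so it is cookie-safe.
    digest = hmac.new(secret_key, payload, hashlib.sha256).digest()
    return base64.urlsafe_b64encode(digest)


def sign_session(data: dict, secret_key: bytes) -> str:
    """Serialize a session dict into a signed 'payload.signature' string (simplified model)."""
    payload = base64.urlsafe_b64encode(json.dumps(data, sort_keys=True).encode())
    return (payload + b"." + _signature(payload, secret_key)).decode()


def load_session(cookie: str, secret_key: bytes) -> dict:
    """Return the session dict, or {} if the cookie is malformed or the signature is wrong."""
    try:
        payload, sig = cookie.encode().rsplit(b".", 1)
    except ValueError:
        return {}
    if not hmac.compare_digest(sig, _signature(payload, secret_key)):
        # Flask likewise falls back to an empty session on a bad signature.
        return {}
    return json.loads(base64.urlsafe_b64decode(payload).decode())


state = "abc123"  # hypothetical state value
cookie = sign_session({f"_state_keycloak_{state}": {"data": {}}}, b"key-on-pod-a")

# Callback served by the replica that issued the cookie: the entry is found.
print(f"_state_keycloak_{state}" in load_session(cookie, b"key-on-pod-a"))

# Callback served by a replica with a different key: the session looks empty.
print(load_session(cookie, b"key-on-pod-b"))
```

In a real deployment the equivalent check is whether the browser still sends the session cookie on the callback request and whether every webserver instance decodes it with the same key.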

bradbase commented 2 years ago

I am reasonably sure this is a bug.

I am using Django as a client against a server that's not Google, Twitter or Facebook, and I'm getting CSRF state mismatch errors when calling authorize_access_token().

In my case this appears to come from framework.get_state_data(), which looks for a key of the form f'_state_{self.name}_{state}', but request.session has no key of that form.

My guess is that when this library is used with one of a handful of known OAuth providers, the CSRF token key in request.session has the form f'_state_{self.name}_{state}', so things can work.

But in a Django context, when using Authlib as a client, there is a bug.

If the CSRF token is in my request.session at all, it's going to be keyed as 'state'.

We can see in the code snippets below...

authorize_access_token reads the 'state' value from the request into params, then calls get_state_data, passing request.session and that state value.

get_state_data tries to recover a session token value using a key in the form f'_state_{self.name}_{state}'. In my case it will always return None.

Then authorize_access_token calls _format_state_params(state_data, params) with state_data equal to None, and MismatchingStateError is raised.

apps.py

class DjangoOAuth2App(DjangoAppMixin, OAuth2Mixin, OpenIDMixin, BaseApp):
    client_cls = OAuth2Session

    def authorize_access_token(self, request, **kwargs):
        """Fetch access token in one step.

        :param request: HTTP request instance from Django view.
        :return: A token dict.
        """
        if request.method == 'GET':
            error = request.GET.get('error')
            if error:
                description = request.GET.get('error_description')
                raise OAuthError(error=error, description=description)
            params = {
                'code': request.GET.get('code'),
                'state': request.GET.get('state'),
            }
        else:
            params = {
                'code': request.POST.get('code'),
                'state': request.POST.get('state'),
            }

        state_data = self.framework.get_state_data(request.session, params.get('state'))
        self.framework.clear_state_data(request.session, params.get('state'))
        params = self._format_state_params(state_data, params)
        token = self.fetch_access_token(**params, **kwargs)

        if 'id_token' in token and 'nonce' in state_data:
            userinfo = self.parse_id_token(token, nonce=state_data['nonce'])
            token['userinfo'] = userinfo
        return token

framework_integration.py

    def get_state_data(self, session, state):
        key = f'_state_{self.name}_{state}'
        if self.cache:
            value = self._get_cache_data(key)
        else:
            value = session.get(key)
        if value:
            return value.get('data')
        return None

sync_app.py

    @staticmethod
    def _format_state_params(state_data, params):
        if state_data is None:
            raise MismatchingStateError()

        code_verifier = state_data.get('code_verifier')
        if code_verifier:
            params['code_verifier'] = code_verifier

        redirect_uri = state_data.get('redirect_uri')
        if redirect_uri:
            params['redirect_uri'] = redirect_uri
        return params
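The three steps described above can be checked with a small standalone simulation of the quoted functions. This is a simplified re-implementation written for illustration (the cache branch is dropped, and `handle_callback`, the provider name `xero` and the session contents are hypothetical), not authlib's own code. It assumes, per my reading of authlib's Flask and Django integrations, that the redirect step stored its data under the same `_state_{name}_{state}` key before sending the user to the provider. Under that assumption the provider only has to echo back a query parameter literally named `state`; a `None` from the lookup then points at the session entry being missing (or the echoed value differing), which matches lepture's session explanation.

```python
class MismatchingStateError(Exception):
    """Local stand-in for authlib's MismatchingStateError."""


def get_state_data(session: dict, name: str, state):
    # Mirrors the quoted get_state_data (cache branch omitted).
    value = session.get(f"_state_{name}_{state}")
    if value:
        return value.get("data")
    return None


def format_state_params(state_data, params: dict) -> dict:
    # Mirrors the quoted _format_state_params.
    if state_data is None:
        raise MismatchingStateError("mismatching_state")
    if state_data.get("code_verifier"):
        params["code_verifier"] = state_data["code_verifier"]
    if state_data.get("redirect_uri"):
        params["redirect_uri"] = state_data["redirect_uri"]
    return params


def handle_callback(session: dict, name: str, query: dict) -> dict:
    # The provider echoes back only the plain "state" query parameter;
    # the session key is built from that parameter's *value*.
    params = {"code": query.get("code"), "state": query.get("state")}
    state_data = get_state_data(session, name, params["state"])
    session.pop(f"_state_{name}_{params['state']}", None)  # like clear_state_data
    return format_state_params(state_data, params)


# Hypothetical session entry written earlier by the redirect step.
session = {"_state_xero_s1": {"data": {"redirect_uri": "https://app.example/callback"}}}
print(handle_callback(session, "xero", {"code": "c1", "state": "s1"}))

# Same callback, but the session cookie was lost: state_data is None.
try:
    handle_callback({}, "xero", {"code": "c1", "state": "s1"})
except MismatchingStateError as exc:
    print("raised:", exc)
```

The namespaced key lets one session hold several in-flight logins (different providers or browser tabs) without one overwriting another.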

I have not put much time into thinking how to fix this, but the OAuth2 client documentation I've been reading suggests that the CSRF token is called 'state' so I'm not entirely sure why there's a munged key in the mix here at all.

I might be able to supply Django code if you're interested in replicating the error. However, this bug has taken a significant amount of time to characterise and track down, so I am now in a time crunch to get things working.

bradbase commented 2 years ago

For later reference, I used the instructions here to trigger the above scenario:

https://docs.authlib.org/en/latest/client/django.html

lepture commented 2 years ago

@bradbase here is the demo for django: https://github.com/authlib/demo-oauth-client/tree/master/django-google-login

It works well

bradbase commented 2 years ago

@lepture Thank you.

Your example looks like it would work very well, but it's optimised for logging in against Google and I need to authenticate against Xero.

Xero has particular requirements for its headers and, as mentioned above, calls the CSRF token "state". I have not found a way to configure authlib finely enough to succeed.

Cheers

lepture commented 1 year ago

@bradbase state is added automatically. It is a part of the OAuth 2.0 logic.
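To illustrate that point: in the authorization-code flow (RFC 6749, section 4.1.1) the client generates `state` itself and sends it to the provider as an ordinary query parameter named `state`, and the provider must return the exact same value on the callback. The sketch below is a hypothetical stdlib version of that redirect step; the function name, provider name, URLs and client ID are placeholders, and it is not authlib's implementation. It shows that a provider such as Xero only ever sees the plain `state` parameter, while in this sketch the `_state_{name}_{state}` key lives only in the client's own session.

```python
import secrets
from urllib.parse import parse_qs, urlencode, urlparse


def build_authorize_redirect(session: dict, name: str, authorize_url: str,
                             client_id: str, redirect_uri: str, scope: str) -> str:
    """Generate a state, remember it in the session, and return the provider URL."""
    state = secrets.token_urlsafe(24)
    # Client-side bookkeeping only: this namespaced key never leaves the session.
    session[f"_state_{name}_{state}"] = {"data": {"redirect_uri": redirect_uri}}
    query = urlencode({
        "response_type": "code",
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": scope,
        "state": state,  # standard RFC 6749 parameter name sent to the provider
    })
    return f"{authorize_url}?{query}"


session = {}
url = build_authorize_redirect(
    session, "xero", "https://provider.example/authorize",
    "my-client-id", "https://app.example/callback", "openid profile",
)
state = parse_qs(urlparse(url).query)["state"][0]
print(url)
print(f"_state_xero_{state}" in session)
```

If the callback arrives with that `state` but the session dict is gone, the lookup on the callback side fails even though the provider behaved correctly.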

@kurian-dm please make sure your session works. Same as https://github.com/lepture/authlib/issues/518

aleksarias commented 11 months ago

(Quoting @bradbase's comment above about authenticating against Xero.)

Did you ever get this resolved?