mozilla / mozilla-django-oidc

A django OpenID Connect library
https://mozilla-django-oidc.readthedocs.io
Mozilla Public License 2.0
453 stars 169 forks source link

OIDC callback state not found in session `oidc_states`! #435

Open gh-ajenkins opened 3 years ago

gh-ajenkins commented 3 years ago

After logging into the mozilla/oidc-testprovider IDP, a SuspiciousOperation exception is thrown with message OIDC callback state not found in session `oidc_states`!. '

Python version: 3.8.10 Dependencies

asgiref==3.4.1
certifi==2021.5.30
cffi==1.14.6
charset-normalizer==2.0.4
cryptography==3.4.8
Django==3.2.6
idna==3.2
josepy==1.8.0
mozilla-django-oidc==2.0.0
psycopg2==2.9.1
pycparser==2.20
pyOpenSSL==20.0.1
pytz==2021.1
requests==2.26.0
six==1.16.0
sqlparse==0.4.1
urllib3==1.26.6

Reproducing the issue:

settings.py

"""
Django settings for project project.

Generated by 'django-admin startproject' using Django 3.2.6.

For more information on this file, see
https://docs.djangoproject.com/en/3.2/topics/settings/

For the full list of settings and their values, see
https://docs.djangoproject.com/en/3.2/ref/settings/
"""

import os
from pathlib import Path

# Build paths inside the project like this: BASE_DIR / 'subdir'.
BASE_DIR = Path(__file__).resolve().parent.parent

# Quick-start development settings - unsuitable for production
# See https://docs.djangoproject.com/en/3.2/howto/deployment/checklist/

# SECURITY WARNING: keep the secret key used in production secret!
SECRET_KEY = '<SECRET_KEY>'

# SECURITY WARNING: don't run with debug turned on in production!
DEBUG = True

ALLOWED_HOSTS = [
    'testrp'
]

# Application definition

INSTALLED_APPS = [
    'django.contrib.admin',
    'django.contrib.auth',
    'mozilla_django_oidc',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',
    'app',
]

MIDDLEWARE = [
    'django.middleware.security.SecurityMiddleware',
    'django.contrib.sessions.middleware.SessionMiddleware',
    'django.middleware.common.CommonMiddleware',
    'django.middleware.csrf.CsrfViewMiddleware',
    'django.contrib.auth.middleware.AuthenticationMiddleware',
    'django.contrib.messages.middleware.MessageMiddleware',
    'django.middleware.clickjacking.XFrameOptionsMiddleware',
]

ROOT_URLCONF = 'project.urls'

# Static files (CSS, JavaScript, Images)
# https://docs.djangoproject.com/en/3.2/howto/static-files/

STATIC_URL = '/static/'
STATICFILES_DIRS = [
    BASE_DIR / 'static',
    ]

TEMPLATES = [
    {
        'BACKEND': 'django.template.backends.django.DjangoTemplates',
        'DIRS': STATICFILES_DIRS,
        'APP_DIRS': True,
        'OPTIONS': {
            'context_processors': [
                'django.template.context_processors.debug',
                'django.template.context_processors.request',
                'django.contrib.auth.context_processors.auth',
                'django.contrib.messages.context_processors.messages',
            ],
        },
    },
]

WSGI_APPLICATION = 'project.wsgi.application'

# Database
# https://docs.djangoproject.com/en/3.2/ref/settings/#databases

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql',
        'NAME': os.environ['POSTGRES_DB'],
        'USER': os.environ['POSTGRES_USER'],
        'PASSWORD': os.environ['POSTGRES_PASSWORD'],
        'HOST': os.environ['POSTGRES_HOST'],
        'PORT': os.environ['POSTGRES_PORT'],
    }
}

AUTHENTICATION_BACKENDS = [
    'mozilla_django_oidc.auth.OIDCAuthenticationBackend',
]

OIDC_RP_CLIENT_ID = os.environ['OIDC_CLIENT_ID']
OIDC_RP_CLIENT_SECRET = os.environ['OIDC_CLIENT_SECRET']
OIDC_RP_SIGN_ALGO = os.environ['OIDC_SIGN_ALGO']
OIDC_OP_AUTHORIZATION_ENDPOINT = os.environ['OIDC_AUTH_ENDPOINT']
OIDC_OP_TOKEN_ENDPOINT = os.environ['OIDC_TOKEN_ENDPOINT']
OIDC_OP_USER_ENDPOINT = os.environ['OIDC_USER_ENDPOINT']
OIDC_RP_IDP_SIGN_KEY = os.environ['OIDC_IDP_SIGN_KEY']
OIDC_RP_SCOPES = 'openid email profile'
LOGIN_REDIRECT_URL = ""
LOGOUT_REDIRECT_URL = ""

# Password validation
# https://docs.djangoproject.com/en/3.2/ref/settings/#auth-password-validators

AUTH_PASSWORD_VALIDATORS = [
    {
        'NAME': 'django.contrib.auth.password_validation.UserAttributeSimilarityValidator',
    },
    {
        'NAME': 'django.contrib.auth.password_validation.MinimumLengthValidator',
    },
    {
        'NAME': 'django.contrib.auth.password_validation.CommonPasswordValidator',
    },
    {
        'NAME': 'django.contrib.auth.password_validation.NumericPasswordValidator',
    },
]

# Internationalization
# https://docs.djangoproject.com/en/3.2/topics/i18n/

LANGUAGE_CODE = 'en-us'

TIME_ZONE = 'UTC'

USE_I18N = True

USE_L10N = True

USE_TZ = True

# Default primary key field type
# https://docs.djangoproject.com/en/3.2/ref/settings/#default-auto-field

DEFAULT_AUTO_FIELD = 'django.db.models.BigAutoField'

project urls.py

from django.contrib import admin
from django.urls import path, include

urlpatterns = [
    path('admin/', admin.site.urls),
    path('oidc/', include('mozilla_django_oidc.urls')),
    path('', include('app.urls')),
]

app urls.py

from django.urls import path
from django.contrib.auth import views as auth_views

from . import views

urlpatterns = [
    path('accounts/login/', auth_views.LoginView.as_view()),
    path('', views.index),
]

login.html template

<div>
{% if user.is_authenticated %}
 <p>Current user: ❴❴ user.email ❵❵</p>
 <form action="{% url 'oidc_logout' %}" method="post">
     {% csrf_token %}
     <input type="submit" value="Logout">
 </form>
 {% else %}
 <a href="{% url 'oidc_authentication_init' %}">Login</a>
 {% endif %}
</div>
naegelyd-insitro commented 3 years ago

I have recently seen this same issue. @gh-ajenkins Please let me know if you find any more details about the cause or if a fix is found.

AlbericTrancart commented 3 years ago

I am experiencing the same issue too. Seems to happen quite rarely (2 occurrences last month in our Sentry logs for ~20 users/day) Will report here if I have more details

akatsoulas commented 3 years ago

I am seeing this issue from time to time and I am trying to figure out under which conditions this is triggered. Please let me know if you have any more details about it

cfra commented 3 years ago

I believe to have found one way this can happen:

When a user sends two requests requiring authentication roughly at the same time, e.g. because they reopened their browser with old tabs, and the session refresh middleware is being used, both requests try to modify the session to add their state, however only as there is no locking, the changes of one of the two request are overwritten by the other request.

We see this issue on our site and added some more logging to track it down, and we see patterns like this from clients:

  1. GET /object/foo Adds state state_a and returns 302 redirecting to identity provider because oidc login is too old
  2. GET /object/bar Adds state state_b and returns 302 redirecting to identity provider because oidc login is too old
  3. GET /oidc/callback/?code=bla&state=state_a triggers "OIDC callback state not found"

According to our debug log, request 1 and request 2 each add their state to the session.

However, in the log output for request 3, we see that state_b is present in oidc_states, but state_a is missing.

Looking closer at request 2, we can see that state_a is missing in its oidc_states.

So I would assume that something like the following is happening:

  1. request 1 loads the session with oidc_states = {}
  2. request 2 loads the session with oidc_states = {}
  3. request 1 modifies and saves its local copy of the session, so oidc_states = { 'state_a': {...} } is in the backend
  4. request 2 modifies and saves its local copy of the session, so oidc_states = { 'state_b': {...} } is in the backend

While actually oidc_states = { 'state_a': {...}, 'state_b': {...} } should be in the backend.

I am not sure how this should best be solved, and I believe that the specifics might also depend on the Session Backend that is being used.

nikita-sunwind commented 3 years ago

@cfra I guess you are right. I've been investigating the same issue during some time. In our case we face issues when end users use Chrome browser on any operating system (Windows, MacOS and Linux). I guess, this may be connected to some Chrome performance optimization feature, because this problem is reproduced even without multiple or pinned browser tabs.

In all cases we see two concurrent requests coming from the same IP address and user agent in less than 100 ms. OIDC states in URL query parameters are different and IdP processes both requests correctly.

I believe that there is some kind of race condition when the library saves OIDC state into a Django session here: https://github.com/mozilla/mozilla-django-oidc/blob/master/mozilla_django_oidc/utils.py#L96 We use uWSGI application server with multiple worker processes, and concurrent requests are processed by different workers.

r3l4x0 commented 2 years ago

Not sure if this fixes the issue, but the def add_state_and_nonce_to_session function in utils.py does not call request.session.modified = True as described here https://docs.djangoproject.com/en/3.2/topics/http/sessions/#when-sessions-are-saved. This allows for multiple session states to be saved.

Sandro-Meireles commented 2 years ago

After a lot of research I was able to identify the source of the problem. I will try to briefly explain why the error occurs and how to solve it.

By default, the first endpoint called is /oidc/authenticate/ which redirects to a page that will be responsible for logging in and then redirects again to /oidc/callback/ sending some information.

We can divide this into 2 steps: 1 - Call the /oidc/authenticate/ endpoint by sending a payload containing a randomly generated state; 2 - Receive in /oidc/callback/ via redirect the authentication code and the state generated in the previous step.

In the first step, the library generates a very important value, the state and save it in the session. In the second step, the state value received after the redirect is checked, and if not found in the session, raises this error OIDC callback state not found in session oidc_states!. Reference: https://github.com/mozilla/mozilla-django-oidc/blob/master/mozilla_django_oidc/views.py#L88

This whole process happens very quickly. The problem is there, when there are too many simultaneous requests to /oidc/authenticate/, depending on which session engine you are using, the session cannot be saved in time before the second step.

I thought of some ways to solve this problem, and I will quote the simplest ones: Warning: These codes are mere examples, implement in a way that meets your needs.

  1. The less elegant way: Inherit the OIDCCallbackClass class and override the get method by adding a time.sleep(1) and configure the project to use your class instead of the default one. example/views.py

    import time
    
    from mozilla_django_oidc.views import OIDCCallbackClass
    
    class MyOIDCCallbackClass(OIDCCallbackClass):
        def get(self, request):
            time.sleep(1) 
            super().get(request)

    settings.py

    ...
    OIDC_CALLBACK_CLASS = "example.views.MyOIDCCallbackClass"
    ...
  2. Change the SESSION_ENGINE: By default django uses the database to save sessions, there are several positives to using it, but in our case, we need the session to be saved very quickly. I recommend that you read about it before making any changes: https://docs.djangoproject.com/en/4.0/topics/http/sessions/

    settings.py

    ...
    SESSION_ENGINE = "django.contrib.sessions.backends.cache"
    ...

    According to django's own documentation, this is the most performant engine, especially if you use Redis. This is the way I consider the most elegant and it solved my problems.

This is the summary of my studies, I may be wrong or incomplete, I would love to receive opinions. Hope this helps.

valcinei commented 2 years ago

You needs add class of authentication backend and orverride create_user method to add is_staff or is_superuser to True. Add AUTHENTICATION_BACKENDS in settings.py refering to you custom class backent Example:

# auth_backends.py

from mozilla_django_oidc.auth import OIDCAuthenticationBackend

class KeycloakOIDCAuthenticationBackend(OIDCAuthenticationBackend):

    def create_user(self, claims):
        """ Overrides Authentication Backend so that Django users are
            created with the keycloak preferred_username.
            If nothing found matching the email, then try the username.
        """
        user = super(KeycloakOIDCAuthenticationBackend, self).create_user(claims)
        user.first_name = claims.get('given_name', '')
        user.last_name = claims.get('family_name', '')
        user.email = claims.get('email')
        user.is_staff = True #Here fix that error
        user.username = claims.get('preferred_username')
        user.save()
        return user

    def filter_users_by_claims(self, claims):
        """ Return all users matching the specified email.
            If nothing found matching the email, then try the username
        """
        email = claims.get('email')

        if not email:
            return self.UserModel.objects.none()
        users = self.UserModel.objects.filter(email__iexact=email)
        return users

    def update_user(self, user, claims):
        user.first_name = claims.get('given_name', '')
        user.last_name = claims.get('family_name', '')
        user.email = claims.get('email')
        user.save()
        return user

In settings.py

#settings.py
AUTHENTICATION_BACKENDS = (
    'my_app.auth_backends.KeycloakOIDCAuthenticationBackend',
)
Lamberk commented 2 years ago

First of all, thank you for your effort, @valcinei

Could you please explain a little bit more about how exactly is_staff attribute is fixing the issue? :)

valcinei commented 2 years ago

@Lamberk in my case, I got error related to permission. User was only stored in keycloak. Schenario: 1 Login in /oicd/*** 2 Redirects to Keycloak auth login 3 Authenticate in Keycloak and redirects to django app 4 User not have permission then Django redirects to keycloak login again. 5 User already authenticated then redirect to admin again. 6 Recursive redirects then oidc_states Exception.

Adding is_staff or is_superuser and works for me.

pphysch commented 2 years ago

This issue came up in development and I realized it was due to my choice of email for the local Django superuser, which matched the OIDC user's email I was trying to login as. By default, this OIDC plugin identifies users by their email.

Changing the email on the local account, via the admin portal or database, fixed the issue in my case.

hedleyroos commented 2 years ago

If you run multiple Django instances (as you do in production) and use a cached session backend then you will receive this intermittently. This is because Django defaults to an in-memory cache backend, and requests are probably being round-robined between Django instances. Use the memcached cache backend and ensure all the Django instances are configured to use it.

ercpe commented 1 year ago

Any update on this issue? We see this exception quite regularly (a few times a day). We're somewhat reluctant to change our Session Backend as described in https://github.com/mozilla/mozilla-django-oidc/issues/435#issuecomment-1036372844; especially to a purely in-memory one.

After reading up on Django Session Management, the comment from @MrJizzler (https://github.com/mozilla/mozilla-django-oidc/issues/435#issuecomment-1025231256) seems totally reasonable. The very same problem is listed in the Django documentation (https://docs.djangoproject.com/en/3.2/topics/http/sessions/#when-sessions-are-saved): The implementation of add_state_and_nonce_to_session alters the oidc_states dict in the session if it is already present (e.g. to a previous/parallel login flow) and thus the session may not be stored.

pbx commented 1 year ago

I'm using mozilla-django-oidc at dpaste.com and started seeing this error last week (for the first time, that I recall, in about three years' use).

I recently (re) activated Google accounts as an OIDC registration option, which I'm guessing is related. I've seen this error arise three times, for three different users (all of whom were using Google accounts).

I'd been using 2.0.0; just upgraded to 3.0.0. (Note: RTD page still lists 2.0.0 as latest.)

Anyway, in case this bug still exists in 3.x, here's some trimmed and semi-obfuscated info from the email error notification.

Report at /oidc/callback/
OIDC callback state not found in session `oidc_states`!

Request Method: GET
Request URL: https://dpaste.com/oidc/callback/?state=$STATE&code=$CODE&scope=email+https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fuserinfo.email+openid&authuser=0&prompt=none
Django Version: 4.0.10
Python Version: 3.10.5

GET:
state = '$STATE'
code = '$CODE'
scope = 'email https://www.googleapis.com/auth/userinfo.email openid'
authuser = '0'
prompt = 'none'
Kapersyx commented 1 year ago

Also check that your oidc token contains all fields that Django is expecting, I forgot to fill in my email on a dev user and I kept getting this error.

pbx commented 1 year ago

FWIW I haven't seen this problem since upgrading to mozilla-django-oidc 3.x.

lyurchenko-skyhook commented 1 year ago

I do ocasionally see the issue, mozilla-django-oidc version 3.0.0. The issue is rare, but still getting the exception time to time. I'm using the default django session storage which I believe is the DB.

pbx commented 1 year ago

Saw this error again yesterday, so I guess mozilla-django-oidc 3.0.0 isn't the fix I thought it was. In case it's useful, here's the request URL from the failed request (state/code values obfuscated):

https://dpaste.com/oidc/callback/?state=STATE&code=CODE&scope=email+openid+https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fuserinfo.email&authuser=0&prompt=none
erfaan commented 11 months ago

We were receiving few of these errors in Sentry. I looked at the logs and it appears that the users were calling the callback url again (a few seconds apart), with a new code and same state. Here is the example user journey:

00:00:00 /callback/?code=xyz&state=abc
00:00:10 /callback/?code=mno&state=abc

On first, invocation, the state is removed from the session. Second time, the SuspiciousOperation exception is raised.

I am still not sure how user is able to generate a new code and land again on the callback url. I tried reproducing the bug by pressing back and forward buttons of the browser but it reproduced the error with same code.

mmulich commented 8 months ago

In my case I've got 2+ simultaneous requests. These are hitting the SessionRefresh middleware and producing a refresh_url. I'm using the X-Requested-With: XMLHttpRequest header, but behavior should be the same for those that are not.

I know the requests I'm making are simultaneous, because React Strict Mode in development loads components twice, which in this case means two fetch calls are made within milliseconds of one another. As others have mentioned, the problem resides around the last oidc_state entry winning while the prior request(s) loses.

In order to debug the problem I added the following debug logging line to the end of SessionRefresh.process_request to see the adding and dropping of oidc_state items.

LOGGER.debug(f'oidc_states: {pformat(request.session["oidc_states"])}')

My attempts to find more information and a possible solution:


And to muse about a possible solution I'm not going to dig into...

Hypothetically, with a db backed session, one could make an OIDCState model a relation of django.contrib.sessions.models.Session. This of course requires a lot more maintenance effort to address the various session backends. But at least in the db backed session scenario it could possibly side stepping the concurrency conflict by letting the db add state separate from, but in relation to, the session.

indigane commented 4 days ago

Is there anything affected services can do to help diagnose this?

I have a service getting these errors at a rate of ~20k a month or upwards of 70% of logins. Other services have much lower rates. Most of them use Entra ID but the errors seem to exist regardless of identity provider and regardless of mozilla_django_oidc version. Some services have been running 2.x some 3.x. I have no data for 4.x yet.