Open gh-ajenkins opened 3 years ago
I have recently seen this same issue. @gh-ajenkins Please let me know if you find any more details about the cause or if a fix is found.
I am experiencing the same issue too. Seems to happen quite rarely (2 occurrences last month in our Sentry logs for ~20 users/day) Will report here if I have more details
I am seeing this issue from time to time and I am trying to figure out under which conditions this is triggered. Please let me know if you have any more details about it
I believe to have found one way this can happen:
When a user sends two requests requiring authentication roughly at the same time, e.g. because they reopened their browser with old tabs, and the session refresh middleware is being used, both requests try to modify the session to add their state, however only as there is no locking, the changes of one of the two request are overwritten by the other request.
We see this issue on our site and added some more logging to track it down, and we see patterns like this from clients:
/object/foo
Adds state state_a
and returns 302 redirecting to identity provider because oidc login is too old/object/bar
Adds state state_b
and returns 302 redirecting to identity provider because oidc login is too old/oidc/callback/?code=bla&state=state_a
triggers "OIDC callback state not found"According to our debug log, request 1 and request 2 each add their state to the session.
However, in the log output for request 3, we see that state_b
is present in oidc_states
, but state_a
is missing.
Looking closer at request 2, we can see that state_a
is missing in its oidc_states
.
So I would assume that something like the following is happening:
oidc_states = {}
oidc_states = {}
oidc_states = { 'state_a': {...} }
is in the backendoidc_states = { 'state_b': {...} }
is in the backendWhile actually oidc_states = { 'state_a': {...}, 'state_b': {...} }
should be in the backend.
I am not sure how this should best be solved, and I believe that the specifics might also depend on the Session Backend that is being used.
@cfra I guess you are right. I've been investigating the same issue during some time. In our case we face issues when end users use Chrome browser on any operating system (Windows, MacOS and Linux). I guess, this may be connected to some Chrome performance optimization feature, because this problem is reproduced even without multiple or pinned browser tabs.
In all cases we see two concurrent requests coming from the same IP address and user agent in less than 100 ms. OIDC states in URL query parameters are different and IdP processes both requests correctly.
I believe that there is some kind of race condition when the library saves OIDC state into a Django session here: https://github.com/mozilla/mozilla-django-oidc/blob/master/mozilla_django_oidc/utils.py#L96 We use uWSGI application server with multiple worker processes, and concurrent requests are processed by different workers.
Not sure if this fixes the issue, but the def add_state_and_nonce_to_session
function in utils.py
does not call request.session.modified = True
as described here https://docs.djangoproject.com/en/3.2/topics/http/sessions/#when-sessions-are-saved. This allows for multiple session states to be saved.
After a lot of research I was able to identify the source of the problem. I will try to briefly explain why the error occurs and how to solve it.
By default, the first endpoint called is /oidc/authenticate/
which redirects to a page that will be responsible for logging in and then redirects again to /oidc/callback/
sending some information.
We can divide this into 2 steps:
1 - Call the /oidc/authenticate/
endpoint by sending a payload containing a randomly generated state
;
2 - Receive in /oidc/callback/
via redirect the authentication code and the state
generated in the previous step.
In the first step, the library generates a very important value, the state
and save it in the session.
In the second step, the state
value received after the redirect is checked, and if not found in the session,
raises this error OIDC callback state not found in session oidc_states!
. Reference: https://github.com/mozilla/mozilla-django-oidc/blob/master/mozilla_django_oidc/views.py#L88
This whole process happens very quickly. The problem is there, when there are too many simultaneous requests to /oidc/authenticate/
, depending on which session engine you are using, the session cannot be saved in time before the second step.
I thought of some ways to solve this problem, and I will quote the simplest ones: Warning: These codes are mere examples, implement in a way that meets your needs.
The less elegant way:
Inherit the OIDCCallbackClass
class and override the get
method by adding a time.sleep(1)
and configure the project to use your class instead of the default one.
example/views.py
import time
from mozilla_django_oidc.views import OIDCCallbackClass
class MyOIDCCallbackClass(OIDCCallbackClass):
def get(self, request):
time.sleep(1)
super().get(request)
settings.py
...
OIDC_CALLBACK_CLASS = "example.views.MyOIDCCallbackClass"
...
Change the SESSION_ENGINE: By default django uses the database to save sessions, there are several positives to using it, but in our case, we need the session to be saved very quickly. I recommend that you read about it before making any changes: https://docs.djangoproject.com/en/4.0/topics/http/sessions/
settings.py
...
SESSION_ENGINE = "django.contrib.sessions.backends.cache"
...
According to django's own documentation, this is the most performant engine, especially if you use Redis. This is the way I consider the most elegant and it solved my problems.
This is the summary of my studies, I may be wrong or incomplete, I would love to receive opinions. Hope this helps.
You needs add class of authentication backend and orverride create_user method to add is_staff
or is_superuser
to True
.
Add AUTHENTICATION_BACKENDS
in settings.py refering to you custom class backent
Example:
# auth_backends.py
from mozilla_django_oidc.auth import OIDCAuthenticationBackend
class KeycloakOIDCAuthenticationBackend(OIDCAuthenticationBackend):
def create_user(self, claims):
""" Overrides Authentication Backend so that Django users are
created with the keycloak preferred_username.
If nothing found matching the email, then try the username.
"""
user = super(KeycloakOIDCAuthenticationBackend, self).create_user(claims)
user.first_name = claims.get('given_name', '')
user.last_name = claims.get('family_name', '')
user.email = claims.get('email')
user.is_staff = True #Here fix that error
user.username = claims.get('preferred_username')
user.save()
return user
def filter_users_by_claims(self, claims):
""" Return all users matching the specified email.
If nothing found matching the email, then try the username
"""
email = claims.get('email')
if not email:
return self.UserModel.objects.none()
users = self.UserModel.objects.filter(email__iexact=email)
return users
def update_user(self, user, claims):
user.first_name = claims.get('given_name', '')
user.last_name = claims.get('family_name', '')
user.email = claims.get('email')
user.save()
return user
In settings.py
#settings.py
AUTHENTICATION_BACKENDS = (
'my_app.auth_backends.KeycloakOIDCAuthenticationBackend',
)
First of all, thank you for your effort, @valcinei
Could you please explain a little bit more about how exactly is_staff attribute is fixing the issue? :)
@Lamberk in my case, I got error related to permission.
User was only stored in keycloak.
Schenario:
1 Login in /oicd/***
2 Redirects to Keycloak auth login
3 Authenticate in Keycloak and redirects to django app
4 User not have permission then Django redirects to keycloak login again.
5 User already authenticated then redirect to admin again.
6 Recursive redirects then oidc_states
Exception.
Adding is_staff
or is_superuser
and works for me.
This issue came up in development and I realized it was due to my choice of email for the local Django superuser, which matched the OIDC user's email I was trying to login as. By default, this OIDC plugin identifies users by their email.
Changing the email on the local account, via the admin portal or database, fixed the issue in my case.
If you run multiple Django instances (as you do in production) and use a cached session backend then you will receive this intermittently. This is because Django defaults to an in-memory cache backend, and requests are probably being round-robined between Django instances. Use the memcached cache backend and ensure all the Django instances are configured to use it.
Any update on this issue? We see this exception quite regularly (a few times a day). We're somewhat reluctant to change our Session Backend as described in https://github.com/mozilla/mozilla-django-oidc/issues/435#issuecomment-1036372844; especially to a purely in-memory one.
After reading up on Django Session Management, the comment from @MrJizzler (https://github.com/mozilla/mozilla-django-oidc/issues/435#issuecomment-1025231256) seems totally reasonable. The very same problem is listed in the Django documentation (https://docs.djangoproject.com/en/3.2/topics/http/sessions/#when-sessions-are-saved):
The implementation of add_state_and_nonce_to_session
alters the oidc_states
dict in the session if it is already present (e.g. to a previous/parallel login flow) and thus the session may not be stored.
I'm using mozilla-django-oidc at dpaste.com and started seeing this error last week (for the first time, that I recall, in about three years' use).
I recently (re) activated Google accounts as an OIDC registration option, which I'm guessing is related. I've seen this error arise three times, for three different users (all of whom were using Google accounts).
I'd been using 2.0.0; just upgraded to 3.0.0. (Note: RTD page still lists 2.0.0 as latest.)
Anyway, in case this bug still exists in 3.x, here's some trimmed and semi-obfuscated info from the email error notification.
Report at /oidc/callback/
OIDC callback state not found in session `oidc_states`!
Request Method: GET
Request URL: https://dpaste.com/oidc/callback/?state=$STATE&code=$CODE&scope=email+https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fuserinfo.email+openid&authuser=0&prompt=none
Django Version: 4.0.10
Python Version: 3.10.5
GET:
state = '$STATE'
code = '$CODE'
scope = 'email https://www.googleapis.com/auth/userinfo.email openid'
authuser = '0'
prompt = 'none'
Also check that your oidc token contains all fields that Django is expecting, I forgot to fill in my email on a dev user and I kept getting this error.
FWIW I haven't seen this problem since upgrading to mozilla-django-oidc 3.x.
I do ocasionally see the issue, mozilla-django-oidc version 3.0.0. The issue is rare, but still getting the exception time to time. I'm using the default django session storage which I believe is the DB.
Saw this error again yesterday, so I guess mozilla-django-oidc 3.0.0 isn't the fix I thought it was. In case it's useful, here's the request URL from the failed request (state/code values obfuscated):
https://dpaste.com/oidc/callback/?state=STATE&code=CODE&scope=email+openid+https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fuserinfo.email&authuser=0&prompt=none
We were receiving few of these errors in Sentry. I looked at the logs and it appears that the users were calling the callback url again (a few seconds apart), with a new code
and same state
. Here is the example user journey:
00:00:00 /callback/?code=xyz&state=abc
00:00:10 /callback/?code=mno&state=abc
On first, invocation, the state is removed from the session. Second time, the SuspiciousOperation
exception is raised.
I am still not sure how user is able to generate a new code and land again on the callback url. I tried reproducing the bug by pressing back and forward buttons of the browser but it reproduced the error with same code.
In my case I've got 2+ simultaneous requests. These are hitting the SessionRefresh
middleware and producing a refresh_url
. I'm using the X-Requested-With: XMLHttpRequest
header, but behavior should be the same for those that are not.
I know the requests I'm making are simultaneous, because React Strict Mode in development loads components twice, which in this case means two fetch calls are made within milliseconds of one another. As others have mentioned, the problem resides around the last oidc_state
entry winning while the prior request(s) loses.
In order to debug the problem I added the following debug logging line to the end of SessionRefresh.process_request
to see the adding and dropping of oidc_state items.
LOGGER.debug(f'oidc_states: {pformat(request.session["oidc_states"])}')
My attempts to find more information and a possible solution:
Adding request.session.modified = True
did not make any difference. The added state was still dropped. This was a shot in the dark, because django.contrib.sessions.backends.base.SessionBase
has accessor methods that set modified = True
for us.
Adding memcached as the SESSION_ENGINE showed no improvement. This also dropped the newly added state.
Furthermore, using request.session.save(must_create=True)
at the end of SessionRefresh.process_request
raised CreateError
while handling an IntegrityError
. This pretty much confirms that under most circumstances the machinery swallows the exception; thus dropping the updates.
And to muse about a possible solution I'm not going to dig into...
Hypothetically, with a db backed session, one could make an OIDCState model a relation of django.contrib.sessions.models.Session
. This of course requires a lot more maintenance effort to address the various session backends. But at least in the db backed session scenario it could possibly side stepping the concurrency conflict by letting the db add state separate from, but in relation to, the session.
Is there anything affected services can do to help diagnose this?
I have a service getting these errors at a rate of ~20k a month or upwards of 70% of logins. Other services have much lower rates. Most of them use Entra ID but the errors seem to exist regardless of identity provider and regardless of mozilla_django_oidc
version. Some services have been running 2.x
some 3.x
. I have no data for 4.x
yet.
After logging into the mozilla/oidc-testprovider IDP, a
SuspiciousOperation
exception is thrown with messageOIDC callback state not found in session `oidc_states`!
. 'Python version: 3.8.10 Dependencies
Reproducing the issue:
testprovider
service locallysettings.py
project urls.py
app urls.py
login.html template