nextauthjs / next-auth

Authentication for the Web.
https://authjs.dev
ISC License
23.87k stars 3.29k forks source link

Race condition with cookie altering requests #8897

Open ThomasF85 opened 10 months ago

ThomasF85 commented 10 months ago

Environment

System: OS: Linux 6.2 Ubuntu 22.04.3 LTS 22.04.3 LTS (Jammy Jellyfish) CPU: (32) x64 13th Gen Intel(R) Core(TM) i9-13900HX Memory: 25.58 GB / 31.08 GB Container: Yes Shell: 5.8.1 - /usr/bin/zsh Binaries: Node: 18.17.1 - ~/.nvm/versions/node/v18.17.1/bin/node npm: 9.6.7 - ~/.nvm/versions/node/v18.17.1/bin/npm Browsers: Chrome: 117.0.5938.62

Reproduction URL

https://github.com/ThomasF85/next-auth-race-condition-demo

Describe the issue

There is a race condition that can lead to obsolete session cookies being held in the browser. If you encrypt access or refresh tokens in the cookie this race condition can lead to holding outdated tokens in the cookie, which can lead to not having access to the requested resources, the refresh token being used multiple times which will be flagged in the auth server etc. .This race condition can generally occur with all cookie altering requests, like signin, signout, update, getSession.

Example: t = 0: The client fires a getSession request. t = 1: The client fires an update request that refreshes access and refresh token on the server t = 2: the client receives the response from the update request and receives a new cookie with encrypted tokens t = 3: The client receives the response from the getSession request, that took longer because of network issues with the serverless function handling the request. This request overwrites the session cookie, so that the client now has an outdated cookie with outdated access and refresh tokens.

Here is a different example that will lead to the same result: t = 0: The client fires an update request that refreshes access and refresh token on the server t = 1: The client fires a getSession request. t = 2: the client receives the response from the update request and receives a new cookie with encrypted tokens t = 3: The client receives the response from the getSession request. This request overwrites the session cookie, so that the client now has an outdated cookie with outdated access and refresh tokens.

How to reproduce

Please check this demo to reproduce the issue: https://github.com/ThomasF85/next-auth-race-condition-demo

  1. Clone the repository
  2. add an .env.local file with NEXTAUTH_SECRET environment variable
  3. run npm run dev
  4. login in Browser with credentials "Karl" and "password"
  5. Hit the two different buttons to see that one of them causes the race condition by calling getSession and update almost at the same time

Expected behavior

I am really not sure if that is the right solution from an API / Developer Experience / library responsibility point of view, but you don't want cookie altering requests to overwrite cookies from newer (meaning sent at a later point in time) cookie altering requests.

One way of fixing this is attaching the timestamp of the request to the cookie name. If the server finds multiple cookies it can then delete the ones with the lower timestamp and only keep the latest ones.

Another way of fixing this is orchestrating and synchronizing all cookie altering requests on the client (across all tabs).

Finally you can synchronize cookie altering requests server side, but that would require some additional piece of infrastructure to synchronize between functions.

In our dev team we came up with the solution that we intercept (through a middleware) cookie altering requests and have them not update the cookies unless it is a signout, signin or update call, because our app logic does not require the cookie updates (exp field update) being done by getSession calls.

anasch07 commented 9 months ago

Details: When multiple tabs are open, each tab independently attempts to refresh the session, resulting in simultaneous refreshAccessToken() calls with the same refresh_token. This leads to a race condition where the first request succeeds and the subsequent requests fail since the refresh_token used is no longer valid after the first refresh.

Scenario:

  1. 2 tabs open one after another. 2 timers are set 2 mins before the access token dies.
  2. 1st time fires first sending the refresh token to the server and brings back the a new refresh/access token. On the sever the sent refresh token is removed.

Proposed Solution:

Proposed Solution:

  1. Client-Side Locking: Introduce a client-side control system to manage the access to the session refresh process. This can be achieved by setting a flag in the local storage or a similar mechanism to ensure that only one tab initiates the refresh token process at any given time.
  2. Server-Side Lock with Redis: Utilize Redis or a similar store to implement a server-side distributed lock mechanism. This will act as a single source of truth to synchronize session refresh requests across multiple instances, preventing concurrent access to the refresh token.
  3. Backend-Driven Refresh: Alter the session management strategy to only initiate a token refresh in response to a 401 Unauthorized status code or a custom error message from the backend signaling that the access token has expired. This approach is reactive rather than proactive, reducing the frequency of refresh operations and avoiding the aforementioned race conditions.

Each of these solutions aims to mitigate the issue of simultaneous token refresh attempts across multiple tabs, enhancing the robustness of session management in the application

ThomasF85 commented 9 months ago

Regarding your proposed solutions:

  1. Client Side Locking: Using a lock in localstorage will work in a single tab scenario - although this could be done with a global variable as well. In a multiple tab scenario this will be unsafe because multiple tabs can read the lock and then write the lock concurrently. On top of that synchronization of localstorage between tabs might not be immediate. You can use the Web Locks API to make it work safely between multiple tabs though.
  2. Server-Side lock: It will work to put a lock in a different piece of infrastructure that is accessible from different instances of functions. Effectively you add another blocking call to every request you make and an extra piece of infrastructure. I would prefer a client side solution for those reasons.
  3. Backend driven refresh: This will fail every time the client makes concurrent requests that result in a 401 response - which can happen in a single tab scenario already. Multiple concurrent requests to your auth server with the same refresh token would still be made.