Use Bearer tokens for authentication

pkalita-lbl commented 4 months ago

~~I'm marking this as a Draft PR for now because I don't want this going out with the 2024.5 Release, but it is fully ready for review and testing.~~

Well well well here it is at long last. This is the complete replacement of session cookie-based authentication with Bearer token authentication.

Why?

Session cookie authentication works well for a backend working with a single frontend where both are served from the same origin. However, as a project, we have outgrown that constraint, since we now have:

- Internal automated tasks that want to make requests to nmdc-server API endpoints that require authorization (e.g. the process that fetches submission data, transforms it, and inserts it into MongoDB)
- The Field Notes app, which extensively uses metadata submission endpoints
- NMDC EDGE, which is starting to think about how to integrate with the Submission Portal. It's not clear right now exactly how that will shake out, but I think it's safe to assume they will want to make API requests on behalf of a user.

Backend Overview

In broad strokes, these changes implement an "OAuth2-like" flow to provide a user with access and refresh tokens. I won't claim that it fully follows the OAuth2 specification (for example, we don't have a concept of assigning client IDs or client secrets), but it borrows ideas from it. Within our OAuth2-like process, we hand off the actual user authentication to ORCID.

From start to finish the process is this:

1. To initiate the authentication process, a client navigates to {nmdc-server}/auth/login and must provide a redirect_uri query parameter.
2. The nmdc-server backend verifies that the provided redirect_uri matches an allowed list of origins. This list is configurable per environment. For example, the dev nmdc-server backend might be configured to only allow redirects to https://data-dev.microbiomedata.org and https://fieldnotes.microbiomedata.org.
3. After validating the redirect_uri, we store it in a session cookie. Yes, session cookies are still part of the process! But they are only an "internal" detail of this process and are not used for actual authentication after the process completes.
4. The last step in handling the {nmdc-server}/auth/login request is to redirect the client to the ORCID sign-in page. This requires giving ORCID a redirect URI to use after the user enters their credentials. We do not use the redirect_uri provided in step 1 for this. The redirect URI provided to ORCID is always {nmdc-server}/auth/orcid-token.
5. Once the user enters their ORCID credentials, ORCID redirects back to {nmdc-server}/auth/orcid-token with an authorization code.
6. The request handler for {nmdc-server}/auth/orcid-token collects the ORCID authorization code from the query parameters and sends a POST request to {orcid}/oauth/token with a body that includes (among other required things) the ORCID authorization code.
7. ORCID responds to the POST request with a JSON response that includes an ORCID access_token and some basic user information.
8. The request handler for {nmdc-server}/auth/orcid-token uses that response to get or create a User object in the nmdc-server Postgres database based on the user's ORCID iD and name.
9. Then it retrieves the redirect_uri stashed away in step 3.
10. Then it generates an nmdc-server authorization code (stored in the Postgres database) that is tied to the user (step 8) and the redirect_uri (step 9).
11. The request handler for {nmdc-server}/auth/orcid-token completes by responding with a redirect to the redirect_uri originally provided in step 1, passing along the nmdc-server authorization code generated in step 10.
12. The client must now exchange the nmdc-server authorization code for access and refresh tokens. It does this by sending a POST request to {nmdc-server}/auth/token with the nmdc-server authorization code and the original redirect_uri.
13. The request handler for {nmdc-server}/auth/token looks up the provided nmdc-server authorization code in Postgres.
14. Then it validates that the redirect_uri sent with the request matches the one the code was originally generated for, and that the code hasn't expired (the authorization code is valid for 5 minutes).
15. Then it generates nmdc-server access and refresh tokens for the user associated with the authorization code.
16. The request handler for {nmdc-server}/auth/token completes by sending the tokens in a JSON response.
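Step 2 is what keeps this flow from acting as an open redirector, so here is a minimal sketch of an origin allow-list check in Python. It assumes the allow list holds exact scheme://host[:port] origins (the two dev origins above are used as example values) and that only the origin is compared, so a redirect_uri that carries the user's current ?q=... query string still passes. origin_of and is_allowed_redirect are hypothetical helper names, not nmdc-server's actual functions.

```python
from typing import Iterable, Optional
from urllib.parse import urlsplit

# Example allow list mirroring the dev configuration described above.
# In a real deployment this would come from environment configuration.
ALLOWED_ORIGINS = frozenset({
    "https://data-dev.microbiomedata.org",
    "https://fieldnotes.microbiomedata.org",
})


def origin_of(url: str) -> Optional[str]:
    """Return the 'scheme://host[:port]' origin of an absolute http(s) URL, or None."""
    parts = urlsplit(url)  # urlsplit lowercases the scheme; .hostname is lowercased too
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return None
    # Reject embedded credentials such as https://user@host/, a common spoofing trick.
    if parts.username is not None or parts.password is not None:
        return None
    try:
        port = parts.port  # raises ValueError for a non-numeric port
    except ValueError:
        return None
    origin = f"{parts.scheme}://{parts.hostname}"
    return origin if port is None else f"{origin}:{port}"


def is_allowed_redirect(redirect_uri: str, allowed: Iterable[str] = ALLOWED_ORIGINS) -> bool:
    """Step 2 (sketch): accept redirect_uri only if its origin is on the allow list."""
    return origin_of(redirect_uri) in set(allowed)
```

Comparing parsed origins rather than string prefixes matters: a prefix check would wrongly accept something like https://data-dev.microbiomedata.org.evil.example/.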

```mermaid
sequenceDiagram
    client->>nmdc-server: 1. Navigate to /auth/login?redirect_uri=CLIENT_REDIRECT_URI
    Note over nmdc-server: 2. Validate CLIENT_REDIRECT_URI in allow list
    Note over nmdc-server: 3. Store CLIENT_REDIRECT_URI in session cookie
    nmdc-server->>ORCID: 4. Redirect to {orcid}/oauth/authorize?client_id=...&redirect_uri={nmdc-server}/auth/orcid-token
    ORCID->>nmdc-server: 5. Redirect to {nmdc-server}/auth/orcid-token?code=ORCID_AUTH_CODE
    nmdc-server->>ORCID: 6. POST {orcid}/oauth/token with ORCID_AUTH_CODE
    ORCID->>nmdc-server: 7. ORCID access_token, ORCID refresh_token, user details
    Note over nmdc-server: 8. Get or create User object from Postgres with user details
    Note over nmdc-server: 9. Retrieve CLIENT_REDIRECT_URI from session cookie
    Note over nmdc-server: 10. Generate nmdc-server authorization code tied to<br/>User and CLIENT_REDIRECT_URI
    nmdc-server->>client: 11. Redirect to CLIENT_REDIRECT_URI?code=NMDC_SERVER_AUTH_CODE
    client->>nmdc-server: 12. POST {nmdc-server}/auth/token with NMDC_SERVER_AUTH_CODE
    Note over nmdc-server: 13. Lookup NMDC_SERVER_AUTH_CODE in Postgres
    Note over nmdc-server: 14. Validate NMDC_SERVER_AUTH_CODE matches original<br/>CLIENT_REDIRECT_URI and hasn't expired
    Note over nmdc-server: 15. Generate nmdc-server access and refresh tokens
    nmdc-server->>client: 16. nmdc-server access_token, nmdc-server refresh_token
```
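Steps 10, 13 and 14 describe the lifecycle of the nmdc-server authorization code. The sketch below models it with an in-memory dictionary standing in for the Postgres table (an assumption for illustration) and an injectable clock so expiry can be tested. Two further assumptions: codes are generated with secrets.token_urlsafe, and each code is deleted when redeemed so it can only be exchanged once. The description above doesn't state the single-use rule, but RFC 6749 requires it of authorization codes. AuthCodeStore and its methods are hypothetical names.

```python
import secrets
import time
from dataclasses import dataclass
from typing import Callable, Dict

AUTH_CODE_TTL_SECONDS = 5 * 60  # authorization codes are valid for 5 minutes


@dataclass
class AuthCodeRecord:
    user_id: str
    redirect_uri: str
    issued_at: float


class AuthCodeStore:
    """In-memory stand-in for the Postgres table holding nmdc-server authorization codes."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._records: Dict[str, AuthCodeRecord] = {}
        self._clock = clock

    def issue(self, user_id: str, redirect_uri: str) -> str:
        """Step 10: create an unguessable code tied to the user and the client's redirect_uri."""
        code = secrets.token_urlsafe(32)
        self._records[code] = AuthCodeRecord(user_id, redirect_uri, self._clock())
        return code

    def redeem(self, code: str, redirect_uri: str) -> str:
        """Steps 13-14: look up the code, validate it, and return the associated user ID."""
        # pop() deletes the code on first use, making it single-use (an assumption).
        record = self._records.pop(code, None)
        if record is None:
            raise PermissionError("unknown or already-used authorization code")
        if record.redirect_uri != redirect_uri:
            raise PermissionError("redirect_uri does not match the one the code was issued for")
        if self._clock() - record.issued_at > AUTH_CODE_TTL_SECONDS:
            raise PermissionError("authorization code has expired")
        return record.user_id
```

If redeem succeeds, step 15 would then mint access and refresh tokens for the returned user ID.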

Steps 4 through 7 constitute what ORCID refers to as "3-legged OAuth", which is more generically known as the OAuth Authorization Code Flow.
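For steps 4 and 6, the two requests nmdc-server makes to ORCID can be sketched as plain URL and form-body construction (nothing is sent over the network here). The parameter names are the standard OAuth2 ones. The client ID, secret, and nmdc-server base URL are placeholders, the production https://orcid.org base URL is an assumption (the ORCID sandbox uses a different host), and the default scope is the /read-limited scope the frontend flow requests. Both requests use the same {nmdc-server}/auth/orcid-token callback, never the client's redirect_uri. The function names are hypothetical.

```python
from urllib.parse import urlencode

# Assumption: production ORCID. The ORCID sandbox lives at https://sandbox.orcid.org.
ORCID_BASE_URL = "https://orcid.org"


def orcid_callback_uri(nmdc_server_base: str) -> str:
    # ORCID always redirects back to nmdc-server, never to the client's redirect_uri.
    return f"{nmdc_server_base}/auth/orcid-token"


def orcid_authorize_url(client_id: str, nmdc_server_base: str, scope: str = "/read-limited") -> str:
    """Step 4: the ORCID sign-in URL that nmdc-server redirects the browser to."""
    query = urlencode({
        "client_id": client_id,
        "response_type": "code",
        "scope": scope,
        "redirect_uri": orcid_callback_uri(nmdc_server_base),
    })
    return f"{ORCID_BASE_URL}/oauth/authorize?{query}"


def orcid_token_request(client_id: str, client_secret: str, code: str, nmdc_server_base: str):
    """Step 6: (url, headers, form body) for the back-channel POST to {orcid}/oauth/token."""
    body = urlencode({
        "client_id": client_id,
        "client_secret": client_secret,
        "grant_type": "authorization_code",
        "code": code,
        "redirect_uri": orcid_callback_uri(nmdc_server_base),
    })
    headers = {
        "Accept": "application/json",
        "Content-Type": "application/x-www-form-urlencoded",
    }
    return f"{ORCID_BASE_URL}/oauth/token", headers, body
```

Because step 6 is a server-to-server request, the client secret never reaches the browser.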

There is a second, alternative path to obtaining nmdc-server access and refresh tokens for clients that happen to have an OpenID Connect (OIDC) ID Token issued by ORCID. A client can send a POST request with the ID Token to {nmdc-server}/auth/oidc-login. The request handler will validate that the ID Token was issued by ORCID to a known audience (we currently accept only our own ORCID client ID, but we could expand that in the future) and that it has not expired. If validation succeeds, the ID Token's claims are used to get or create a User object in the nmdc-server Postgres database. Then access and refresh tokens are generated for the user and returned in a JSON response.
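The claim checks on the ID Token can be sketched as follows. This assumes the token's signature has already been verified against ORCID's published signing keys (JWKS) by a JWT library, which plain stdlib Python can't do on its own; the sketch covers only the issuer, audience, and expiry checks described above. It also assumes ORCID's issuer value is https://orcid.org and that the sub claim carries the user's ORCID iD. validate_orcid_id_token_claims is a hypothetical name.

```python
import time
from typing import Any, Iterable, Mapping, Optional

ORCID_ISSUER = "https://orcid.org"  # assumed issuer ("iss") value for ORCID ID Tokens


def validate_orcid_id_token_claims(
    claims: Mapping[str, Any],
    allowed_audiences: Iterable[str],
    now: Optional[float] = None,
) -> str:
    """Check iss/aud/exp on an already signature-verified ID Token; return the ORCID iD."""
    now = time.time() if now is None else now
    if claims.get("iss") != ORCID_ISSUER:
        raise ValueError("ID Token was not issued by ORCID")
    # Per OIDC, "aud" may be a single string or a list of strings.
    aud = claims.get("aud")
    token_audiences = {aud} if isinstance(aud, str) else set(aud or ())
    if not token_audiences & set(allowed_audiences):
        raise ValueError("ID Token audience is not a known client ID")
    exp = claims.get("exp")
    if not isinstance(exp, (int, float)) or exp <= now:
        raise ValueError("ID Token has expired")
    orcid_id = claims.get("sub")
    if not orcid_id:
        raise ValueError("ID Token has no subject")
    return orcid_id
```

Expanding the accepted audiences in the future would then just mean adding more client IDs to allowed_audiences.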

The nmdc-server frontend will use the first mechanism (/auth/login followed by /auth/token) because it allows the Authorization Code Flow with ORCID to use our ORCID Member client ID/secret and to request the /read-limited scope. This could enable us, in the future, to do something after step 7 with the ORCID access_token that has elevated privileges.

The alternative OIDC method, on the other hand, could be useful for external clients like NMDC EDGE which also work with ORCID as an authentication provider.

Frontend overview

Once the client has obtained access and refresh tokens, it handles them by:

- Using the access token to set the default Authorization header on API requests
- Storing the refresh token in local storage

The access token has a relatively short expiration time. If a request fails with a 401 Unauthorized status, the client reads the refresh token from storage and sends a POST /auth/refresh request with it. If the refresh token is valid, a new access token is generated and sent as a response. The client can then retry the original request with the new access token.
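A minimal sketch of this retry-once-on-401 behavior, written in Python rather than the frontend's TypeScript to keep all examples here in one language. The send and refresh callables are hypothetical stand-ins for the real HTTP calls (refresh would POST the stored refresh token to /auth/refresh), and returning None from refresh models a rejected refresh token. ApiClient and Response are hypothetical names, not the api.ts implementation.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Response:
    status: int
    body: str = ""


class ApiClient:
    """Keeps the access token in memory and retries once after a 401 via /auth/refresh."""

    def __init__(
        self,
        send: Callable[[str, Optional[str]], Response],
        refresh: Callable[[str], Optional[str]],
        refresh_token: Optional[str],
    ) -> None:
        self._send = send          # performs the request with "Authorization: Bearer <token>"
        self._refresh = refresh    # exchanges the refresh token; returns a new access token or None
        self.refresh_token = refresh_token  # in the browser this comes from local storage
        self.access_token: Optional[str] = None  # memory only, so it is gone after a page reload

    def request(self, path: str) -> Response:
        response = self._send(path, self.access_token)
        if response.status != 401 or self.refresh_token is None:
            return response
        new_token = self._refresh(self.refresh_token)
        if new_token is None:
            # The refresh token was rejected too: the caller should treat the user as logged out.
            self.access_token = None
            return response
        self.access_token = new_token
        return self._send(path, self.access_token)  # retry the original request once
```

Since access_token starts as None, the first request after a reload naturally fails and triggers a refresh, which is the situation the eager refresh on App mount avoids.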

Since the access token is only stored in memory, there is never an access token available after a full page reload. Therefore, a refresh token exchange is initiated each time the main App component is mounted.

In the event that multiple concurrent API requests fail with a 401 Unauthorized status, the function that handles making the refresh token exchange is memoized for up to 20 seconds so that only one actual refresh request goes to the server.
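The memoization can be sketched as caching the refresh task for 20 seconds, so every caller within that window awaits the same result instead of starting its own /auth/refresh request. This is a Python/asyncio sketch of the idea, not the actual api.ts code; MemoizedRefresher and do_refresh are hypothetical names, and refresh() must be called from inside a running event loop.

```python
import asyncio
import time
from typing import Awaitable, Callable, Optional

REFRESH_MEMO_SECONDS = 20.0


class MemoizedRefresher:
    """Lets concurrent callers share a single in-flight (or recent) token refresh."""

    def __init__(
        self,
        do_refresh: Callable[[], Awaitable[str]],
        ttl: float = REFRESH_MEMO_SECONDS,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._do_refresh = do_refresh  # e.g. POST /auth/refresh with the stored refresh token
        self._ttl = ttl
        self._clock = clock
        self._task: Optional["asyncio.Task[str]"] = None
        self._started_at = 0.0

    def refresh(self) -> "asyncio.Task[str]":
        """Return the shared refresh task, starting a new one if the cached one is too old."""
        now = self._clock()
        if self._task is None or now - self._started_at > self._ttl:
            self._task = asyncio.ensure_future(self._do_refresh())
            self._started_at = now
        return self._task
```

One consequence of this design is that a failed refresh is also shared for the whole window: every concurrent caller sees the same failure rather than each retrying /auth/refresh.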

Future work

One thing I know isn't great with these changes is the experience of someone trying to use endpoints that require authentication via the Swagger interface. Clicking the Authorize button in the Swagger interface now simply asks the user to provide a Bearer token, which wouldn't be easy for them to find. My suggestion would be to add a frontend view for logged-in users that allows them to copy their current access and/or refresh token. The access token could be pasted into the Swagger interface, and the refresh token could be integrated into longer-running automated scripts.

As currently implemented, the token exchange in step 12 does relatively minimal checking that the client exchanging the authorization code is the same client that initiated the process (step 1). It mainly relies on the short expiration time of the authorization code to limit the damage from leaked authorization codes. One possible enhancement to that step would be to implement something inspired by PKCE (Proof Key for Code Exchange), which would provide stronger verification that the correct client is exchanging the code.
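For reference, the core of PKCE (RFC 7636, S256 method) is small. In a PKCE-inspired variant of this flow, the client would send a code_challenge with /auth/login (step 1), nmdc-server would store it alongside the authorization code (step 10), and the client would send the matching code_verifier with POST /auth/token (step 12) for nmdc-server to check in step 14. The parameter names come from RFC 7636; nmdc-server does not currently accept them, and the helper functions below are hypothetical.

```python
import base64
import hashlib
import secrets


def make_code_verifier() -> str:
    """Client side: 32 random bytes -> 43 URL-safe characters (RFC 7636 allows 43-128)."""
    return secrets.token_urlsafe(32)


def s256_code_challenge(code_verifier: str) -> str:
    """BASE64URL(SHA256(code_verifier)) without '=' padding, per RFC 7636 section 4.2."""
    digest = hashlib.sha256(code_verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def verify_code_verifier(code_verifier: str, stored_challenge: str) -> bool:
    """Server side: recompute the challenge from the verifier sent at token exchange."""
    return secrets.compare_digest(s256_code_challenge(code_verifier), stored_challenge)
```

The point is that a leaked authorization code alone is useless: redeeming it also requires the verifier, which never leaves the client that started the flow.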

I know this is a huge change set, and I'm more than happy to answer any questions or even meet to go over anything in more detail. Just let me know!

cc: @shreddd @eecavanna

jeffbaumes commented 4 months ago

Very cool @pkalita-lbl! Due to the scope of this change, I think a mini design review meeting that includes you, @naglepuff, and @marySalvi would be in order. You could walk through the code changes at a high level, and then, say, @naglepuff could try it out locally and approve or suggest changes.

pkalita-lbl commented 4 months ago

That's a great idea, and I'm happy to do that. I'll work with folks to find a time to do that soon.

shreddd commented 4 months ago

Consider including ORCID info (either in the sub claim or as an additional field) in the refresh/access token as a more transparent/universal user ID.

eecavanna commented 4 months ago

For testers, PK suggested: ensure that an existing ?q=... parameter "survives" the login flow.

eecavanna commented 4 months ago

"The access token only lives in RAM; the refresh token is the thing that gets stored in browser storage."

dwinston commented 4 months ago

cool. The runtime decodes a provided JWT access token using a stored secret (https://github.com/microbiomedata/nmdc-runtime/blob/v1.6.0/nmdc_runtime/api/models/user.py#L59). If nmdc-server encodes its access tokens the same way (https://github.com/microbiomedata/nmdc-runtime/blob/v1.6.0/nmdc_runtime/api/core/auth.py#L89) and the runtime has access to that same secret, then the runtime could accept access tokens generated by nmdc-server as well.

pkalita-lbl commented 4 months ago

Yes, this brings nmdc-server and nmdc-runtime into closer alignment. Maybe not right now, but in the near future we could coordinate on a common JWT structure and a shared secret key to make access tokens interoperable between the two systems.

Also now that the 2024.5 Release is on production I'm taking this out of draft.

pkalita-lbl commented 4 months ago

Thanks for taking a look. I think I've addressed your initial comments, but please feel free to unresolve any of those conversations if the answers/changes don't make sense.

naglepuff commented 4 months ago

In addition to a couple of new comments, I did some functional testing. I covered:

- Basic login flow
- Login from the "Bulk Download" control
- Submission List (do I see the submissions I expect to see when logged in)
- Submission View (do I have access to those submissions)
- Users View (can admins still view this properly)
- Single-file downloads (logged in and not logged in)
- Bulk Downloads (logged in and not logged in)
- Expiration of bulk downloads
- Use of the query string to preserve filter state
- Refresh auth token after it expires (tested by changing the settings to expire these tokens after 60 seconds)
- What happens when the refresh token expires

@pkalita-lbl Everything looked good except when the refresh token expires (i.e. the /refresh endpoint returns 401): it doesn't actually look like I'm logged out until I refresh the page. Would it be a big lift to check the response from /refresh for 401 and update the UI as expected? (Currently the username is still displayed at the top right, the download buttons appear to be functional [but they aren't, it's just a visual thing], and the submission portal shows an empty list instead of a login button.) Since these tokens stick around for so long, it could probably be done as a follow-up PR.

I did not test submission permissions, but these should definitely be retested in dev.

pkalita-lbl commented 4 months ago

Thanks for the thorough testing!

Would it be a big lift to check the response from /refresh for 401 and update the UI as expected?

I can at least do an assessment of what would need to be done, and if it's straightforward I can add it here.

pkalita-lbl commented 4 months ago

@naglepuff Check out the latest commit and see what you think. The place where we know that the token refresh has failed is deep in the bowels of the api.ts module. At that point I have it emit a custom event on the window object. The App component listens to that event since it has access to the Vue router and state necessary to perform the "make the UI be logged out" actions.

I tested it out by setting

NMDC_API_JWT_EXPIRATION=60
NMDC_API_JWT_REFRESH_EXPIRATION=90

And after logging in, I waited a couple minutes before navigating to /submission/home. Once all the API request attempts settled the UI reflected the logged-out state.

I don't know if there's a more Vue-ish way to handle the communication from api.ts back up to the App component, but maybe that works for now?
