Closed pkalita-lbl closed 3 months ago
Very cool @pkalita-lbl ! Due to the scope of this change, I think a mini design review meeting which includes you @naglepuff and @marySalvi would be in order. You could walk through the code changes at a high level then say @naglepuff could afterward try it out locally and approve or suggest changes.
That's a great idea, and I'm happy to do that. I'll work with folks to find a time to do that soon.
Consider including ORCID info (either in sub or as an additional field) in the refresh/access token as a more transparent/universal user ID.
For testers, PK suggested: Ensure an existing ?q=... parameter would "survive" the login flow.
"The access token only lives in RAM; the refresh token is the thing that gets stored in browser storage."
cool. the runtime decodes a provided jwt access token via a stored secret (https://github.com/microbiomedata/nmdc-runtime/blob/v1.6.0/nmdc_runtime/api/models/user.py#L59), so if nmdc-server encodes an access token the same way (https://github.com/microbiomedata/nmdc-runtime/blob/v1.6.0/nmdc_runtime/api/core/auth.py#L89), then if the runtime has access to that secret, it could accept access tokens generated by nmdc-server as well.
Yes, this brings nmdc-server and nmdc-runtime into closer alignment. I think that, maybe not right now but in the near future, we could coordinate on a common JWT structure and shared secret key to potentially make the access tokens interoperable between the two systems.
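To make the shared-secret idea concrete, here is a minimal sketch of two services accepting each other's tokens. It assumes an HMAC-signed (HS256) JWT and uses only the standard library; the algorithm, the claim names, and the function names `encode_hs256` / `decode_hs256` are illustrative assumptions, not what nmdc-server or nmdc-runtime actually use.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url(data: bytes) -> str:
    # JWTs use unpadded URL-safe base64.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    # Restore the padding that _b64url stripped.
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def encode_hs256(claims: dict, secret: str) -> str:
    """Hypothetical issuer side: sign the claims with the shared secret."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    signature = hmac.new(
        secret.encode("utf-8"), signing_input.encode("ascii"), hashlib.sha256
    ).digest()
    return f"{signing_input}.{_b64url(signature)}"


def decode_hs256(token: str, secret: str, now: Optional[float] = None) -> dict:
    """Hypothetical consumer side: verify signature and expiry, return claims."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("malformed token")
    header_b64, payload_b64, signature_b64 = parts
    header = json.loads(_b64url_decode(header_b64))
    if header.get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    expected = hmac.new(
        secret.encode("utf-8"),
        f"{header_b64}.{payload_b64}".encode("ascii"),
        hashlib.sha256,
    ).digest()
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("bad signature")
    claims = json.loads(_b64url_decode(payload_b64))
    now = time.time() if now is None else now
    if "exp" in claims and now >= claims["exp"]:
        raise ValueError("token expired")
    return claims
```

Any service holding the same secret can call `decode_hs256` on a token produced by `encode_hs256`; a service with a different secret rejects it. Putting the ORCID iD in `sub`, as suggested earlier in the thread, would give both systems the same user identifier.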
Also now that the 2024.5 Release is on production I'm taking this out of draft.
Thanks for taking a look. I think I'd addressed your initial comments, but please feel free to unresolve any of those conversations if the answers/changes don't make sense.
In addition to a couple of new comments, I did some functional testing. I covered:
@pkalita-lbl Everything looked good except when the refresh token expires (i.e. the /refresh endpoint returns 401), it doesn't actually look like I'm logged out until I refresh the page. Would it be a big lift to check the response from /refresh for 401 and update the UI as expected? (Currently the username is still displayed at the top right, the download buttons appear to be functional [but they aren't, it's just a visual thing], and the submission portal shows an empty list instead of a login button.) Since these tokens stick around for so long, it could probably be done as a follow-up PR.
I did not test submission permissions, but these should definitely be retested in dev.
Thanks for the thorough testing!
Would it be a big lift to check the response from /refresh for 401 and update the UI as expected?
I can at least do an assessment of what would need to be done, and if it's straightforward I can add it here.
@naglepuff Check out the latest commit and see what you think. The place where we know that the token refresh has failed is deep in the bowels of the api.ts module. At that point I have it emit a custom event on the window object. The App component listens to that event since it has access to the Vue router and state necessary to perform the "make the UI be logged out" actions.
I tested it out by setting:
NMDC_API_JWT_EXPIRATION=60
NMDC_API_JWT_REFRESH_EXPIRATION=90
After logging in, I waited a couple of minutes before navigating to /submission/home. Once all the API request attempts settled, the UI reflected the logged-out state.
I don't know if there's a more Vue-ish way to handle the communication from api.ts back up to the App component, but maybe that works for now?
I'm marking this as a Draft PR for now because I don't want this going out with the 2024.5 Release, but it is fully ready for review and testing.

Well well well, here it is at long last. This is the complete replacement of session cookie-based authentication with Bearer token authentication.
Why?
Session cookie authentication works well for a backend working with a single frontend where both are served from the same origin. However, as a project, we have outgrown that constraint since we now have:
- [...] nmdc-server API endpoints that require authorization (e.g. the process that fetches submission data, transforms it, and inserts it into MongoDB)

Backend Overview
In broad strokes these changes implement an "OAuth2-like" flow to provide a user with access and refresh tokens. I won't claim that it fully follows the OAuth2 specification (for example, we don't have a concept of assigning client IDs or client secrets), but it borrows ideas from it. Within our OAuth2-like process we hand off the actual user authentication to ORCID.
From start to finish the process is this:
1. The client navigates to {nmdc-server}/auth/login and must provide a redirect_uri query parameter.
2. The server checks that the redirect_uri matches an allowed list of origins. This can be configured by the environment. For example, the dev nmdc-server backend might be configured to only allow redirects to https://data-dev.microbiomedata.org and https://fieldnotes.microbiomedata.org.
3. After validating the redirect_uri, we store it in a session cookie. Yes, session cookies are still part of the process! But they are only an "internal" detail of this process and not used for actual authentication after the process completes.
4. The last part of handling the {nmdc-server}/auth/login request is to redirect the client to the ORCID sign-in page. This requires providing ORCID a redirect URI that they can use after the user enters their credentials. We do not use the redirect_uri provided in step 1 for this. The redirect URI provided to ORCID is always {nmdc-server}/auth/orcid-token.
5. After the user signs in, ORCID redirects the client to {nmdc-server}/auth/orcid-token with an authorization code.
6. {nmdc-server}/auth/orcid-token collects the ORCID authorization code from the query parameters and sends a POST request to {orcid}/oauth/token with a body that includes (among other required things) the ORCID authorization code.
7. ORCID responds to the POST request with a JSON response that includes an ORCID access_token and some basic user information.
8. {nmdc-server}/auth/orcid-token uses that response to get or create a User object in the nmdc-server Postgres database based on the user's ORCID iD and name.
9. [...]
10. [...] redirect_uri stashed away in step 3.
11. {nmdc-server}/auth/orcid-token completes by responding with a redirect to the redirect_uri originally provided in step 1, passing the nmdc-server authorization code generated in step 10.
12. The client sends a POST request to {nmdc-server}/auth/token with the nmdc-server authorization code and original redirect_uri.
13. {nmdc-server}/auth/token looks up the provided nmdc-server authorization code in Postgres.
14. It verifies that the code is being exchanged with the redirect_uri that originally generated it and that it hasn't expired (the authorization code is valid for 5 minutes).
15. {nmdc-server}/auth/token completes by sending the tokens in a JSON response.

Steps 4 through 7 constitute what ORCID refers to as "3 legged OAuth", but it is also commonly referred to generically as the OAuth Authorization Code Flow.
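The redirect_uri allow-list check and the short-lived nmdc-server authorization code described above could be sketched roughly as follows. This is not the PR's code: `is_allowed_redirect`, `AuthCodeStore`, and the in-memory dictionary standing in for the Postgres table are hypothetical, and making the code single-use is an added assumption. Only the 5-minute lifetime and the redirect_uri binding come from the description.

```python
import secrets
import time
from dataclasses import dataclass
from typing import Dict, Iterable, Optional
from urllib.parse import urlsplit

AUTH_CODE_TTL_SECONDS = 5 * 60  # the description says codes are valid for 5 minutes


def origin_of(uri: str) -> str:
    """Reduce a URI to scheme://host[:port] for comparison."""
    parts = urlsplit(uri)
    return f"{parts.scheme}://{parts.netloc}".lower()


def is_allowed_redirect(redirect_uri: str, allowed_origins: Iterable[str]) -> bool:
    # Compare whole origins, never prefixes, so look-alike hosts are rejected.
    return origin_of(redirect_uri) in {origin_of(o) for o in allowed_origins}


@dataclass
class AuthCodeRecord:
    user_id: str
    redirect_uri: str
    expires_at: float


class AuthCodeStore:
    """In-memory stand-in (assumption) for the Postgres-backed code table."""

    def __init__(self) -> None:
        self._records: Dict[str, AuthCodeRecord] = {}

    def issue(self, user_id: str, redirect_uri: str, now: Optional[float] = None) -> str:
        now = time.time() if now is None else now
        code = secrets.token_urlsafe(32)
        self._records[code] = AuthCodeRecord(
            user_id, redirect_uri, now + AUTH_CODE_TTL_SECONDS
        )
        return code

    def redeem(self, code: str, redirect_uri: str, now: Optional[float] = None) -> Optional[str]:
        """Return the user id if the code is known, unexpired, and bound to redirect_uri."""
        now = time.time() if now is None else now
        record = self._records.pop(code, None)  # assumption: codes are single-use
        if record is None:
            return None
        if record.redirect_uri != redirect_uri or now >= record.expires_at:
            return None
        return record.user_id
```

With the two dev origins from the example above in the allow-list, a redirect to `https://data-dev.microbiomedata.org/...` passes while `https://data-dev.microbiomedata.org.evil.example/` does not, because the comparison is on the full origin.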
There is a second, alternative path to obtaining nmdc-server access and refresh tokens for clients that just happen to have an OpenID Connect (OIDC) ID Token issued by ORCID. A client can send a POST request with the ID Token to {nmdc-server}/auth/oidc-login. The request handler will validate that the ID Token was issued by ORCID to a known audience (we currently accept only our own ORCID client ID, but we could expand that in the future) and that it has not expired. If validation succeeds, the ID Token's claims are used to get or create a User object in the nmdc-server Postgres database. Then access and refresh tokens are generated for the user and returned in a JSON response.

The nmdc-server frontend will use the first mechanism (/auth/login followed by /auth/token) because it allows the Authorization Code Flow with ORCID to use our ORCID Member client ID/secret and to request the /read-limited scope. This could enable us, in the future, to do something after step 7 with the ORCID access_token that has elevated privileges.

The alternative OIDC method, on the other hand, could be useful for external clients like NMDC EDGE which also work with ORCID as an authentication provider.
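As a rough illustration of the ID Token checks described for /auth/oidc-login (issuer, audience, expiry), here is a sketch of the claim validation only. It assumes the token's signature has already been verified against the issuer's published keys, which this snippet deliberately does not do. The function name `check_id_token_claims` is hypothetical, and the issuer and client ID values in the usage note are placeholders.

```python
import time
from typing import Any, Iterable, Mapping, Optional


def check_id_token_claims(
    claims: Mapping[str, Any],
    expected_issuer: str,
    allowed_audiences: Iterable[str],
    now: Optional[float] = None,
) -> None:
    """Raise ValueError unless iss, aud, and exp are all acceptable.

    Assumes `claims` came from a token whose signature was already verified.
    """
    now = time.time() if now is None else now

    if claims.get("iss") != expected_issuer:
        raise ValueError("unexpected issuer")

    # Per OIDC, `aud` may be a single string or a list of strings.
    aud = claims.get("aud")
    audiences = [aud] if isinstance(aud, str) else list(aud or [])
    if not set(audiences) & set(allowed_audiences):
        raise ValueError("unknown audience")

    exp = claims.get("exp")
    if not isinstance(exp, (int, float)) or now >= exp:
        raise ValueError("token expired")
```

A call such as `check_id_token_claims(claims, "https://orcid.org", {"APP-EXAMPLE"})` would then gate the get-or-create User step; accepting additional clients later would mean adding their client IDs to the allowed audiences.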
Frontend overview
Once the client has obtained access and refresh tokens it handles them by:
- keeping the access token in memory only
- storing the refresh token in browser storage
- sending the access token in the Authorization header on API requests

The access token has a relatively short expiration time. If a request fails with a 401 Unauthorized status, the client reads the refresh token from storage and sends a POST /auth/refresh request with it. If the refresh token is valid, a new access token is generated and sent as a response. The client can then retry the original request with the new access token.

Since the access token is only stored in memory, when the page is fully reloaded there will never be an access token available. Therefore a refresh token exchange is initiated each time the main App component is mounted.

In the event that multiple concurrent API requests fail with a 401 Unauthorized status, the function that handles making the refresh token exchange is memoized for up to 20 seconds so that only one actual refresh request goes to the server.
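The retry-on-401 behavior and the 20-second memoization can be sketched in a language-neutral way. The real frontend lives in api.ts, so the Python below is only a model of the idea (for example, for a script that holds a refresh token). `TokenRefresher`, `request_with_retry`, and the injected `send` / `do_refresh` callables are hypothetical names. Note that this synchronous version caches the refresh result, whereas a browser implementation would more likely share one in-flight promise.

```python
import time
from typing import Callable, Optional, Tuple


class TokenRefresher:
    """Coalesce refresh-token exchanges that happen within `ttl_seconds`."""

    def __init__(
        self,
        do_refresh: Callable[[], str],
        ttl_seconds: float = 20.0,  # the description memoizes for up to 20 seconds
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._do_refresh = do_refresh
        self._ttl = ttl_seconds
        self._clock = clock
        self._cached: Optional[str] = None
        self._cached_at = 0.0

    def refresh(self) -> str:
        now = self._clock()
        if self._cached is not None and now - self._cached_at < self._ttl:
            return self._cached  # reuse the recent exchange, no server call
        # If the refresh token itself is rejected, do_refresh is expected to
        # raise; the caller can treat that as "logged out".
        self._cached = self._do_refresh()
        self._cached_at = now
        return self._cached


def request_with_retry(
    send: Callable[[Optional[str]], Tuple[int, str]],
    refresher: TokenRefresher,
    access_token: Optional[str],
) -> Tuple[int, str]:
    """Send once; on 401, obtain a fresh access token and retry exactly once."""
    status, body = send(access_token)
    if status != 401:
        return status, body
    return send(refresher.refresh())
```

Passing `access_token=None` models the state right after a full page reload, where no access token is in memory and the first request necessarily goes through the refresh path.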
Future work
One thing that I know isn't great with these changes is the experience of someone trying to use endpoints via the Swagger interface that require authentication. Clicking the Authorize button in the Swagger interface now simply asks the user to provide a Bearer token, which wouldn't be easy for them to find. My suggestion would be to add a frontend view for logged-in users that allows them to copy their current access and/or refresh token. The access token could be pasted into the Swagger interface. The refresh token could be integrated into longer-standing automated scripts.
As currently implemented, the token exchange in step 12 does relatively minimal checks that the client exchanging the authorization code is the same client that initiated the process (step 1). It mainly relies on the short expiration time of the authorization code to mitigate leaked authorization codes. One possible enhancement to that step would be to implement something inspired by PKCE, which would be a stronger verification that the correct client is exchanging the code.
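For reference, the core of a PKCE-style check (the S256 method from RFC 7636) is small. The sketch below is one possible shape for such an enhancement, not something this PR implements; the function names are hypothetical. The idea is that the client sends the challenge when it starts the login and the verifier when it exchanges the code, and the server compares the two.

```python
import base64
import hashlib
import hmac
import secrets


def make_code_verifier() -> str:
    # Client side: 48 random bytes encode to 64 URL-safe characters,
    # inside the 43..128 character range RFC 7636 requires.
    return secrets.token_urlsafe(48)


def code_challenge_s256(verifier: str) -> str:
    # S256 challenge: unpadded base64url of SHA-256 over the ASCII verifier.
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def verify_pkce(stored_challenge: str, presented_verifier: str) -> bool:
    # Server side: recompute the challenge from the verifier presented at
    # token exchange and compare it to the one stored at login time.
    return hmac.compare_digest(
        stored_challenge, code_challenge_s256(presented_verifier)
    )
```

A leaked authorization code alone would then be useless to an attacker, since the exchange also requires a verifier that never left the original client.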
I know this is a huge change set, and I'm more than happy to answer any question or to even meet to go over anything in more detail. Just let me know!
cc: @shreddd @eecavanna