microbiomedata / nmdc-runtime

Runtime system for NMDC data management and orchestration
https://microbiomedata.github.io/nmdc-runtime/
Other
5 stars 3 forks source link

use ORCiD for authn #333

Closed dwinston closed 10 months ago

dwinston commented 11 months ago

This ticket consists of two tasks:

  1. Create a new API endpoint that calls ORCID's auth url and returns a Token

Options:

  • the most secure is a two-step process, using an authorization code to request a token
  • alternatively, we could use the implicit flow
  1. Add a new authentication flow to the FastAPI OAuth2 model that uses the ORCID access token

DOCS: fastapi.security.OAuthFlows.authorizationCode

dwinston commented 11 months ago

depends on #310

once authn+authz is simplified, ensure that use of ORCiD is supported (and preferred) for authentication.

PeopleMakeCulture commented 11 months ago

see:

PeopleMakeCulture commented 10 months ago

I made a sequence diagram to understand the current state

Image

https://www.mermaidchart.com/app/projects/5746f561-5ac5-409a-a4ea-ba6b13b6bba2/diagrams/bef04a9e-68fa-4b08-a111-ee69cabea9b0/version/v0.1/edit

PeopleMakeCulture commented 10 months ago

Here's a new sequence diagram with the ORCID flow added in.

Image

https://www.mermaidchart.com/app/projects/5746f561-5ac5-409a-a4ea-ba6b13b6bba2/diagrams/8359fed3-58be-42b3-a4e5-b3c72747bba0/version/v0.1/edit

PeopleMakeCulture commented 10 months ago

I'm blocked on the following:

PeopleMakeCulture commented 10 months ago

I was able to get ORCID's example auth code flow working using the following steps:

  1. Update the example code from https://github.com/ORCID/orcid-openid-examples/tree/master/js-implicit-popup
var return_page = "http%3A%2F%2F127.0.0.1%3A8081%2Forcid_popup.html"
url = "https://orcid.org/oauth/authorize?client_id=APP-NTSSHLDLXTCDNF4M&response_type=code&scope=/authenticate&redirect_uri=" + return_page

var authWindow;

//This opens a popup that hits the ORCID OAuth endpoint
function getORCID(){
  if ($("#orcid").val()){
    return;
  }
  authWindow = window.open(url, 'authWindow');
}
  1. Run a local server. Here's the HTML for the start_page
<div class="container">
      <h1 class="display-3">My website!</h1>
      <p>Please authenticate your ORCID ID so we can automatically register your name and associate your professional activities with your ORCID record</p>
      <p><a class="btn btn-primary btn-lg" href="#" role="button" onclick="getORCID()" id="auth_button">Authenticate your ORCID</a></p>
      <form>
          <label for="orcid">ORCID iD</label><input id="orcid" readonly class="form-control"></input>
          <label for="given_name">Given Name</label><input id="given_name" class="form-control"></input>
          <label for="family_name">Family Name</label><input id="family_name" class="form-control"></input>
      </form>
  1. Click "Authenticate Your ORCID". This will take you to this popup URL: https://orcid.org/oauth/authorize?client_id=APP-NTSSHLDLXTCDNF4M&response_type=code&scope=%2Fauthenticate&redirect_uri=http:%2F%2F127.0.0.1:8081%2Forcid_popup.html

Image

  1. Upon clicking "Authorize Access" the redirect URL will have a one-time-use code (eg http://127.0.0.1:8081/orcid_popup.html?code=X4fb9a)

  2. Use that code to run the following curl command:

curl -i -L -H "Accept: application/json" --data "client_id=APP-NTSSHLDLXTCDNF4M&client_secret=bd626861-e85c-4f1f-b59a-80f4cd22d22d&grant_type=authorization_code&code=X4fb9a&redirect_uri=http://127.0.0.1:8081/orcid_popup.html" "https://orcid.org/oauth/token"
  1. curl returns an access_token and orcid

Image

dwinston commented 10 months ago

I'm blocked on the following:

Not sure how the ORCID flow should interact w/ the User collection--assuming we don't want anyone with an ORCID to have access to all API endpoints. How should we handle creating Users in MDB associated with ORCIDs?

For now, make them disabled: https://github.com/microbiomedata/nmdc-runtime/blob/4561d854e31291fab40cd7377c52eb62986e0683/nmdc_runtime/api/models/user.py#L24

and set username as the orcid ID

Not sure get_current_user() can support multiple OAuth schemes. We may need to create a new function get_current_user_orcid(). If so, do we want get_current_user_orcid() to return a User from MDB?

idea for now is to add a third case, for supplying an orcid jwt, to the existing two cases (password and client_credentials) in login_for_access_token and OAuth2PasswordOrClientCredentialsRequestForm. So a new grant type, renaming the latter "OAuth2PasswordOrClientCredentialsOrOrcidJWTRequestForm" (I know, a monstrosity, but just to get this to a working state).

dwinston commented 10 months ago

actually, marking a user as disabled currently won't let them even see their own profile, as get_current_active_user is a guard for all authentication-required routes.

what I suggest we do here is replace all usages of get_current_active_user with a new get_permitted_user function that includes the logic of e.g.

if not permitted(user.username, "/metadata/changesheets:submit"):
    raise HTTPException(
        status_code=status.HTTP_403_FORBIDDEN,
        detail=(
            f"Only users {users_allowed('/metadata/changesheets:submit')} "
            "are allowed to apply changesheets at this time."
        ),
    )

and then add logic to ensure_default_api_perms to ensure users without orcids as their usernames are allowed to do the things they can do now just by being an authenticated user, and don't extend this capability to orcid-usernamed users. That should get things functional, and we can refactor from there.

dwinston commented 10 months ago

FYI I predict we will end up removing the client_credentials grant-type consideration for oauth flow when we "simply auth" as a separate issue.

PeopleMakeCulture commented 10 months ago

idea for now is to add a third case, for supplying an orcid jwt, to the existing two cases (password and client_credentials) in login_for_access_token and OAuth2PasswordOrClientCredentialsRequestForm. So a new grant type, renaming the latter "OAuth2PasswordOrClientCredentialsOrOrcidJWTRequestForm" (I know, a monstrosity, but just to get this to a working state).

I believe FastAPI's native OAuth2AuthorizationCodeBearer class is designed to handle both steps in the auth code flow (getting the code from the authorizationURL AND getting the token from the tokenUrl).

However OAuth2 does allow for an Authorization Code Grant where the user would supply their auth code NOT their JWT, to the API (in our case, enter their auth code into the Request Form).

To add this new grant type we would:

  1. Create a new API endpoint that returns an authorization_code from ORCID (see above)

  2. Update login_for_access_token in users.py

    elif form_data.grant_type == "authorization_code":

        # get token first, then authenticate user
        access_token , orcid = get_access_token_orcid(auth_code: auth_code)

        # ... skipping logic to add sub and expires_delta
        access_token = update_orcid_access_token(
            data={"sub": f"user:{user.username}"}, expires_delta=access_token_expires
        )

        user = authenticate_user_orcid(mdb, orcid)
        if not user:
            raise HTTPException(
                status_code=status.HTTP_401_UNAUTHORIZED,
                detail="Incorrect username or password",
                headers={"WWW-Authenticate": "Bearer"},
            )
  1. Add to auth.js
def authenticate_user_orcid(mdb, orcid: str):
    username = orcid
    user = get_user(mdb, username)
    if not user:
        return False
    return user

async def get_access_token_orcid(auth_code: auth_code): 
    # equivalent of curl -i -L -H "Accept: application/json" --data "client_id=APP-NTSSHLDLXTCDNF4M&client_secret=bd626861-e85c-4f1f-b59a-80f4cd22d22d&grant_type=authorization_code&code=X4fb9a&redirect_uri=http://127.0.0.1:8081/orcid_popup.html" "https://orcid.org/oauth/token"
    return (access_token, orcid)
  1. Add an authorization_code to OAuth2PasswordOrClientCredentialsOrOrcidAuthCodeRequestForm
    def __init__(
        self,
        basic_creds: Optional[HTTPBasicCredentials] = Depends(basic_credentials),
        grant_type: str = Form(None, regex="^password$|^client_credentials$|^authorization_code$"),
        authorization_code: Optional[str] = Form(None),

# ...

        if grant_type == "authorization_code" and authorization_code is None:
            raise HTTPException(
                status_code=HTTP_400_BAD_REQUEST,
                detail="grant_type password requires username and password",
            )

# ...

        self.authorization_code = authorization_code
  1. Maybe add an authCode flow to OAuth2PasswordOrClientCredentialsBearer class? This flow would still use our /token endpoint to call login_for_access_token NOT ORCID's "/token" endpoint. NOTE: I'm not sure how this works. Pls see blocker below.
  2.     flows = OAuthFlowsModel(
            password={"tokenUrl": tokenUrl, "scopes": scopes},
            clientCredentials={"tokenUrl": tokenUrl},
    
            authCode={"tokenUrl": tokenUrl}
        )

Blockers

I'm not sure if updating login_for_access_token would allow us to use a single oauth2_scheme for users who sign in with either ORCID or username/password

# auth.py
oauth2_scheme = OAuth2PasswordOrClientCredentialsBearer(tokenUrl="token")

# user.py
async def get_current_user(
    token: str = Depends(oauth2_scheme)
# ...
dwinston commented 10 months ago

reference discussion with fastapi maintainer's input: https://github.com/tiangolo/fastapi/discussions/9137#discussioncomment-5157382

dwinston commented 10 months ago

some hacking around: https://github.com/microbiomedata/nmdc-runtime/compare/333-playground?expand=1

dwinston commented 10 months ago

I was just thinking we could overload the current client_credentials flow to recognize an orcid jwt supplied as client_id and empty client_secret. So no new auth flow needed.

shreddd commented 10 months ago

Just catching up here but I'd like to offer some thoughts

  1. We should allow access via both ORCID jwt and client ID/key pairs
  2. The access tokens in the ORCID Oauth flow are for accessing ORCID API which is orthogonal to this effort - I don't think we'll need those?
  3. We should have a single user table under the hood that controls all access and ACLs. I'm thinking of something like this:
Username ORCID (if present) ACLs Client Key (if present) is_admin
myusername 000-323-32323 {coll1: "readwrite", coll2: "readonly" } None False
nersc-wf None {coll2: "readwrite"} dd3f38fy34fbu False

(This is a rough sketch - we can work out the details)

shreddd commented 10 months ago

Also - we can separate out the grants / auth code stuff etc. for now, if there is a simple way to get the ORCID token out of band.

shreddd commented 10 months ago

I think we should look at using the OpenID connect tokens for this, since it allows supports additional parameters like max-age and authentication date. https://github.com/ORCID/ORCID-Source/blob/main/orcid-web/ORCID_AUTH_WITH_OPENID_CONNECT.md

ssarrafan commented 10 months ago

This was discussed at infra sync and is active. Will move to next sprint.