jupyterhub / oauthenticator

OAuth + JupyterHub Authenticator = OAuthenticator
https://oauthenticator.readthedocs.io
BSD 3-Clause "New" or "Revised" License
408 stars 362 forks source link

[CILogon] Can not use `orcid` as `username_claim` directly, an option is needed #712

Open yuvipanda opened 9 months ago

yuvipanda commented 9 months ago

I am trying to use ORCID with CILogonOAuthenticator, with the following config

          allowed_idps:
            http://orcid.org/oauth/authorize:
              username_derivation:
                username_claim: "oidc"
              allow_all: true

But unfortunately this produces usernames like https://orcid.org/<id>, which aren't valid (because of the / and :). So all logins fail. I tried various other username_claims but none of them actually just produce the user id.

I am currently using this additional claim:

          def setup_orcid_username(authenticator, handler, authentication):
            """
            Fish ORCID username from inside cilogon_user when used with ORCID

            There is no clear way to get just the ORCID id from CILogon, so we
            have to
            """
            print(authentication, flush=True)
            idp = authentication['auth_state']['cilogon_user']['idp']
            if idp == 'http://orcid.org/oauth/authorize':
              authentication['name'] = authentication['auth_state']['cilogon_user']['oidc'].split('/')[-1]
            return authentication

          c.Authenticator.post_auth_hook = get_orcid_username

And then using given_name as the username_claim. But it's never used, as we replace it with the split from oidc.

We should find some other way to extract such custom parts out of claims for username_claim. The easiest thing probably is to allow username_claim to be also a callable.

minrk commented 8 months ago

username_claim is callable in generic. Makes sense to match.

consideRatio commented 8 months ago

Hmmm, but username_claim should be the name of a claim if overridden right - to let it be something beyond this, such as the value of s claim breaks assumptions in the logic of other parts i think.

What do you think about these ideas?

minrk commented 8 months ago

normalize_username as a hook via configuration instead of subclass makes sense, though so does a more general username_hook that takes the full thing instead of just the username.

consideRatio commented 8 months ago

With "that takes the full thing", do you mean that it receives auth state etc and not just the username? I think that is a good thing for a hook.

I'm leaning towards username_hook atm, and making it be called after (?) username normalization i think.

If its called after, it could be passed both non-normalized and normalized username, but otherwise just the normalized username I'd say. If its called before, maybe it should be responsible for calling normalize_username itself?

It would be great to get detail on these aspects before a PR is opened, then it could be a very quick and straight forward PR.

minrk commented 8 months ago

I guess the question for CILogon is how much variety is there in what you might get vs. want to produce. You all have more experience in getting things from CILogon than I do.

For Generic, it makes sense that we want configuration to be able to produce arbitrary things, taking everything from the provider into account (post_auth_hook already is this, so maybe we don't need anything there). Does it make sense for CILogon to be as general as that? Or a simpler, more precise username_from_user_info(dict) -> str hook?

I think if normalize_username could have been a callable, maybe this wouldn't have needed an Issue?

consideRatio commented 8 months ago

I guess the question for CILogon is how much variety is there in what you might get vs. want to produce. You all have more experience in getting things from CILogon than I do.

I think CILogon can be seen as generic as Generic with regards to this - so functionality relevant for one is probably relevant for the other.

There is this function used by all OAuthenticators, user_info_to_username, and then there is normalize_username for all Authenticators

https://github.com/jupyterhub/oauthenticator/blob/da8bd36e1808f7ca7d5ee4151329056fc91a533c/oauthenticator/oauth2.py#L972-L974

I wonder if making them callable and configurable makes it easy to re-use the default implementation from the configured callable, and separately if we should do something in the OAuthenticstor class and/or the Authenticator class if we make normalize_username callable.

yuvipanda commented 8 months ago

Looking at the current description in generic, where username_claim is callable:

Can be a string key name or a callable that accepts the returned userdata json (as a dict) and returns the username. The callable is useful e.g. for extracting the username from a nested object in the response.

So it doesn't return the name of the key to be used, but the username itself.

I think CILogon can be seen as generic as Generic with regards to this - so functionality relevant for one is probably relevant for the other.

This is definitely true, and I'd like both of these to be consistent. So if we decide to do something different here in CILogon, Generic should probably be changed to match that.

Hence my recommendation is that we simply match the behavior of username_claim in GenericOAuthenticator in CILogon. I'm not opposed to deeper refactoring in Authenticator or elsewhere, but I don't a very easy or clear path there currently. And I'm pretty happy with the 'string or callable' convention we have with username_claim.