yuvipanda commented 10 months ago

While trying to use Auth0 for authentication in one of our hubs, we discovered that the most useful username_claim (sub) produces usernames that look like oauth2|cilogon|http://cilogon.org/servera/users/43431 (when using auth0 with CILogon). The last part of sub is generally whatever is passed on to auth0, so it's going to be different for different users.

I had thought username_claim was a callable, but turns out that's only true for GenericOAuthenticator. I think it's pretty useful for every authenticator, so I've just moved that functionality out to the base class instead. I also added a test to verify it works. The test is in GenericOAuthenticator because it was the easiest place to put it, but it works across authenticators. This also means it is fully backwards compatible.

consideRatio commented 10 months ago

I'd like to hash out if we really prefer to make existing config username_claim callable as compared to something else, at least compared to the alternative of adding a new config to pick out the username part from a username_claim and deprecating Generic's username_claim as callable.

Below I'll make a case for adding new config instead of expanding use of username_claim as callable, but I'd like to start with providing a codebase overview.

Codebase overview

__The `username_claim` config__ 1. `OAuthenticator` defines `username_claim` where its name still directly aligns with its intent - to be the `claim` from where we derive the username. 2. `Generic` and only generic currently overrides `username_claim` to be callable - and when its callable its not meant to return a username claim anymore but the username itself. 3. `CILogon` overrides `username_claim` to be unusable, pointing users towards defining `username_claim` under `allowed_idps[].username_derivation` instead. __The `OAuthenticator.user_info_to_username(self, user_info)` function__ It picks the username based on config such as `username_claim`. `Generic` overrides this in order to respect its callable variable of `username_claim`. Both `user_info_to_username` and Generic's callable `userinfo_claim` gets passed the `user_info`, but only `user_info_to_username` is passed `self`. __The `Authenticator.normalize_username` function__ Defined in the `Authenticator` base class without being overridden, its called by `OAuthenticator.authenticate` to process the return value of `user_info_to_username`.

Why I hesitate on making `username_claim` as callable

Unexpected return value Generic's username_claim as callable returns something different than a username claim, but the username itself. This is a confusion point and adds complexity for users and maintainers but also for people inspecting config they haven't read up on yet, like a new JupyterHub admin taking over an existing deployment.
Callable config and YAML representation Callable config can only be parsed from Python code config, while some deployments provide most authentication config via YAML passthrough config (z2jh's hub.config). While its possible to put this part in hub.extraConfig where Python code strings can be put, it forces config related to auth to be separated which makes the it harder to understand and maintain.

Alternative change proposal 1 - regex dedicated config

We deprecate Generic username_claim as a callable
We add username_regex as a string for OAuthenticator (and re-implements it in CILogon under allowed_idps.username_derivation), where if its set it will extract the username relevant part specifically.

Alternative change proposal 2 - a regex/callable variation

Like the change proposal above, but where username_regex is named username_picker or similar, that can be either a regex string or callable being passed the user info object.

With this approach, we could deprecate Generic's username_claim as callable with a traitlet validator that sets username_picker instead whenever username_claim as long as we stick with the function signature of passing just user_info.

consideRatio commented 10 months ago

@yuvipanda and other reviewers, could you try to rank how the strategies below based on what you think will be best?

username_claim as string/callable not only by Generic
username_regex as new string config
username_picker as new string/callable config

manics commented 10 months ago

I'm in favour of minimising the number of configuration properties. How confident are you that username_regex works for all cases? Is username_claim always a string?

yuvipanda commented 10 months ago

Thanks for thinking about this, @consideRatio.

I generally think we should not be introducing regexes wherever possible (https://blog.codinghorror.com/regular-expressions-now-you-have-two-problems/). So while I understand the positives of it allowing YAML based config, I think overall introducing regexes to something as sensitive as username validation is not something we should do. Regexes are very very easy to get wrong, and very hard to debug, and escaping mismatches can cause security issues here. So I don't think we should be using regexes here.

I think the pattern of Union(<item>, Callable) where the Callable will return the same thing that could have been statically presented is very prevelant in JupyterHub ecosystem, and also extremely powerful in a clean way. The ability to have arbitrary python code is one of the core strengths of traitlets, and I love us leaning into it. This python code in the test for handling CILogon is actually pretty clean and readable, compared to what one would have to do with a regex. So the Union(<item>, Callable) is a pattern we should keep and promote.

I do understand your concern a bout the property itself being called username_claim, and not actually returning a claim key to be used but the username itself. The name seems fine to me, but I understand it's subjective.

So from https://github.com/jupyterhub/oauthenticator/pull/717#issuecomment-1898254573 (which is very helpful btw), I'd say we should just not do (1). My preference is for (0), but instead of (2) let me propose a different alternative.

Proposal #4

We have a callable called username_from_user_info that is just a callable. This exists in the base OAuthenticator (and can be ported to CILogon as well). When the admin sets this, username_claim will be ignored (but not forbidden, as the callable may use this setting). This would be a breaking change for GenericOAuthenticator, (username_claim will have to be string only) so it can be consistent. Let's call this option 4.

My preference would be still to just move the functionality as is from GenericOAuthenticator to OAuthenticator (what this PR does), for the following reasons:

No backwards compatibility break anywhere. It's just new functionality. This is the strongest reason for me.
I think the naming is intuitive enough, but I understand this is subjective.
Fewer config options (as @manics said)

So my ordering would be your option (0) and then my proposed option (4).

consideRatio commented 10 months ago

Thanks for reasoning with me about options, i think option 4 leaves open questions related to having also a function named user_info_to_username.

Okay so going with 0 feels more okay to me now that this has been deliberated on a bit more.

Review feedback for this PR assuming continued path on option 0:

i'd like to see this implemented for cilogonoauthenticator as well, which makes use of username_claim under idp specific config (allowed_idps[idp].username_derivation.username_claim) (closes #712)
cleanup the genericoauthenticator re-definition of username_claim trait that is no longer needed now

yuvipanda commented 10 months ago

Thanks @consideRatio. I've cleaned up the extra redefention in Generic here. I started working on CILogon, but realized we'll have to figure out how to make the jsonschema based validation we do there work, since it can't handle callables by default. I'm also not sure how to get the unit tests to pass properly. So I've split that out into as a separate PR https://github.com/jupyterhub/oauthenticator/pull/718. Do you think this PR can proceed, and we can deal with CILogon separately?

yuvipanda commented 10 months ago

Anything else I can do to get this merged? :)

consideRatio commented 9 months ago

Thank you @yuvipanda for taking time to reason about things!

I've opened #728 and updated the title to reflect that CILogon isn't providing a username_claim config that is callable still.

jupyterhub / oauthenticator

[All] Make `username_claim` callable (except for CILogon), like it has been in Generic #717

Codebase overview

Why I hesitate on making `username_claim` as callable

Alternative change proposal 1 - regex dedicated config

Alternative change proposal 2 - a regex/callable variation

Proposal #4

jupyterhub / oauthenticator

[All] Make `username_claim` callable (except for CILogon), like it has been in Generic #717

Codebase overview

Why I hesitate on making username_claim as callable

Alternative change proposal 1 - regex dedicated config

Alternative change proposal 2 - a regex/callable variation

Proposal #4

Why I hesitate on making `username_claim` as callable