jupyterhub / oauthenticator

OAuth + JupyterHub Authenticator = OAuthenticator
https://oauthenticator.readthedocs.io
BSD 3-Clause "New" or "Revised" License
408 stars 362 forks source link

[All] Make `username_claim` callable (except for CILogon), like it has been in Generic #717

Closed yuvipanda closed 7 months ago

yuvipanda commented 7 months ago

While trying to use Auth0 for authentication in one of our hubs, we discovered that the most useful username_claim (sub) produces usernames that look like oauth2|cilogon|http://cilogon.org/servera/users/43431 (when using auth0 with CILogon). The last part of sub is generally whatever is passed on to auth0, so it's going to be different for different users.

I had thought username_claim was a callable, but turns out that's only true for GenericOAuthenticator. I think it's pretty useful for every authenticator, so I've just moved that functionality out to the base class instead. I also added a test to verify it works. The test is in GenericOAuthenticator because it was the easiest place to put it, but it works across authenticators. This also means it is fully backwards compatible.

consideRatio commented 7 months ago

I'd like to hash out if we really prefer to make existing config username_claim callable as compared to something else, at least compared to the alternative of adding a new config to pick out the username part from a username_claim and deprecating Generic's username_claim as callable.

Below I'll make a case for adding new config instead of expanding use of username_claim as callable, but I'd like to start with providing a codebase overview.

Codebase overview

__The `username_claim` config__ 1. `OAuthenticator` defines `username_claim` where its name still directly aligns with its intent - to be the `claim` from where we derive the username. 2. `Generic` and only generic currently overrides `username_claim` to be callable - and when its callable its not meant to return a username claim anymore but the username itself. 3. `CILogon` overrides `username_claim` to be unusable, pointing users towards defining `username_claim` under `allowed_idps[].username_derivation` instead. __The `OAuthenticator.user_info_to_username(self, user_info)` function__ It picks the username based on config such as `username_claim`. `Generic` overrides this in order to respect its callable variable of `username_claim`. Both `user_info_to_username` and Generic's callable `userinfo_claim` gets passed the `user_info`, but only `user_info_to_username` is passed `self`. __The `Authenticator.normalize_username` function__ Defined in the `Authenticator` base class without being overridden, its called by `OAuthenticator.authenticate` to process the return value of `user_info_to_username`.

Why I hesitate on making username_claim as callable

Alternative change proposal 1 - regex dedicated config

Alternative change proposal 2 - a regex/callable variation

Like the change proposal above, but where username_regex is named username_picker or similar, that can be either a regex string or callable being passed the user info object.

With this approach, we could deprecate Generic's username_claim as callable with a traitlet validator that sets username_picker instead whenever username_claim as long as we stick with the function signature of passing just user_info.

consideRatio commented 7 months ago

@yuvipanda and other reviewers, could you try to rank how the strategies below based on what you think will be best?

  1. username_claim as string/callable not only by Generic
  2. username_regex as new string config
  3. username_picker as new string/callable config
manics commented 7 months ago

I'm in favour of minimising the number of configuration properties. How confident are you that username_regex works for all cases? Is username_claim always a string?

yuvipanda commented 7 months ago

Thanks for thinking about this, @consideRatio.

I generally think we should not be introducing regexes wherever possible (https://blog.codinghorror.com/regular-expressions-now-you-have-two-problems/). So while I understand the positives of it allowing YAML based config, I think overall introducing regexes to something as sensitive as username validation is not something we should do. Regexes are very very easy to get wrong, and very hard to debug, and escaping mismatches can cause security issues here. So I don't think we should be using regexes here.

I think the pattern of Union(<item>, Callable) where the Callable will return the same thing that could have been statically presented is very prevelant in JupyterHub ecosystem, and also extremely powerful in a clean way. The ability to have arbitrary python code is one of the core strengths of traitlets, and I love us leaning into it. This python code in the test for handling CILogon is actually pretty clean and readable, compared to what one would have to do with a regex. So the Union(<item>, Callable) is a pattern we should keep and promote.

I do understand your concern a bout the property itself being called username_claim, and not actually returning a claim key to be used but the username itself. The name seems fine to me, but I understand it's subjective.

So from https://github.com/jupyterhub/oauthenticator/pull/717#issuecomment-1898254573 (which is very helpful btw), I'd say we should just not do (1). My preference is for (0), but instead of (2) let me propose a different alternative.

Proposal #4

We have a callable called username_from_user_info that is just a callable. This exists in the base OAuthenticator (and can be ported to CILogon as well). When the admin sets this, username_claim will be ignored (but not forbidden, as the callable may use this setting). This would be a breaking change for GenericOAuthenticator, (username_claim will have to be string only) so it can be consistent. Let's call this option 4.

My preference would be still to just move the functionality as is from GenericOAuthenticator to OAuthenticator (what this PR does), for the following reasons:

  1. No backwards compatibility break anywhere. It's just new functionality. This is the strongest reason for me.
  2. I think the naming is intuitive enough, but I understand this is subjective.
  3. Fewer config options (as @manics said)

So my ordering would be your option (0) and then my proposed option (4).

consideRatio commented 7 months ago

Thanks for reasoning with me about options, i think option 4 leaves open questions related to having also a function named user_info_to_username.

Okay so going with 0 feels more okay to me now that this has been deliberated on a bit more.

Review feedback for this PR assuming continued path on option 0:

yuvipanda commented 7 months ago

Thanks @consideRatio. I've cleaned up the extra redefention in Generic here. I started working on CILogon, but realized we'll have to figure out how to make the jsonschema based validation we do there work, since it can't handle callables by default. I'm also not sure how to get the unit tests to pass properly. So I've split that out into as a separate PR https://github.com/jupyterhub/oauthenticator/pull/718. Do you think this PR can proceed, and we can deal with CILogon separately?

yuvipanda commented 7 months ago

Anything else I can do to get this merged? :)

consideRatio commented 7 months ago

Thank you @yuvipanda for taking time to reason about things!

I've opened #728 and updated the title to reflect that CILogon isn't providing a username_claim config that is callable still.