Closed yuvipanda closed 9 months ago
I'd like to hash out whether we really prefer to make the existing username_claim config callable as compared to something else, at least compared to the alternative of adding a new config to pick out the username part from a username_claim and deprecating Generic's username_claim as callable.
Below I'll make a case for adding new config instead of expanding use of username_claim as callable, but I'd like to start with providing a codebase overview.
username_claim as callable
username_claim as callable returns not a username claim, but the username itself. This is a confusion point and adds complexity for users and maintainers, but also for people inspecting config they haven't read up on yet, like a new JupyterHub admin taking over an existing deployment.
... hub.config). While it's possible to put this part in hub.extraConfig, where Python code strings can be put, it forces config related to auth to be separated, which makes it harder to understand and maintain.
username_claim as a callable
username_regex as a string for OAuthenticator (and re-implemented in CILogon under allowed_idps.username_derivation), where, if it's set, it will extract the username-relevant part specifically.
Like the change proposal above, but where username_regex is named username_picker or similar, and can be either a regex string or a callable that is passed the user info object.
With this approach, we could deprecate Generic's username_claim as callable with a traitlet validator that sets username_picker instead whenever username_claim is set to a callable, as long as we stick with the function signature of passing just user_info.
@yuvipanda and other reviewers, could you try to rank the strategies below based on what you think will be best?
0. username_claim as string/callable, not only by Generic
1. username_regex as new string config
2. username_picker as new string/callable config
I'm in favour of minimising the number of configuration properties. How confident are you that username_regex works for all cases? Is username_claim always a string?
Thanks for thinking about this, @consideRatio.
I generally think we should not be introducing regexes wherever possible (https://blog.codinghorror.com/regular-expressions-now-you-have-two-problems/). So while I understand the positives of it allowing YAML based config, I think overall introducing regexes to something as sensitive as username validation is not something we should do. Regexes are very very easy to get wrong, and very hard to debug, and escaping mismatches can cause security issues here. So I don't think we should be using regexes here.
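To make the escaping concern concrete, here is a minimal sketch using only Python's re module. The patterns, the example.com domain, and the function names are hypothetical illustrations, not anything OAuthenticator ships:

```python
import re

# Hypothetical pattern an admin might write: the dot is unescaped and
# nothing requires the match to reach the end of the string.
LOOSE = re.compile(r"(.+)@example.com")

# Stricter variant: escaped dot, restricted local part, matched in full.
STRICT = re.compile(r"([A-Za-z0-9._-]+)@example\.com")


def loose_username(email):
    """Extract a username with the loose pattern (prefix match only)."""
    m = LOOSE.match(email)
    return m.group(1) if m else None


def strict_username(email):
    """Extract a username only if the whole string matches."""
    m = STRICT.fullmatch(email)
    return m.group(1) if m else None
```

With these definitions, an address such as mallory@exampleXcom.evil.org is accepted by the loose pattern but rejected by the strict one, which is the kind of mismatch that is easy to miss in review.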
I think the pattern of Union(<item>, Callable), where the Callable returns the same thing that could have been statically presented, is very prevalent in the JupyterHub ecosystem, and also extremely powerful in a clean way. The ability to have arbitrary Python code is one of the core strengths of traitlets, and I love us leaning into it. This Python code in the test for handling CILogon is actually pretty clean and readable, compared to what one would have to do with a regex. So the Union(<item>, Callable) is a pattern we should keep and promote.
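As a rough illustration of that pattern, the sketch below uses plain Python instead of traitlets so it stays self-contained. The helper name resolve_username is hypothetical, and the callable is assumed to receive only user_info, as discussed earlier in the thread; the real OAuthenticator code may differ:

```python
from typing import Any, Callable, Dict, Optional, Union

# Either a claim key (string) or a callable that computes the username.
UsernameClaim = Union[str, Callable[[Dict[str, Any]], Optional[str]]]


def resolve_username(username_claim: UsernameClaim,
                     user_info: Dict[str, Any]) -> Optional[str]:
    """Return the username for either form of username_claim."""
    if callable(username_claim):
        # Callable form: arbitrary Python decides what the username is.
        return username_claim(user_info)
    # String form: look the claim up directly in the user info.
    return user_info.get(username_claim)
```

Both resolve_username("email", info) and resolve_username(lambda info: info["sub"].lower(), info) go through the same code path, which is what makes the static and callable forms interchangeable for the admin.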
I do understand your concern about the property itself being called username_claim, and not actually returning a claim key to be used but the username itself. The name seems fine to me, but I understand it's subjective.
So from https://github.com/jupyterhub/oauthenticator/pull/717#issuecomment-1898254573 (which is very helpful btw), I'd say we should just not do (1). My preference is for (0), but instead of (2) let me propose a different alternative.
We have a callable called username_from_user_info that is just a callable. This exists in the base OAuthenticator (and can be ported to CILogon as well). When the admin sets this, username_claim will be ignored (but not forbidden, as the callable may use this setting). This would be a breaking change for GenericOAuthenticator (username_claim will have to be string only), so it can be consistent. Let's call this option 4.
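A minimal sketch of how option 4's precedence could look, in plain Python rather than traitlets. The class name UsernameConfig, the default claim value, and the assumption that the callable takes only user_info are all hypothetical; the thread does not specify a signature:

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional


@dataclass
class UsernameConfig:
    # String only under option 4 (the default here is an assumption).
    username_claim: str = "username"
    # Optional callable; when set, it takes precedence over username_claim.
    username_from_user_info: Optional[Callable[[Dict[str, Any]], str]] = None

    def username(self, user_info: Dict[str, Any]) -> str:
        if self.username_from_user_info is not None:
            # username_claim is ignored here, though the callable itself
            # is free to read it if it wants to.
            return self.username_from_user_info(user_info)
        return user_info[self.username_claim]
```

Setting both values is allowed in this sketch; the callable simply wins, which mirrors the "ignored but not forbidden" behaviour described above.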
My preference would be still to just move the functionality as is from GenericOAuthenticator to OAuthenticator (what this PR does), for the following reasons:
So my ordering would be your option (0) and then my proposed option (4).
Thanks for reasoning with me about the options. I think option 4 leaves open questions related to also having a function named user_info_to_username.
Okay so going with 0 feels more okay to me now that this has been deliberated on a bit more.
Review feedback for this PR assuming continued path on option 0:
Thanks @consideRatio. I've cleaned up the extra redefinition in Generic here. I started working on CILogon, but realized we'll have to figure out how to make the jsonschema based validation we do there work, since it can't handle callables by default. I'm also not sure how to get the unit tests to pass properly. So I've split that out into a separate PR https://github.com/jupyterhub/oauthenticator/pull/718. Do you think this PR can proceed, and we can deal with CILogon separately?
Anything else I can do to get this merged? :)
Thank you @yuvipanda for taking time to reason about things!
I've opened #728 and updated the title to reflect that CILogon still isn't providing a callable username_claim config.
While trying to use Auth0 for authentication in one of our hubs, we discovered that the most useful username_claim (sub) produces usernames that look like oauth2|cilogon|http://cilogon.org/servera/users/43431 (when using Auth0 with CILogon). The last part of sub is generally whatever is passed on to Auth0, so it's going to be different for different users.
I had thought username_claim was a callable, but it turns out that's only true for GenericOAuthenticator. I think it's pretty useful for every authenticator, so I've just moved that functionality out to the base class instead. I also added a test to verify it works. The test is in GenericOAuthenticator because it was the easiest place to put it, but it works across authenticators. This also means it is fully backwards compatible.
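For the Auth0 case described above, a callable username_claim could look roughly like the sketch below. The function name is hypothetical, the callable is assumed to receive only user_info, and keeping the part after the last "|" is just one possible choice; which portion of sub makes a good username is a deployment decision:

```python
def username_from_auth0_sub(user_info):
    """Derive a username from an Auth0 'sub' claim.

    Assumes 'sub' looks like 'oauth2|cilogon|<upstream id>' and keeps
    only the part after the last '|'.
    """
    sub = user_info["sub"]
    return sub.rsplit("|", 1)[-1]


# In jupyterhub_config.py this might then be wired up as (assumption):
# c.GenericOAuthenticator.username_claim = username_from_auth0_sub
```

A sub without any "|" is returned unchanged, so the function degrades gracefully for providers that pass a plain identifier.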