jupyterhub / oauthenticator

OAuth + JupyterHub Authenticator = OAuthenticator
https://oauthenticator.readthedocs.io
BSD 3-Clause "New" or "Revised" License
413 stars 365 forks source link

[All] Move group management from generic to base oauthenticator (`allowed_groups`, `admin_groups`, `auth_state_groups_key`) #735

Closed yuvipanda closed 4 months ago

yuvipanda commented 6 months ago

Motivating use cases

External identity providers providing JupyterHub memberships is an extremely useful feature that should be present not just for GenericOAuthenticator but for all authenticators. But to do so in helpful ways, this PR considers two motivating use cases:

  1. A hub using Auth0OAuthenticator. In Auth0, scopes granted are what is used to provide the notion of 'can the user perform this task?', which can be used as group membership. This is what auth0 recommends, there is currently no other way to do 'groups' in Auth0. scope is inside auth_model, but not auth_state, since scope is granted each time the user is logged in.
  2. In GitHubOAuthenticator, we put the list of teams the user is in inside auth_state. This is the perfect piece of information to use for group membership.
  3. oauth_user gets put inside auth_state, and in general auth_state is a good place for this kinda group information to be in. Authenticators can put arbitrary stuff inside auth_state and use them as they wish.

Approaches considered and rejected

  1. Each authenticator figures out how groups work within it. This was tried out in https://github.com/jupyterhub/oauthenticator/pull/739, and rejected by @minrk in https://github.com/jupyterhub/oauthenticator/pull/735#issuecomment-2041996689. I agree with this rejection.
  2. Allow group information to be picked up from the auth_model with a auth_model_groups_key. This would be same as the current claim_groups_key, but pick from the entire auth_model instead of just from the returned user object. This was the tack this PR was taking, primarily because I thought we needed it to handle use case 1 mentioned earlier. But turns out I was wrong - I had thought that scope was part of auth_model but not auth_state, but we do! And regardless, I also realized we don't expose auth_model anywhere, but we do expose auth_state. And I had a TODO for 'document what is in auth_model', and while trying to do that, decided we shouldn't expose that to configurable tweaks like this for now. So that was reverted in b337015306f51a8a9ea8e95924a7562bfa1e56ba and a different approach was taken.

Approach this PR takes

The general approach to group management is:

  1. Authenticator puts something that may be relevant to group membership inside auth_state.
  2. There's an auth_state_groups_key that can be either a callable or dotted specification that generates a list of groups from something in auth_state.

This handles case (2) because list of teams is already in auth_state. And can handle (1) by us putting scope in some form inside auth_state. This also provides a clear extensible mechanism in the future for all group management - get it into auth_state (where it can be used for anything), and pick that out with auth_state_groups_key.

Backwards compatibility

Breaking change

TODO

Fixes https://github.com/jupyterhub/oauthenticator/issues/709

yuvipanda commented 6 months ago

All this functionality would be extremely useful to all the authenticators, so should be in base.

I think an important question now becomes - what is the point of GenericOAuthenticator? Is that just here for backwards compatibility?

I think the answer should be 'yes'. And in the future, anything that relies on OAuth2 functionality should just go to the base OAuth2 provider, and anything specific to any of the providers can go in their own files. And we suggest people who currently use Generic migrate to just using the base OAuth2 provider.

minrk commented 6 months ago

what is the point of GenericOAuthenticator?

I believe GenericOAuthenticator should eventually go away and be merged into the base class, as long as we can do it smoothly and it doesn't complicate things unnecessarily (I don't think it will). I think the big refactor in #526 was a big step toward making that feasible.

Not having done that yet, going forward in general I think we should avoid new features in Generic in favor of putting them in the base class, at least in most cases.

yuvipanda commented 6 months ago

I was thinking about how we can do a very common thing - sync GitHub team memberships and org memberships to JupyterHub groups. With this PR as is, that's actually not possible - team memberships are not part of userdata! But they are part of the auth_model (because there's a populate_teams_in_auth_model property already).

I think there are two paths forward here for groups to work everywhere:

  1. Lift that into the base Authenticator class, but the key is for things from auth_model, not userdata. This also allows for using things like scope (which can control group membership with Auth0, from another community we are working with). We can name this appropriately, and the keep a deprecated claims_group_key in Generic that notifies it's deprecated but works the way it already does.
  2. https://github.com/jupyterhub/oauthenticator/pull/739 or similar for each authenticator we want to support good groups.

My preference is to do (1)! There's a clean backwards compat story, and the model of 'put stuff into auth_state, that is available via auth_model, and you can pull stuff out of that for groups' seems clean enough to explain.

minrk commented 6 months ago

Option 1 sounds like a great choice, and also a great opportunity to not inherit the name while preserving compatibility.

yuvipanda commented 5 months ago

This went through a few iterations, but is ready for review again! I've updated the PR body with more detailed information as well. Please take another look when you got time, @minrk :)

yuvipanda commented 4 months ago

After trying to write clear docs for the traitlet, I came to the conclusion that we shouldn't allow usage of these without manage_groups also being True. I've updated a paragraph under Backwards compatibiltiy with my rationale.

With that, this is ready to go!

yuvipanda commented 4 months ago

I found https://github.com/jupyterhub/oauthenticator/issues/709, which I think is closed by this PR. And my change with respect to manage_groups reflects:

do we need to deal with additional groups, not specified upstream somehow? I'd say no, at least for now.

Which I totally agree with. Groups should have a 'single source of truth'.

yuvipanda commented 4 months ago

Can I get someone to hit merge? Thanks!

consideRatio commented 4 months ago

@yuvipanda thank you for working this!!!

I'm not able to invest time to review this fully atm :/ I saw some minor fixes I'd like to see updated:

yuvipanda commented 4 months ago

Thanks @consideRatio.

I'm not able to invest time to review this fully atm :/

I don't think you need to, as long as you don't block it. I think it's seen enough eyes, and it's been sitting here for close to 3 months. In general I'd like us to move faster and be less perfectionist. For example, I see tests are failing now - probably unrelated, as they're in globus. This is super demotivating, and I don't want to lose myself as a contributor. I'm also going to proactively merge other PRs and change my attitude a little bit more, being the change I want to see.

With that said, I appreciate the other 3 points you have raised and the way you have communicated them. I've moved the references to 16.4 to 17.0, and added a section called Breaking Changes to call this out explicitly (rather than inside the Backwards compatibilty section as before. I prefer the PR title as is, but you're welcome to change it if you wish.

yuvipanda commented 4 months ago

These tests are failing on main as well, so I opened https://github.com/jupyterhub/oauthenticator/issues/743.

Back to now thinking this PR is ready to merge.

yuvipanda commented 4 months ago

yay, thank you @manics

benjimin commented 3 months ago

@yuvipanda there are numerous hooks to facilitate local customisation. What API should local code in these hooks use to access the list of groups that the user is member to?

For example, depending on group membership, I want to filter my profile list and adjust singleuser server settings. If my code runs in auth_state_hook, options_form, pre_spawn_hook or modify_pod_hook, then it will be passed the spawner instance, or if runs in post_auth_hook it will be passed the authenticator instance and auth_model (not auth_state). I take it that accessing auth_model["groups"] would be discouraged? Accessing spawner.user.groupsyields the same as user.orm_user.groups, a list of SQAlchemy objects rather than a trivial set of strings to test membership in. Should customisations try to work with that, or try to call something like spawner.authenticator.get_user_groups(spawner.user.get_auth_state())?