Add `allowed_scopes` to all authenticators to allow some users based on granted scopes

yuvipanda commented 10 months ago

We can already control what scopes we ask for by setting 'scope'. However, OAuth2 states that not all the scopes requested may be granted, for whatever reason. This could be because the user choose not to give us the grant, or because the user themselves doesn't have access to grant us this scope (This is how Auth0 uses it for example - https://auth0.com/docs/get-started/authentication-and-authorization-flow/authorization-code-flow/call-your-api-using-the-authorization-code-flow).

This PR adds an allowed_scopes property, which allows granting access to a subset of users based on the presence of a particular scope in the list of scopes granted. This can be used in addition to other authorization mechanisms to allow access to the hub.

yuvipanda commented 10 months ago

I've a local version of this deployed in https://github.com/2i2c-org/infrastructure/pull/3618, and it works well for use in auth0.

yuvipanda commented 10 months ago

Thanks for the review, @consideRatio! I've done the three things :)

yuvipanda commented 10 months ago

@consideRatio I spent some more time thinking through this, and agree that as is worded it doesn't actually do what it says in the documentation. So I've made the following changes:

If this check fails, we raise a 403 (similar to what happens in other places in the CILogonAuthenticator) with a useful error message (incorporating the suggestions from @minrk)
I've moved the check up, but still after allow_all - as I think allow_all should basically sidestep everything. However, since this can be a bad footgun, I've also added validation that makes sure that allow_all and required_scopes are mutually exclusive - you can't have both on. I think this makes sense, as by definition if you require some scopes for authorization, you aren't actually 'allowing all'.
I added the fact that you can raise a 403 to the method documentation
I also added validation to make sure that you aren't requiring scopes you aren't requesting (thanks @minrk).

How does it look to you now?

yuvipanda commented 10 months ago

And if we decide throwing a 403 here is ok, we should document it in JupyterHub too - I've done so at https://github.com/jupyterhub/jupyterhub/pull/4682

consideRatio commented 10 months ago

To give allow_all special power in allowing, making it more allowing than allowed_users for example, is not intuitive at all in my mind. It would also mess with the use case of allowing all users as long as they show up with the required issued scope - isn't this a key use case?

required_scopes and allow_all makes sense together imo - a user must be allowed and have the required scopes to be authorized - either the user is allowed via allow_all or allowed_users etc.

manics commented 10 months ago

The original idea behind allow_all is to allow any user without having to define allowed users/groups. For example, admins were inadvertently allowing any GitHub user to login to their installations, which is why we made a breaking change to add this, with a default of False https://github.com/jupyterhub/oauthenticator/pull/625

If required_scopes is mutually exclusive with allow_all then you must also also set allowed users/groups, and also means you can't allow all. Therefore I think the most intuitive behaviour is to say allow_all is the last check after all other prerequisites. Probably worth updating the docstring for allow_all to reduce confusion in future!

yuvipanda commented 10 months ago

I think there's a bunch of confusion around what allow_all really is doing here that's driving this, and I'm going to try to work through that.

The current doc string for 'allow_all' says:

Allow all authenticated users to login.

So in this conversation (https://github.com/jupyterhub/oauthenticator/pull/719#discussion_r1462531556) we established that required_scopes is part of the authorization check, rather than authentication. So to me that means that if authentication succeeds, allow_all should kick in and allow the user regardless of anything else. My understanding of the doc string is that we are basically ignoring all authorization steps.

To give allow_all special power in allowing, making it more allowing than allowed_users for example, is not intuitive at all in my mind.

@consideRatio my reading of the code is that this is the current behavior - if you set allow_all to be true, allowed_users is completely ignored. There's a return True here which means the allowed_users check is never reached. And I don't see that check being performed anywhere else either. I do agree this is unintuitive, and I'd suggest we clarify this by raising an error if both allow_all and allowed_users is both set. Am happy to open a PR for doing that.

Basically if allow_all is set to True, and is meant to 'Allow all authenticated users', it means we completely turn off all other authorization logic. This would work in cases where we defer authorization to the external auth provider as well - for example, if we configure CILogon to accept only a particular university's credentials.

So my understanding of allow_all is that it's really allow_all_authenticated, and we completely skip authorization offered as part of base OAuthenticator.

However, I do see that in subclasses, we do offer additional checks. So if the following config is present with say GitHubOAuthenticator:

allow_all: True
allowed_users: ["user1"]
allowed_organizations: ["org1"]

What happens is that allowed_users is silently ignored, but allowed_organizations is enforced! If instead the config is the following:


```yaml
allow_all: False
allowed_users: ["user1"]
allowed_organizations: ["org1"]

Then both allowed_users and allowed_organizations is respected.

I think this is very confusing and unintuitive.

So allow_all is not allow_all_authenticated, but as implemented (before this PR), it's really ignore_allowed_users_and_config_admin_users. With this PR (as is), it becomes ignore_allowed_users_and_required_scopes_and_config_admin_users, which I agree actually makes this worse not better :) So some kind of change is necessary, although I'm not sure what exactly.

Is my understanding of the situation and reasoning correct? What am I missing?

consideRatio commented 10 months ago

My understanding is that we currently have ways to allow users, and they can overlap, where allow_all is guaranteed to overlap with any other way.

Since oauthenticator 16, there is no way to be be authorized without being explicitly allowed by one config.

Also in oauthenticator 16, no config able to allow would disallow users. In oauth 15 on the other hand, user not in a github org was disallowed, even if they were allowed in allowed_users or admin_users for example.

So now we have pure allow rules and pure disallow rules or requirements on the users besides being allowed, and the logic is trivial compared to when we had combinations of behavior on allowing/disallowing.

I'm uncertain about your confusion points at the moment, but i think it relates to old oauthenticator 15 behaviors.

yuvipanda commented 10 months ago

Since the allow_all situation is unrelated to this PR, I've now done the following:

Move required_checks above allow_all, and have it not return anything. So this acts purely as a way to deny some people access, rather than allow anyone access.
Remove the validation that allow_all and required_scopes is mutual. I've kept the validation that required_scopes should be a subset of requested scopes.

yuvipanda commented 10 months ago

@consideRatio

I'm uncertain about your confusion points at the moment, but i think it relates to old oauthenticator 15 behaviors.

I linked to specific code points in current main branch when discussing my reasons about allow_all and allowed_users, so I don't think this is true.

yuvipanda commented 10 months ago

Regardless, since the allow_all situation is unrelated to this PR, I'll open a separate issue about that. I've made changes here as requested.

consideRatio commented 10 months ago

allow_all: True
allowed_users: ["user1"]
allowed_organizations: ["org1"]
What happens is that allowed_users is silently ignored, but allowed_organizations is enforced!

What do you mean enforced? allowed_organizations is allowing members of those, but doesnt disallow those outside in oauthenticator 16, so you can use allowed_organizations + a few allowed_users + a few admin_users. Use of allow_all just makes it pointless to allow users with allowed_organizations or allowed_users.

But, at the same time. Maybe you want to briefly allow all users, so you toggle it on briefly, and then later toggles it off. So, then its good to allow to be next to other config in a way.

yuvipanda commented 10 months ago

@consideRatio I've opened https://github.com/jupyterhub/oauthenticator/issues/723 to carry on this discussion, separated out from the functionality this PR is adding.

yuvipanda commented 10 months ago

@consideRatio and short version seems to be that my understanding is missing some pieces, so I don't really think there's any action item for anyone else. I may make a PR with a better docstring once I'm done digging through this. I'll update #723 accordingly.

This PR currently implements your suggestions in https://github.com/jupyterhub/oauthenticator/pull/719#pullrequestreview-1838530737.

yuvipanda commented 10 months ago

I think I just missed that there's a short circuit here for any return True, not just for allow_all (https://github.com/jupyterhub/oauthenticator/blob/5667bf15f8e1dead29c32cc0886146d6ddb835ec/oauthenticator/github.py#L173). Sorry for the noise, @consideRatio and @manics.

yuvipanda commented 10 months ago

After muddling through a little, I think I've a clearer idea of my issue here.

If we say allow_all will allow all authenticated users,
And that required_scopes is an authorization step, not an authentication step
The PR as it is now means that required_scopes is the only authorization step that is not affected by allow_all.

I've made https://github.com/jupyterhub/oauthenticator/pull/719/commits/ea2972c909924f2ed1f99c200571d82fdaac5f83 this commit updating the docstring for allow_all - as allow_all will still respect requested_scopes even if it ignores every other authorization step.

Does this feel iffy to anyone else?

yuvipanda commented 10 months ago

Now that I have a better understanding of what exactly my confusion was, let me rework this PR just a little to make that be better.

yuvipanda commented 10 months ago

Ok so the core issue was that this was an authorization check that was kinda being implemented like an authentication check. Since 16, OAuthenticator has many different allowed_X properties that gate which users are allowed to access the hub - these are all authorization properties. required_scopes was a bit like that, but not entirely, and that led down my rabbit hole of confusion.

I've now changed the PR to make this much clearer. The traitlet is now allowed_scopes, and if all of them are present, it grants you access to the hub automatically. This now works very much like all the other allowed_X properties, and comes after allow_all as well. So we don't actually have to modify the docstring of allow_all to talk about this exception.

This is different from allowed_organizations in that all the scopes in this list must be present, while allowed_organizations is fine with any of the orgs present. I think this could also be changed, and I'll work through that.

But regardless, I think this fits in much nicer than required_scopes. I know I went down a blind alley for a while, but I'm very happy with where I've come out of my confusion with. Thanks for the review @consideRatio and @manics

minrk commented 10 months ago

I think the latest behavior as it is makes sense and is consistent with other allow methods, which can only grant access, and cannot be used to prevent access granted via other allow mechanisms.

If someone does want "block unless this scope is held" behavior, the method to implement is check_blocked_users, which is the negative equivalent to check_allowed (don't ask my why it's not called check_blocked for symmetry, I don't remember), and has priority over allow checks.

consideRatio commented 10 months ago

The traitlet is now allowed_scopes, and if all of them are present, it grants you access to the hub automatically. This now works very much like all the other allowed_X properties, and comes after allow_all as well. So we don't actually have to modify the docstring of allow_all to talk about this exception.

I see allow_all as one one of several properties allowing users. That other allow properties only allow subsets of the users allowed by allow_all isn't weird or a proplem in my mind.

I think it should be allowed to configure allow_all=True without erroring even if other allow properties are declared. I also reason that it doesn't matter if you return True from point A or B in the check_allowed function as implemented in oauthenticator as we never return False before checking all paths towards return True.

consideRatio commented 10 months ago

@yuvipanda I think the pivot to providing allowed_scopes instead of required_scopes is reasonable and aligns more with other config. At the same time, its fundamentally different, where there can become a need for required_scopes alongside this config, which is fine.

Naming

I'd like to propose the naming allowing_scopes, users issued all scopes in one set of scopes will be allowed access.

I propose another name than allowed_scopes as that sounds like its about scopes being allowed (which is hard to make sense of) rather than users issued those scopes are allowed. Overall allowed_scopes also sounds like it could be a requirement to meet, but its just one possible way to get allowed access as a user.

All scopes, or any of the scopes?

If we start with allowing users issued all scopes in a set of scopes via this config, we can then easily either now or later let the config accept a set of "set of scopes", where having any set of scopes fully issued would allow the user.

Example configurations

# allow users for whom both a and b are issued
# this represents the PR in its current form - a single set of scopes is provided
allowing_scope = {"a", "b"}

# allow users for whom either a are issued or b1 and b2 are issued
allowing_scopes = {
  {"a"},
  {"b1", "b2"},
}

minrk commented 10 months ago

I think allowed_scopes is preferable, since it shares a namespace with other allowed_ config. I think there is a benefit to all such names sharing a prefix and having consistent behavior. It's not clear to me without further explanation what allowing_scopes would mean, which is not a natural phrase, to me.

I think it should be allowed to configure allow_all=True without erroring even if other allow properties are declared.

A warning could provide some experience improvement to make it clearer that some config will have no effect, but I agree an error is too far.

manics commented 10 months ago

allowed_scopes is going to be inconsistent anyway, since all scopes are required, whereas with the other properties you only have to be a member of one.

If we wanted to be consistent we could go with the suggestion in https://github.com/jupyterhub/oauthenticator/pull/719#issuecomment-1913262663 of makeing allowed_scopes a list of lists?

consideRatio commented 10 months ago

Considering naming, here is an overview of the config that allows users:

admin_users
allowed_users
(oauth) allow_all
(oauth) allow_existing_users
(bitbucket) allowed_teams
(cilogon) allowed_domains
(generic, openshift) allowed_groups
(github) allowed_organizations
(gitlab) allowed_gitlab_groups
(globus) allowed_globus_groups
(google) allowed_google_groups
(google) admin_google_groups

We have the prefixes admin, allow, and allowed so far, so adding allowing would be another.

Looking at the terminology in https://datatracker.ietf.org/doc/html/rfc6749#section-3.3, I think the terminology is "requested scope" and "granted scope" (the access token is "issued" and the scope the token has is "granted").

Do you consider allow_granted_scope or allowed_granted_scope to be an improvement to allowed_scope @yuvipanda @minrk ? I think it would help convey the config function better from its name than allowed_scope.

manics commented 10 months ago

allowed_scopes or similar is going to be inconsistent anyway, since all scopes are required, whereas with the other properties only one is required.

consideRatio commented 10 months ago

allowed_scopes or similar is going to be inconsistent anyway, since all scopes are required, whereas with the other properties only one is required.

If we support the sets of sets / lists of lists version, its like allowed_scope_sets, where each set of scopes are granted, making it consistent with being individual parts allowing a user access, but each part is a "scope set" instead of individual requested scope entries.

Function when providing a single set could be made to mean "allow when any of scope entries are granted" (OR) or "allow when all of scope entries are granted" (AND), do you think it should be the OR version if just passing the config a single set/list of scopes @manics?

manics commented 10 months ago

I don't know..... given this is a new property we could go for the list-of-lists only? It's more complicated for admins but at least it's unambiguous?

yuvipanda commented 9 months ago

Thanks for the comments everyone, I'm very swamped this week but will come back to this next week.

yuvipanda commented 8 months ago

Sigh, momentum seems so important to me getting stuff done. There are merge conflicts here already :(

yuvipanda commented 8 months ago

ok, so my understanding is that there are two questions at play:

Naming
Should this be a list of lists from the start?

For naming, I would like to propose we keep this as is, as I agree with @minrk's comments in https://github.com/jupyterhub/oauthenticator/pull/719#issuecomment-1914178078. I also think allowing would make sense if any of the scopes granted access, but in this case, you need all to be granted access.

For list of lists, the question for me is if we can add that in the future in a way that doesn't break backwards compatibility. Can we meaningfully differentiate between a 'list of strings' vs 'list of lists' scenario? The way I approached this is by trying to figure out what documentation for such a property in the future would look like. If we add this feature in the future, here's what the docstring may be like

Allow users to login based on the scopes they have been granted.

If this is a list of strings, then users must be granted *all* these scopes
to be able to log in.

If this is a list of list of strings, then the user must be granted all the
scopes in at least one of the lists to be allowed to log in.

This reads natural enough to me. You can either have a list of list of strings, or for the most common case, a list of strings. It's unambiguous. We don't even know if admins would want to have a list of list of strings. We do know that they would want to have a list of strings. Given that we can evolve this later on in a backwards compatible way, I think it's safe to try this one as is.

So to summarize:

I think allowed_scopes is a good enough name, and consistent with the rest of our config.
We can add list of list of strings in the future if that sees an actual use case.

I would love to see this be merged! I think it's ready to go as is, now that I've fixed the merge conflicts.

yuvipanda commented 7 months ago

anyone wanna hit merge? :)

yuvipanda commented 7 months ago

yay, thank you @minrk

jupyterhub / oauthenticator

Add `allowed_scopes` to all authenticators to allow some users based on granted scopes #719

Naming

All scopes, or any of the scopes?