Closed yuvipanda closed 7 months ago
I've a local version of this deployed in https://github.com/2i2c-org/infrastructure/pull/3618, and it works well for use in auth0.
Thanks for the review, @consideRatio! I've done the three things :)
@consideRatio I spent some more time thinking through this, and agree that as is worded it doesn't actually do what it says in the documentation. So I've made the following changes:
allow_all
and required_scopes
are mutually exclusive - you can't have both on. I think this makes sense, as by definition if you require some scopes for authorization, you aren't actually 'allowing all'.How does it look to you now?
And if we decide throwing a 403 here is ok, we should document it in JupyterHub too - I've done so at https://github.com/jupyterhub/jupyterhub/pull/4682
To give allow_all special power in allowing, making it more allowing than allowed_users for example, is not intuitive at all in my mind. It would also mess with the use case of allowing all users as long as they show up with the required issued scope - isn't this a key use case?
required_scopes and allow_all makes sense together imo - a user must be allowed and have the required scopes to be authorized - either the user is allowed via allow_all or allowed_users etc.
The original idea behind allow_all
is to allow any user without having to define allowed users/groups. For example, admins were inadvertently allowing any GitHub user to login to their installations, which is why we made a breaking change to add this, with a default of False
https://github.com/jupyterhub/oauthenticator/pull/625
If required_scopes
is mutually exclusive with allow_all
then you must also also set allowed users/groups, and also means you can't allow all. Therefore I think the most intuitive behaviour is to say allow_all
is the last check after all other prerequisites. Probably worth updating the docstring for allow_all
to reduce confusion in future!
I think there's a bunch of confusion around what allow_all
really is doing here that's driving this, and I'm going to try to work through that.
The current doc string for 'allow_all' says:
Allow all authenticated users to login.
So in this conversation (https://github.com/jupyterhub/oauthenticator/pull/719#discussion_r1462531556) we established that required_scopes is part of the authorization check, rather than authentication. So to me that means that if authentication succeeds, allow_all should kick in and allow the user regardless of anything else. My understanding of the doc string is that we are basically ignoring all authorization steps.
To give allow_all special power in allowing, making it more allowing than allowed_users for example, is not intuitive at all in my mind.
@consideRatio my reading of the code is that this is the current behavior - if you set allow_all
to be true, allowed_users
is completely ignored. There's a return True
here which means the allowed_users check is never reached. And I don't see that check being performed anywhere else either. I do agree this is unintuitive, and I'd suggest we clarify this by raising an error if both allow_all and allowed_users is both set. Am happy to open a PR for doing that.
Basically if allow_all
is set to True, and is meant to 'Allow all authenticated users', it means we completely turn off all other authorization logic. This would work in cases where we defer authorization to the external auth provider as well - for example, if we configure CILogon to accept only a particular university's credentials.
So my understanding of allow_all
is that it's really allow_all_authenticated
, and we completely skip authorization offered as part of base OAuthenticator.
However, I do see that in subclasses, we do offer additional checks. So if the following config is present with say GitHubOAuthenticator:
allow_all: True
allowed_users: ["user1"]
allowed_organizations: ["org1"]
What happens is that allowed_users
is silently ignored, but allowed_organizations
is enforced! If instead the config is the following:
```yaml
allow_all: False
allowed_users: ["user1"]
allowed_organizations: ["org1"]
Then both allowed_users
and allowed_organizations
is respected.
I think this is very confusing and unintuitive.
So allow_all
is not allow_all_authenticated
, but as implemented (before this PR), it's really ignore_allowed_users_and_config_admin_users
. With this PR (as is), it becomes ignore_allowed_users_and_required_scopes_and_config_admin_users
, which I agree actually makes this worse not better :) So some kind of change is necessary, although I'm not sure what exactly.
Is my understanding of the situation and reasoning correct? What am I missing?
My understanding is that we currently have ways to allow users, and they can overlap, where allow_all is guaranteed to overlap with any other way.
Since oauthenticator 16, there is no way to be be authorized without being explicitly allowed by one config.
Also in oauthenticator 16, no config able to allow would disallow users. In oauth 15 on the other hand, user not in a github org was disallowed, even if they were allowed in allowed_users or admin_users for example.
So now we have pure allow rules and pure disallow rules or requirements on the users besides being allowed, and the logic is trivial compared to when we had combinations of behavior on allowing/disallowing.
I'm uncertain about your confusion points at the moment, but i think it relates to old oauthenticator 15 behaviors.
Since the allow_all
situation is unrelated to this PR, I've now done the following:
required_checks
above allow_all
, and have it not return anything. So this acts purely as a way to deny some people access, rather than allow anyone access.@consideRatio
I'm uncertain about your confusion points at the moment, but i think it relates to old oauthenticator 15 behaviors.
I linked to specific code points in current main branch when discussing my reasons about allow_all
and allowed_users
, so I don't think this is true.
Regardless, since the allow_all
situation is unrelated to this PR, I'll open a separate issue about that. I've made changes here as requested.
allow_all: True allowed_users: ["user1"] allowed_organizations: ["org1"]
What happens is that allowed_users is silently ignored, but allowed_organizations is enforced!
What do you mean enforced? allowed_organizations is allowing members of those, but doesnt disallow those outside in oauthenticator 16, so you can use allowed_organizations + a few allowed_users + a few admin_users. Use of allow_all just makes it pointless to allow users with allowed_organizations or allowed_users.
But, at the same time. Maybe you want to briefly allow all users, so you toggle it on briefly, and then later toggles it off. So, then its good to allow to be next to other config in a way.
@consideRatio I've opened https://github.com/jupyterhub/oauthenticator/issues/723 to carry on this discussion, separated out from the functionality this PR is adding.
@consideRatio and short version seems to be that my understanding is missing some pieces, so I don't really think there's any action item for anyone else. I may make a PR with a better docstring once I'm done digging through this. I'll update #723 accordingly.
This PR currently implements your suggestions in https://github.com/jupyterhub/oauthenticator/pull/719#pullrequestreview-1838530737.
I think I just missed that there's a short circuit here for any return True
, not just for allow_all
(https://github.com/jupyterhub/oauthenticator/blob/5667bf15f8e1dead29c32cc0886146d6ddb835ec/oauthenticator/github.py#L173). Sorry for the noise, @consideRatio and @manics.
After muddling through a little, I think I've a clearer idea of my issue here.
allow_all
will allow all authenticated users,required_scopes
is an authorization step, not an authentication steprequired_scopes
is the only authorization step that is not affected by allow_all.I've made https://github.com/jupyterhub/oauthenticator/pull/719/commits/ea2972c909924f2ed1f99c200571d82fdaac5f83 this commit updating the docstring for allow_all - as allow_all
will still respect requested_scopes
even if it ignores every other authorization step.
Does this feel iffy to anyone else?
Now that I have a better understanding of what exactly my confusion was, let me rework this PR just a little to make that be better.
Ok so the core issue was that this was an authorization check that was kinda being implemented like an authentication check. Since 16, OAuthenticator has many different allowed_X
properties that gate which users are allowed to access the hub - these are all authorization properties. required_scopes
was a bit like that, but not entirely, and that led down my rabbit hole of confusion.
I've now changed the PR to make this much clearer. The traitlet is now allowed_scopes
, and if all of them are present, it grants you access to the hub automatically. This now works very much like all the other allowed_X
properties, and comes after allow_all
as well. So we don't actually have to modify the docstring of allow_all
to talk about this exception.
This is different from allowed_organizations
in that all the scopes in this list must be present, while allowed_organizations
is fine with any of the orgs present. I think this could also be changed, and I'll work through that.
But regardless, I think this fits in much nicer than required_scopes
. I know I went down a blind alley for a while, but I'm very happy with where I've come out of my confusion with. Thanks for the review @consideRatio and @manics
I think the latest behavior as it is makes sense and is consistent with other allow
methods, which can only grant access, and cannot be used to prevent access granted via other allow
mechanisms.
If someone does want "block unless this scope is held" behavior, the method to implement is check_blocked_users
, which is the negative equivalent to check_allowed
(don't ask my why it's not called check_blocked
for symmetry, I don't remember), and has priority over allow checks.
The traitlet is now
allowed_scopes
, and if all of them are present, it grants you access to the hub automatically. This now works very much like all the otherallowed_X
properties, and comes afterallow_all
as well. So we don't actually have to modify the docstring ofallow_all
to talk about this exception.
I see allow_all
as one one of several properties allowing users. That other allow properties only allow subsets of the users allowed by allow_all
isn't weird or a proplem in my mind.
I think it should be allowed to configure allow_all=True
without erroring even if other allow properties are declared. I also reason that it doesn't matter if you return True
from point A or B in the check_allowed
function as implemented in oauthenticator as we never return False before checking all paths towards return True
.
@yuvipanda I think the pivot to providing allowed_scopes
instead of required_scopes
is reasonable and aligns more with other config. At the same time, its fundamentally different, where there can become a need for required_scopes
alongside this config, which is fine.
I'd like to propose the naming allowing_scopes
, users issued all scopes in one set of scopes will be allowed access.
I propose another name than allowed_scopes
as that sounds like its about scopes being allowed (which is hard to make sense of) rather than users issued those scopes are allowed. Overall allowed_scopes
also sounds like it could be a requirement to meet, but its just one possible way to get allowed access as a user.
If we start with allowing users issued all scopes in a set of scopes via this config, we can then easily either now or later let the config accept a set of "set of scopes", where having any set of scopes fully issued would allow the user.
Example configurations
# allow users for whom both a and b are issued
# this represents the PR in its current form - a single set of scopes is provided
allowing_scope = {"a", "b"}
# allow users for whom either a are issued or b1 and b2 are issued
allowing_scopes = {
{"a"},
{"b1", "b2"},
}
I think allowed_scopes
is preferable, since it shares a namespace with other allowed_
config. I think there is a benefit to all such names sharing a prefix and having consistent behavior. It's not clear to me without further explanation what allowing_scopes
would mean, which is not a natural phrase, to me.
I think it should be allowed to configure allow_all=True without erroring even if other allow properties are declared.
A warning could provide some experience improvement to make it clearer that some config will have no effect, but I agree an error is too far.
allowed_scopes
is going to be inconsistent anyway, since all scopes are required, whereas with the other properties you only have to be a member of one.
If we wanted to be consistent we could go with the suggestion in https://github.com/jupyterhub/oauthenticator/pull/719#issuecomment-1913262663 of makeing allowed_scopes
a list of lists?
Considering naming, here is an overview of the config that allows users:
admin_users
allowed_users
allow_all
allow_existing_users
allowed_teams
allowed_domains
allowed_groups
allowed_organizations
allowed_gitlab_groups
allowed_globus_groups
allowed_google_groups
admin_google_groups
We have the prefixes admin
, allow
, and allowed
so far, so adding allowing
would be another.
Looking at the terminology in https://datatracker.ietf.org/doc/html/rfc6749#section-3.3, I think the terminology is "requested scope" and "granted scope" (the access token is "issued" and the scope the token has is "granted").
Do you consider allow_granted_scope
or allowed_granted_scope
to be an improvement to allowed_scope
@yuvipanda @minrk ? I think it would help convey the config function better from its name than allowed_scope
.
allowed_scopes
or similar is going to be inconsistent anyway, since all scopes are required, whereas with the other properties only one is required.
allowed_scopes
or similar is going to be inconsistent anyway, since all scopes are required, whereas with the other properties only one is required.
If we support the sets of sets / lists of lists version, its like allowed_scope_sets, where each set of scopes are granted, making it consistent with being individual parts allowing a user access, but each part is a "scope set" instead of individual requested scope entries.
Function when providing a single set could be made to mean "allow when any of scope entries are granted" (OR) or "allow when all of scope entries are granted" (AND), do you think it should be the OR version if just passing the config a single set/list of scopes @manics?
I don't know..... given this is a new property we could go for the list-of-lists only? It's more complicated for admins but at least it's unambiguous?
Thanks for the comments everyone, I'm very swamped this week but will come back to this next week.
Sigh, momentum seems so important to me getting stuff done. There are merge conflicts here already :(
ok, so my understanding is that there are two questions at play:
For naming, I would like to propose we keep this as is, as I agree with @minrk's comments in https://github.com/jupyterhub/oauthenticator/pull/719#issuecomment-1914178078. I also think allowing
would make sense if any of the scopes granted access, but in this case, you need all to be granted access.
For list of lists, the question for me is if we can add that in the future in a way that doesn't break backwards compatibility. Can we meaningfully differentiate between a 'list of strings' vs 'list of lists' scenario? The way I approached this is by trying to figure out what documentation for such a property in the future would look like. If we add this feature in the future, here's what the docstring may be like
Allow users to login based on the scopes they have been granted.
If this is a list of strings, then users must be granted *all* these scopes
to be able to log in.
If this is a list of list of strings, then the user must be granted all the
scopes in at least one of the lists to be allowed to log in.
This reads natural enough to me. You can either have a list of list of strings, or for the most common case, a list of strings. It's unambiguous. We don't even know if admins would want to have a list of list of strings. We do know that they would want to have a list of strings. Given that we can evolve this later on in a backwards compatible way, I think it's safe to try this one as is.
So to summarize:
allowed_scopes
is a good enough name, and consistent with the rest of our config.I would love to see this be merged! I think it's ready to go as is, now that I've fixed the merge conflicts.
anyone wanna hit merge? :)
yay, thank you @minrk
We can already control what scopes we ask for by setting 'scope'. However, OAuth2 states that not all the scopes requested may be granted, for whatever reason. This could be because the user choose not to give us the grant, or because the user themselves doesn't have access to grant us this scope (This is how Auth0 uses it for example - https://auth0.com/docs/get-started/authentication-and-authorization-flow/authorization-code-flow/call-your-api-using-the-authorization-code-flow).
This PR adds an
allowed_scopes
property, which allows granting access to a subset of users based on the presence of a particular scope in the list of scopes granted. This can be used in addition to other authorization mechanisms to allow access to the hub.