Open jtpio opened 4 years ago
I am for pre_build_hook: pre_build_hook should be called just before these lines:
user_model = self.hub_auth.get_user(self)
Yes that would be the idea :+1:
This issue has been mentioned on Jupyter Community Forum. There might be relevant details there:
https://discourse.jupyter.org/t/binderhub-with-private-gitlab-and-user-scopes/3502/5
One thing we have to be careful about/make clear to the admin is the difference between the auth token obtained for the user and the one that currently exists which is for the whole BinderHub.
The other thing is passing around/making accessible the user's token at all the right places.
This would be a nice new feature!
Maybe the handler could be passed to the pre_build_hook directly?
Something like the following:
pre_build_hook = self.settings['pre_build_hook']
if pre_build_hook:
    await maybe_future(pre_build_hook(self))
Then it's up to the user to decide what to do with the build handler.
Similar to the way the handler is made available to the spawner in JupyterHub: https://github.com/jupyterhub/jupyterhub/blob/76c9111d80660e93578f80dbe441cfb702c1b207/jupyterhub/user.py#L542-L544
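A minimal sketch of that calling pattern, written as a standalone script so it runs without BinderHub. FakeBuildHandler, run_pre_build_hook and the local maybe_future are stand-ins introduced for illustration only (JupyterHub ships its own maybe_future helper in jupyterhub.utils); the real build handler would look different.

```python
import asyncio
import inspect


async def maybe_future(obj):
    """Stand-in for jupyterhub.utils.maybe_future: await obj only if it is awaitable."""
    if inspect.isawaitable(obj):
        return await obj
    return obj


class FakeBuildHandler:
    """Hypothetical stand-in for the BinderHub build handler."""

    def __init__(self, settings):
        self.settings = settings
        self.calls = []

    async def run_pre_build_hook(self):
        # Same pattern as in the snippet above: the handler passes itself to the hook.
        pre_build_hook = self.settings.get("pre_build_hook")
        if pre_build_hook:
            await maybe_future(pre_build_hook(self))


def sync_hook(handler):
    # A plain function works as a hook.
    handler.calls.append("sync")


async def async_hook(handler):
    # So does a coroutine function.
    handler.calls.append("async")


def demo():
    """Run the handler with a sync hook, an async hook, and no hook at all."""
    results = []
    for hook in (sync_hook, async_hook, None):
        handler = FakeBuildHandler({"pre_build_hook": hook})
        asyncio.run(handler.run_pre_build_hook())
        results.append(handler.calls)
    return results
```

Wrapping the call in maybe_future lets admins write the hook either as a regular function or as a coroutine, and an unset hook is simply skipped.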
Maybe the handler could be passed to the pre_build_hook directly?
Yes, that's also what I thought. I think the same is done in pre_launch_hook, where the launcher itself is the first parameter.
Btw, after reading @betatim's comment, it is not clear to me: for your case this won't require any additional token for each user, right?
This wouldn't require an additional token. In the hook we could, for example, retrieve the user name with the snippet you posted above:
in hook you could reach user data easily (probably) with user_model = self.hub_auth.get_user(self)
Although this would not give the user auth_state I think? But the provided git_credentials token could still be used to make HTTP requests and check the user access using the username.
Although this would not give the user auth_state I think?
I am not sure, but yes, I think the user_model dict doesn't contain auth_state. But by using the username you can make a request to the JupyterHub API (users/<username>) and get the user data, which should contain the auth_state.
There's an open issue to make auth_state available: https://github.com/jupyterhub/jupyterhub/issues/1704
@bitnik Are you saying it's already possible?
It must be available for admin users: https://github.com/jupyterhub/jupyterhub/blob/76c9111d80660e93578f80dbe441cfb702c1b207/jupyterhub/apihandlers/users.py#L126-L138
And because the binder service has admin access to the hub API, this should work for @jtpio's case.
Thanks @manics and @bitnik for the context and pointers!
If the binder user is an admin, then there could indeed be a request to the hub API to retrieve the user's auth_state in the pre_build_hook.
Just tested and we can indeed retrieve the user auth_state :+1:
For example in the pre_launch_hook with:
async def pre_launch_hook(launcher, image, username, server_name, repo_url):
    user = await launcher.get_user_data(username)
    auth_state = user.get('auth_state', None)
With a pre_build_hook, we could probably achieve a similar thing with:
async def pre_build_hook(handler):
    user_model = handler.hub_auth.get_user(handler)
    username = user_model['name']
    # ideally reusing the api_request or get_user_data methods from the launcher
    resp = await api_request(f'users/{username}', method='GET')
    user = json.loads(resp.body.decode('utf-8'))
    auth_state = user.get('auth_state', None)
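To make the api_request step above more concrete, here is a hedged sketch of the underlying call to the JupyterHub REST API (GET users/<username> with an "Authorization: token ..." header). The function names and the hub_api_url / api_token parameters are assumptions made for illustration; blocking urllib is used only to keep the sketch dependency-free, and inside a Tornado handler an async HTTP client would be preferable.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen


def build_user_request(hub_api_url, username, api_token):
    """Build a GET request for the user model on the JupyterHub REST API."""
    url = f"{hub_api_url.rstrip('/')}/users/{quote(username, safe='')}"
    return Request(url, headers={"Authorization": f"token {api_token}"}, method="GET")


def extract_auth_state(body):
    """Return auth_state from a user model response body, or None if absent."""
    user = json.loads(body.decode("utf-8"))
    return user.get("auth_state", None)


def get_auth_state(hub_api_url, username, api_token, opener=urlopen):
    """Fetch the user model and return its auth_state.

    auth_state is only included when the token has sufficient (admin) rights,
    as discussed in the thread above.
    """
    request = build_user_request(hub_api_url, username, api_token)
    with opener(request) as resp:
        return extract_auth_state(resp.read())
```

The opener argument exists only so the network call can be swapped out in tests.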
This issue has been mentioned on Jupyter Community Forum. There might be relevant details there:
https://discourse.jupyter.org/t/binderhub-with-private-gitlab-and-user-scopes/3502/6
Consider a use case where we want to run an authenticated BinderHub instance whose rights for cloning private repositories match those of an underlying GitLab instance (with the GitLab service also providing authentication). If I understand correctly, a pre_build_hook would still require a unique token to clone all private repositories within the GitLab instance?
Instead, in an authenticated BinderHub, it might be desirable to assume the identity of the authenticated user for cloning private repositories, if only for the user experience (this would remove the need to add a technical "binderhub" user to the GitLab instance and to make it a member of each project to be built).
Would there be a solution that removes the need for a single user/token that has (at least read) access to the whole set of private repositories within a GitLab instance, while being minimally disruptive to the existing BinderHub model?
Proposed change
This issue is related to the idea mentioned in this Discourse topic: https://discourse.jupyter.org/t/binderhub-with-private-gitlab-and-user-scopes/3502
Looking at the code, it seems like there is (at the moment) no hook or option that could be set to tweak the behavior of the /build endpoint, or more generally of the builder.
The idea is to be able to implement fine-grained access control to BinderHub based on the JupyterHub authenticator used to authenticate users.
The use case is summarized as follows:
"Could not resolve ref for my-project/repo. Double check your URL." would ideally be shown
GitLab in the dropdown menu (after configuring repo_providers). This looks like it should be solved by https://github.com/jupyterhub/binderhub/pull/1038 :tada:
Alternative options
An alternative option might be to add an extra build handler to the main app, and change the frontend to use that endpoint instead.
However this adds a lot of complexity to the BinderHub admin as it would require maintaining custom Docker images and helm charts with these changes.
Who would use this feature?
Those who want to have a custom BinderHub setup implementing user access based on the user access pattern from the JupyterHub authenticator (GitLab, GitHub).
(Optional): Suggest a solution
Provided that an access token was generated according to: https://binderhub.readthedocs.io/en/latest/zero-to-binderhub/setup-binderhub.html#accessing-private-repositories
For a binderhub user that has read-only access to all repositories.
And the token set as:
At the moment it's possible to control the launch behavior by providing the following snippet to the helm chart config:
https://github.com/jupyterhub/binderhub/blob/b6446b12b30f741d9e82b7aec1498ede4776cd79/helm-chart/binderhub/values.yaml#L66-L119
However users can still trigger a build to a repository they do not have access to.
It looks like this could be implemented by providing a custom RepoProvider (in the helm config value, that could derive from an existing one).
But it would require some user specific information to be passed to the RepoProvider to be able to decide whether or not it is possible to resolve the ref for that user, probably somewhere around this line:
https://github.com/jupyterhub/binderhub/blob/72bcb59cf956f53a07f0d4b45f12cc6c1257c6cf/binderhub/builder.py#L251
A custom hook similar to the pre_spawn_hook or user_redirect_hook in JupyterHub could also help.
Or how about having a pre_build_hook, similar to the existing pre_launch_hook?
https://github.com/jupyterhub/binderhub/blob/72bcb59cf956f53a07f0d4b45f12cc6c1257c6cf/binderhub/launcher.py#L67-L78
The pre_build_hook could then perform some API requests to GitHub / GitLab to check if a user has access to a specific repo.
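As a rough illustration of such a check against GitLab, the sketch below asks the GitLab REST API (GET /api/v4/projects/:id, with the URL-encoded project path as the id) whether a project is visible to the holder of a given OAuth access token. The function names, the gitlab_url parameter and the way the token reaches the hook are assumptions; in practice the token might come from the user's auth_state. Blocking urllib keeps the example dependency-free, while a real hook would more likely use an async client.

```python
from urllib.error import HTTPError
from urllib.parse import quote
from urllib.request import Request, urlopen


def gitlab_project_url(gitlab_url, project_path):
    """Build the project API URL; the path must be URL-encoded (slash becomes %2F)."""
    return f"{gitlab_url.rstrip('/')}/api/v4/projects/{quote(project_path, safe='')}"


def user_can_read_project(gitlab_url, project_path, access_token, opener=urlopen):
    """Return True if the project is visible with this token, False otherwise."""
    request = Request(
        gitlab_project_url(gitlab_url, project_path),
        headers={"Authorization": f"Bearer {access_token}"},
    )
    try:
        with opener(request) as resp:
            return resp.status == 200
    except HTTPError as e:
        # GitLab answers 404 for projects the user cannot see; treat auth errors the same way.
        if e.code in (401, 403, 404):
            return False
        raise
```

A hypothetical pre_build_hook could call user_can_read_project with the repository taken from the build request and refuse the build when it returns False.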