the-turing-way / the-turing-way

Host repository for The Turing Way: a how to guide for reproducible data science
http://the-turing-way.org/

Decide how to Authenticate Users of the Turing BinderHub #290

Closed sgibson91 closed 2 years ago

sgibson91 commented 5 years ago

Summary

Decide on the most appropriate method of authenticating users of the Turing BinderHub.

Currently have GitHub authentication working with the following config:

config:
  BinderHub:
    use_registry: true
    image_prefix: sgibson91/BHtest2-dev-
    hub_url: http://<redacted>
    auth_enabled: true

jupyterhub:
  cull:
    # don't cull authenticated users
    users: False

  hub:
    services:
      binder:
        oauth_redirect_uri: "http://<redacted>/oauth_callback"
        oauth_client_id: "binder-oauth-client-test"
    extraConfig:
      hub_extra: |
        c.JupyterHub.redirect_to_server = False

      binder: |
        from kubespawner import KubeSpawner

        class BinderSpawner(KubeSpawner):
          def start(self):
            if 'image' in self.user_options:
              # binder service sets the image spec via user options
              self.image = self.user_options['image']
            return super().start()
        c.JupyterHub.spawner_class = BinderSpawner

  singleuser:
    # to make notebook servers aware of hub
    cmd: jupyterhub-singleuser

  auth:
    type: github
    github:
      clientId: "<redacted>"
      clientSecret: "<redacted>"
      callbackUrl: "http://<redacted>/hub/oauth_callback"

Adapted from these docs:

With the addition of the following config, GitHub users would need to be members of the Alan Turing Institute organisation on GitHub to gain access (not tested yet). Adapted from here: https://zero-to-jupyterhub.readthedocs.io/en/latest/authentication.html#giving-access-to-organizations-on-github

auth:
    type: github
    github:
      orgWhitelist:
        - "alan-turing-institute"
    scopes:
      - "read:user"

This would allow any member of the Turing GitHub organisation to access the BinderHub and launch any public repo.

However, there is also the option to authenticate using Active Directory, see the following link:

What needs to be done?

Who can help?


Updates

07/03/2019 - Successfully got organisational authentication for BinderHub working using auth: scopes: read:org. We still need to do some reading to convince ourselves that this only accesses organisation/team memberships. Docs here.

martintoreilly commented 5 years ago

I think the GitHub organisation member check is good enough for an initial MVP. It sounds like we stand a decent chance of getting this working for the Build a BinderHub workshop, which is not possible for the Azure Active Directory option.

I vote yes for the GitHub membership check.

martintoreilly commented 5 years ago

If we get it working in time for the workshop, can we make creation of a GitHub organisation to use for membership control part of the demo? Not all attendees will have an existing GitHub org and it would be great to let people spin up all the required pieces from scratch to demo this "back at the ranch".

martintoreilly commented 5 years ago

I think both GitHub organisation and Active Directory authentication, with their membership restrictions, are valuable things to make easy to set up, so let's not lose the one we don't pick for the MVP. Let's spin out an issue for each once we pick.

sgibson91 commented 5 years ago

If we get it working in time for the workshop, can we make creation of a GitHub organisation to use for membership control part of the demo? Not all attendees will have an existing GitHub org and it would be great to let people spin up all the required pieces from scratch to demo this "back at the ranch".

I can absolutely try. But I have just broken it... 😞

martintoreilly commented 5 years ago

Anything I can help with?

sgibson91 commented 5 years ago

Not sure yet. I didn't get the webpage redirection I was expecting after I added the orgWhitelist and purged the User Access Token on the GitHub OAuth app. So I tried generating a new clientId and clientSecret, still no luck. So I removed the orgWhitelist and couldn't get back to sign in with any GitHub account. I think I'm going to come back at it with fresh eyes and brain in the morning.

sgibson91 commented 5 years ago

I got standard GitHub sign-in working again. I think I had an extra slash in my homepage URL when setting up the GitHub OAuth app. I get a nice little authentication page and then I get my own username on JupyterHub to work in.

github_oauth

github_user

sgibson91 commented 5 years ago

I would like to get this working with HTTPS connection as well.

sgibson91 commented 5 years ago

If we really want to control who is allowed to access the BinderHub, we could change auth: scopes: - "read:user" to auth: scopes: - "read:org" in config.yaml. Users would then have to request access from the owner of the GitHub organisation, and the request would be verified over email. See zero-to-jupyterhub-k8s/issues/687.

This is potentially overkill, but just documenting that it's an option.

sgibson91 commented 5 years ago

Updated config.yaml to require organisation login.

Experiments:

sgibson91 commented 5 years ago

Useful commands for accessing the JupyterHub logs:

Output of JupyterHub logs for a failed authentication using read:user scope: This claims the user is not a member of the GitHub organisation when in fact they are.

[W 2019-03-07 15:43:16.919 JupyterHub github:149] User <redacted> is not in org whitelist
[W 2019-03-07 15:43:16.920 JupyterHub base:504] Failed login for unknown user
[W 2019-03-07 15:43:16.922 JupyterHub log:158] 403 GET /hub/oauth_callback?code=[secret]&state=[secret] (@10.244.0.1) 325.12ms
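
When several people are testing the login at once, it can help to pull the rejected usernames out of the hub log rather than reading it by eye. The following is a minimal sketch, assuming the log lines keep the `[LEVEL date time app source] message` layout shown above. The helper name `rejected_users` and both regular expressions are illustrative and not part of JupyterHub.

```python
import re

# Assumed layout, taken from the excerpt above:
# [W 2019-03-07 15:43:16.919 JupyterHub github:149] message
LOG_LINE = re.compile(
    r"^\[(?P<level>[A-Z]) (?P<timestamp>\S+ \S+) (?P<app>\S+) "
    r"(?P<source>[^\]]+)\] (?P<message>.*)$"
)
# Message emitted when the organisation check rejects a user.
NOT_IN_ORG = re.compile(r"^User (?P<user>\S+) is not in org whitelist$")


def rejected_users(lines):
    """Return (timestamp, username) pairs for users rejected by the org check."""
    rejected = []
    for line in lines:
        parsed = LOG_LINE.match(line.strip())
        if parsed is None:
            continue  # not a log line in the assumed layout
        hit = NOT_IN_ORG.match(parsed.group("message"))
        if hit:
            rejected.append((parsed.group("timestamp"), hit.group("user")))
    return rejected
```

The function accepts any iterable of strings, so it can be fed an open file or the captured output of whichever command is used to fetch the hub logs.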

sgibson91 commented 5 years ago

On the advice of the Binder team, tried switching read:user to read:org. Apparently, requesting access for the app to read users' organisation membership is only required once.

The people who accessed the binder page after the app was granted access by the alan-turing-institute organisation were me, @LouiseABowler and @KirstieJane, whereas @r-j-arnold was denied access. So it seems that the filtering of membership is working as expected.

I think further exploration is required into exactly what permissions are being granted here. The approval email from alan-turing-institute reads:

An organization owner has approved the following application to access private data in the The Alan Turing Institute organization: . Because you are a member of the The Alan Turing Institute organization and have already granted this application access to your personal account, this app can now access The Alan Turing Institute organization resources on your behalf.

What does organization resources mean here?

The authorisation step from my end read:

Read only access. This application will be able to read your organization and team membership.

Documentation for GitHub OAuth scopes here.

martintoreilly commented 5 years ago

Doing some digging on the GitHub users API, I feel we should be able to read a user's org memberships using https://api.github.com/users/<username>/orgs with the read:user scope. Let's experiment and find out.
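
One way to run that experiment is a small script against the endpoint above. This is a sketch only: the function names are hypothetical, the token argument is optional, and the response is assumed to be a JSON list of objects that each carry a `login` field. Note that GitHub's REST documentation describes this endpoint as listing public organisation memberships, so private memberships may not appear in the result.

```python
import json
import urllib.parse
import urllib.request


def public_orgs_url(username):
    """Build the users/<username>/orgs URL discussed above."""
    return f"https://api.github.com/users/{urllib.parse.quote(username)}/orgs"


def org_logins(payload):
    """Extract lower-cased organisation logins from the JSON response body."""
    return {org["login"].lower() for org in json.loads(payload)}


def fetch_org_logins(username, token=None):
    """Call the API (network access required) and return the org logins."""
    request = urllib.request.Request(
        public_orgs_url(username),
        headers={"Accept": "application/vnd.github+json"},
    )
    if token:
        # Assumption: an OAuth or personal access token for the user.
        request.add_header("Authorization", f"token {token}")
    with urllib.request.urlopen(request) as response:
        return org_logins(response.read().decode("utf-8"))


def is_listed_member(username, org, token=None):
    """True if `org` appears in the memberships returned for `username`."""
    return org.lower() in fetch_org_logins(username, token)
```

Keeping the parsing in `org_logins` separate from the network call means the membership logic can be checked against a saved response without hitting the API.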

sgibson91 commented 5 years ago

Tried to write an access token to my GitHub user account using the following:

curl -H 'Content-Type: application/json' -H 'Accept: application/json' -X PUT -u sgibson91 https://api.github.com/authorizations/client/<redacted> -d '{"client_secret": "<redacted>", "scopes": ["read:user"]}'

Result:

{
  "message": "Not Found",
  "documentation_url": "https://developer.github.com/v3"
}

manics commented 5 years ago

These two discussions on GitHub org scopes might be helpful:

sgibson91 commented 5 years ago

Test run with @nbarlowATI: GitHub OAuth app owned by the whitelisted organisation (binderhub-test-org in this case). Scope "read:user" will work if people's membership of the GitHub organisation is public, but not if it's private (we'd need "read:org" scope). @nbarlowATI could log in with public organisation visibility; I could not with private.

What are the downsides of asking users who wish to use the BinderHub to have their membership be public? What reasons would they not want their membership to be public? (I feel like there are good ones, they're just not coming to me.)

The token, once granted to the user, remains valid as long as the user does not revoke the app's access and the owner of the app does not revoke all user access tokens (i.e. forcing the authentication flow again). So a user would only need to make their membership public the first time they log on to the BinderHub, and could revert it to private afterwards.

sgibson91 commented 5 years ago

image

This suggests that using "read:user" should work for private organisation visibility providing the app is owned by the same organisation, but the test above disproves this.

Update: This only means that the organisation owner doesn't have to approve access of the app, "read:org" is still required.
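
The findings from these tests can be summarised as two small decision functions. This is only a sketch of the behaviour observed in this thread, not of GitHub's actual permission model. The function names are made up for illustration, and the approval rule for apps not owned by the organisation is an assumption based on the approval email quoted earlier.

```python
def can_verify_membership(membership_is_public, scopes):
    """Observed: read:user only sees public membership; read:org sees private too."""
    return membership_is_public or "read:org" in scopes


def needs_owner_approval(app_owned_by_org, scopes):
    """Assumed: approval is only needed for read:org on an app the org does not own."""
    return "read:org" in scopes and not app_owned_by_org


# The two logins from the test with the org-owned app and the read:user scope.
scopes = ["read:user"]
print(can_verify_membership(True, scopes))   # public membership
print(can_verify_membership(False, scopes))  # private membership
```

Under these rules, an app owned by the whitelisted organisation and requesting "read:org" would verify private memberships without a separate approval step by the organisation owner.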

manics commented 5 years ago

What are the downsides of asking users who wish to use the BinderHub to have their membership be public?

I did this with a publicly accessible jupyterhub, the biggest problem was ensuring people could find the relevant setting to switch to public, it's not very intuitive: https://help.github.com/en/articles/publicizing-or-hiding-organization-membership

sgibson91 commented 5 years ago

I did this with a publicly accessible jupyterhub, the biggest problem was ensuring people could find the relevant setting to switch to public, it's not very intuitive: https://help.github.com/en/articles/publicizing-or-hiding-organization-membership

Yes, it is awkward to find. I'm just struggling to find a reasonable middle ground between "read:user" not working for private memberships and the private/third-party access requirements for "read:org".

KirstieJane commented 5 years ago

(I haven't read this whole issue, I'm just swinging by in the middle of a Slack meeting to say that I think having public membership is a fine price to pay for accessing Hub23!)

sgibson91 commented 5 years ago

This is interesting https://github.com/jupyterhub/zero-to-jupyterhub-k8s/issues/886

https://github.com/InfuseAI/primehub

sgibson91 commented 5 years ago

See #410 Solved.

sgibson91 commented 2 years ago

The plan is to transform the current Turing BinderHub into a JupyterHub, since that is probably more useful to a research community than a BinderHub. It will use Azure Active Directory authentication, so auth will be provided via a turing.ac.uk account rather than relying on the GitHub org membership model, which has dodgy permission scopes.