Enable instructor / student accounts to be recognized by the JupyterHub authenticator

consideRatio commented 4 years ago

I want to have a list of instructors and a list of users available to the JupyterHub authenticator, so that we can grant instructors additional permissions, such as read/write access to a file area where students only get read access.

I've considered some options to do this:

1. Let this repo manage two encrypted text files of GitHub usernames, one with the instructors and one with the normal users.
2. Update such content now and then from an external source. We could read a Google Sheet, or we could ask GitHub which users have what kind of access to a certain GitHub repo and use that as an indicator.

I'm currently leaning towards the first option as I think it is the most robust option and I think that is important now.
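
For comparison, the GitHub-based variant of the second option could look roughly like the sketch below. It assumes each record resembles an item from GitHub's "list repository collaborators" REST response (a `login` plus a `permissions` mapping with boolean `push` and `pull` flags). The function name, the push-means-instructor rule, and the sample data are hypothetical, and fetching the records from the API is left out.

```python
def groups_from_collaborators(collaborators):
    """Split collaborator records into instructor and participant usernames.

    Assumption: users with push access are instructors, users with only
    pull access are participants.
    """
    acl = {"instructors": [], "participants": []}
    for record in collaborators:
        username = record["login"].lower()
        permissions = record.get("permissions", {})
        if permissions.get("push"):
            acl["instructors"].append(username)
        elif permissions.get("pull"):
            acl["participants"].append(username)
    return acl


# Hypothetical sample data standing in for an API response.
sample = [
    {"login": "Teacher1", "permissions": {"pull": True, "push": True}},
    {"login": "student1", "permissions": {"pull": True, "push": False}},
]
acl = groups_from_collaborators(sample)
```

A mapping like this depends on the external source being reachable and correctly maintained, which is part of why the first option looks more robust.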

Outcome

I made the Helm chart's configurable value acl.yaml a file mounted at /etc/jupyterhub/acl.yaml, which JupyterHub then reads on startup thanks to some hub.extraConfig. The contents of acl.yaml determine whether a user is allowed to log in at all, and I've also demonstrated that I can adjust pod configuration based on whether the user is an admin/instructor/participant.

Some of the logic is shown below. It does not include the details of creating a k8s secret holding the acl.yaml values and mounting it to the hub pod at /etc/jupyterhub/acl.yaml.

/chart/templates/secret.yaml: takes .Values."acl.yaml" and writes it as a string to the secret file. This is later referenced as a hub.extraVolume, which is in turn referenced as a hub.extraVolumeMount at /etc/jupyterhub/acl.yaml.
/chart/values.yaml: contains the logic to read the ACL file and update c.Authenticator.admin_users and c.Authenticator.whitelist (soon to be renamed!)
/deployments/hub-neurohackademy-org/config/prod.yaml: contains the ACL and some logic to act on it inside a pre_spawn_hook.

# JupyterHub Access Control List, not only does it decide what users will be
# allowed access, but it will also be used to influence logic in a
# pre_spawn_hook that can attach storage etc depending on user type.
acl.yaml:
  admins:
    - arokem
    - consideRatio
  instructors: []
  participants: []

jupyterhub:
  hub:
    extraConfig:
      00-acl-parsing: |
        import os
        from functools import lru_cache

        import yaml

        @lru_cache()
        def _load_acl():
            """
            Load a mounted Access Control List (ACL) from disk.

            Note that @lru_cache memoizes this function so it only runs once.
            """
            acl = {}
            path = "/etc/jupyterhub/acl.yaml"
            if os.path.exists(path):
                print(f"Loading Access Control List (ACL) from {path}")
                with open(path) as f:
                    acl = yaml.safe_load(f)
                    for group, usernames in acl.items():
                        # A group declared without a value parses as None.
                        acl[group] = [username.lower() for username in usernames or []]
            else:
                print(f"No Access Control List (ACL) at {path}")
            return acl

        c.Authenticator.admin_users = _load_acl()["admins"]
        c.Authenticator.whitelist = {
            *_load_acl()["admins"],
            *_load_acl()["instructors"],
            *_load_acl()["participants"]
        }

        # Helper functions for logic in spawner hooks etc that may want to act
        # based on this information.
        @lru_cache()
        def is_admin(username):
            return username in _load_acl()["admins"]
        @lru_cache()
        def is_instructor(username):
            return username in _load_acl()["instructors"]
        @lru_cache()
        def is_participant(username):
            return username in _load_acl()["participants"]

      spawn: |
        # Invoke logic before we spawn the user based on username
        async def pre_spawn_hook(spawner):
            username = spawner.user.name

            # Configure the pod's labels
            spawner.extra_labels.update({
                "hub.neurohackademy.org/is-admin": str(is_admin(username)).lower(),
                "hub.neurohackademy.org/is-instructor": str(is_instructor(username)).lower(),
                "hub.neurohackademy.org/is-participant": str(is_participant(username)).lower(),
            })

            # Configure the pod's container's environment variables
            spawner.environment.update({})

        c.KubeSpawner.pre_spawn_hook = pre_spawn_hook
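
The core of the logic above can also be tried outside JupyterHub. The following is a minimal sketch that works on an already-parsed ACL dictionary instead of reading /etc/jupyterhub/acl.yaml. The function names `normalize_acl`, `allowed_users`, and `role_labels` are hypothetical and not part of the chart. Usernames are lowercased on the assumption that JupyterHub normalizes usernames to lowercase by default.

```python
GROUPS = ("admins", "instructors", "participants")


def normalize_acl(acl):
    """Return the ACL with every group present and all usernames lowercased."""
    return {
        group: [username.lower() for username in (acl.get(group) or [])]
        for group in GROUPS
    }


def allowed_users(acl):
    """Union of all groups, i.e. everyone who may log in."""
    return {username for usernames in acl.values() for username in usernames}


def role_labels(acl, username):
    """Build the per-role pod labels, with "true"/"false" string values."""
    return {
        # group[:-1] turns "admins" into "admin", and so on.
        f"hub.neurohackademy.org/is-{group[:-1]}": str(username in usernames).lower()
        for group, usernames in acl.items()
    }


acl = normalize_acl(
    {"admins": ["consideRatio"], "instructors": None, "participants": ["Student1"]}
)
labels = role_labels(acl, "consideratio")
```

Keeping these as plain functions makes it possible to check the allow-list and label logic without starting a hub.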

arokem commented 4 years ago

That makes sense! For option 1, would we have to share a git-crypt key file among all contributors who can add participants/instructors? Actually, why does this have to be encrypted? Couldn’t we have these lists visible here? That way others can add to either list with a PR.

On Sun, Jun 28, 2020 at 1:21 PM Erik Sundell notifications@github.com wrote:


consideRatio commented 4 years ago

Ah thanks for taking me out of my tunnel vision =) Making these lists public would have plenty of accessibility benefits, I like it!

I'll start working to set up such lists without populating them though.

consideRatio commented 4 years ago

The files in https://github.com/neurohackademy/nh-2020/tree/master/chart/files/etc/jupyterhub/acl are set up for access control.

@arokem @tyarkoni, I've verified some usernames etc in the google sheet. I figure we copy paste the unique list of verified users from there to participants.txt but I'll leave that for you to do when you want to provide access.

arokem commented 4 years ago

We'll probably shoot for a few days before the course itself begins, just to give people an opportunity to kick the tires.

We still need to make sure that we have GH usernames for everyone and that the usernames are correctly formatted, but let's do that after we close registration (which is tomorrow! 🎉)

consideRatio commented 4 years ago

There are also about 100 people who have provided no username, an email instead of a username, or simply an invalid username. See cells M14 and M20 if you want to give them a heads-up about that.
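
A check along these lines could be scripted. The sketch below sorts sheet entries into the three problem categories plus "ok". The pattern reflects GitHub's username rules as commonly described (alphanumerics and single hyphens, no leading or trailing hyphen, at most 39 characters), which is an assumption here. The function name `classify_entry` is hypothetical, and a syntactically valid name may still not belong to an existing account.

```python
import re

# One alphanumeric, then up to 38 more characters where a hyphen is only
# allowed when directly followed by an alphanumeric.
GITHUB_USERNAME = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9]|-(?=[A-Za-z0-9])){0,38}")


def classify_entry(entry):
    """Classify a registration entry as "missing", "email", "invalid" or "ok"."""
    value = (entry or "").strip()
    if not value:
        return "missing"
    if "@" in value:
        return "email"
    if GITHUB_USERNAME.fullmatch(value):
        return "ok"
    return "invalid"
```

Running this over the username column would give a list of registrants to contact, grouped by what needs fixing.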

arokem commented 4 years ago

Yes -- thank you for adding that! I'll do that once we have all our registrants (we're still accepting registrations until tomorrow...).

consideRatio commented 4 years ago

@arokem If we want to encrypt the user list with SOPS, that's now as simple as declaring it in the prod.yaml residing in the secrets folder instead of the config folder. The access control list passed as a Helm configuration dictionary is now functional:

not being part of it means login will fail
being an admin means one becomes a JupyterHub admin
being an admin/instructor/participant lets us configure the pod spec for the user in a pre_spawn_hook, as demonstrated by setting a custom label on the user pods depending on whether they are an admin/instructor/participant.

I'll use this functionality to add some NFS storage in read/write for instructors and read for normal users, as well as provide some spawn options allowing admins/instructors to avoid starting up more expensive servers (24GB etc) if not needed.
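
As a rough sketch of the storage part of that plan, the function below builds a Kubernetes volume mount entry that is writable for admins and instructors and read-only for everyone else. The volume name, mount path, and the function name `shared_volume_mount` are hypothetical, and the ACL is passed in as a plain dictionary.

```python
def shared_volume_mount(username, acl):
    """Return a volume mount dict for the shared area, based on the user's role."""
    can_write = username in acl["admins"] or username in acl["instructors"]
    return {
        "name": "shared-nfs",                # hypothetical volume name
        "mountPath": "/home/jovyan/shared",  # hypothetical mount path
        "readOnly": not can_write,
    }


# Hypothetical ACL for illustration.
example_acl = {
    "admins": ["arokem"],
    "instructors": ["teacher1"],
    "participants": ["student1"],
}
instructor_mount = shared_volume_mount("teacher1", example_acl)
participant_mount = shared_volume_mount("student1", example_acl)
```

Inside a pre_spawn_hook, a dict like this could presumably be appended to spawner.volume_mounts, with a matching NFS-backed entry in spawner.volumes. That wiring is not shown here.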
