jupyter / nbgrader

A system for assigning and grading notebooks
https://nbgrader.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

Multiple instructors, multiple courses, and multiple solutions #1030

Closed perllaghu closed 5 years ago

perllaghu commented 6 years ago

This is a discussion point, to try to bring together the various threads.

We have a number of similar threads on multiple courses, multiple instructors, and the possibility of an external exchange service.

We all want to use NBGrader, as it's an excellent tool for creating tests & grading the responses.

It would be good to get a common solution, or at least several non-exclusive solutions, for these issues.

Please could people describe what they are trying to achieve, and the environment in which they are running nbgrader?

jhamrick commented 6 years ago

Great idea, thanks for creating this thread. Maybe let's explicitly ping some of the other people who have been thinking about this lately or might be interested in this discussion: @FranLucchini @rkdarst @sgvasquez @fishfree @zonca @sigurdurb @yuvipanda @dsblank

(Please feel free to reference any additional issues/PRs if I missed anything)

perllaghu commented 6 years ago

Environment

Aims

We want users identified as instructors to have access to formgrader whilst everyone else gets assignments [done]. We need to cope with multiple classes, and cannot have students able to "borrow" another student's submission. [doing]

Current Plan

Our current plan is to dynamically create nbgrader_config.py at notebook startup, and set CourseDirectory.root to /courses/<course_id> - this will separate each course into its own namespace.
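As a rough sketch of that startup step (the function names and the `course_id` check are illustrative assumptions, not our actual code), generating the file could look something like this. Only the `CourseDirectory.root` setting comes from the plan above:

```python
import re
from pathlib import Path

# Hypothetical helper names; only the CourseDirectory.root setting is taken
# from the plan described above.
_COURSE_ID = re.compile(r"[A-Za-z0-9_.-]+")


def render_nbgrader_config(course_id: str) -> str:
    """Return the text of an nbgrader_config.py scoped to one course."""
    # Reject anything that could escape /courses/ (slashes, "..", empty).
    if not _COURSE_ID.fullmatch(course_id) or course_id in {".", ".."}:
        raise ValueError(f"unsafe course_id: {course_id!r}")
    return (
        "c = get_config()\n"
        f"c.CourseDirectory.root = '/courses/{course_id}'\n"
    )


def write_nbgrader_config(course_id: str, config_dir: Path) -> Path:
    """Write the generated config into config_dir and return its path."""
    target = Path(config_dir) / "nbgrader_config.py"
    target.write_text(render_nbgrader_config(course_id))
    return target
```

The validation is there because the course id ends up in a filesystem path, so it should not be trusted blindly.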

For the exchange stuff, the notebook Docker images will volume-mount:

For outbound:

/srv/nbgrader/exchange/outbound -> [external_storage_point]/<course_id>/outbound

For inbound, it depends on whether you're an instructor or a student (this is what stops students seeing each other's work):

/srv/nbgrader/exchange/inbound -> /disk/remote/courses/<course_id>/inbound   # Instructors
/srv/nbgrader/exchange/inbound -> /disk/remote/courses/<course_id>/inbound/<student_id>  # students

However, this does mean we need to modify the collect code to iterate over every student directory.
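A minimal sketch of what that modified collect step might look like, assuming each student's submissions land in `inbound/<student_id>/` as described above. The function name and return shape are made up for illustration; this is not nbgrader's actual collect code:

```python
from pathlib import Path
from typing import Dict, List


def find_submissions(inbound_root: Path) -> Dict[str, List[Path]]:
    """Map each student_id to the entries found in their inbound directory.

    Assumes the per-student layout from the plan above:
    <inbound_root>/<student_id>/<submission>
    """
    submissions: Dict[str, List[Path]] = {}
    for student_dir in sorted(Path(inbound_root).iterdir()):
        if not student_dir.is_dir():
            continue  # ignore stray files at the top level
        submissions[student_dir.name] = sorted(student_dir.iterdir())
    return submissions
```

An instructor's container, which mounts the whole `inbound` directory, would call this on `/srv/nbgrader/exchange/inbound`; a student's container only ever sees its own subdirectory.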

Plan, stage 2

Future work is to either adopt hubshare [https://github.com/jupyterhub/hubshare], or write our own version. We would also push for a central nbgrader.db, and add the functionality for multiple courses & the like to it... and an API to allow instructors/students to work on any course they've previously been associated with, and for the inbound/outbound stuff to all be interacted with through an API (which then does authorization with this central database).

perllaghu commented 6 years ago

Also related: #1010

sivasquez commented 6 years ago

Environment

Currently using the basic system, and we intend to deploy with either Docker or Kubernetes in the future.

Aims

Current Plans

Based on the solution proposed in this issue, we intend to create an external service (we are considering using Flask for this) that allows users to query for the courses they are in as students, the courses they are in as graders, and the students they have in a given course if they are graders in said course. This separation will make it easier to work with other systems if we plan to integrate more systems with JHub/Nbgrader.
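To make the three queries concrete, here is a rough in-memory sketch (plain Python rather than Flask, and every name in it is a placeholder we made up). A real service would sit behind HTTP endpoints and a proper database:

```python
from collections import defaultdict
from typing import Dict, Set


class Roster:
    """Toy stand-in for the proposed course-membership service."""

    def __init__(self) -> None:
        # course_id -> set of user ids, kept separately per role
        self._students: Dict[str, Set[str]] = defaultdict(set)
        self._graders: Dict[str, Set[str]] = defaultdict(set)

    def enroll(self, course_id: str, user: str, role: str) -> None:
        if role not in ("student", "grader"):
            raise ValueError(f"unknown role: {role!r}")
        table = self._students if role == "student" else self._graders
        table[course_id].add(user)

    def courses_as_student(self, user: str) -> Set[str]:
        return {c for c, members in self._students.items() if user in members}

    def courses_as_grader(self, user: str) -> Set[str]:
        return {c for c, members in self._graders.items() if user in members}

    def students_in(self, course_id: str, requester: str) -> Set[str]:
        """Return the course's students, but only to one of its graders."""
        if requester not in self._graders.get(course_id, set()):
            raise PermissionError(f"{requester} is not a grader in {course_id}")
        return set(self._students.get(course_id, set()))
```

Note that the same user can be a grader in one course and a student in another, which is why the two roles are stored per course rather than per user.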

This would also include changes to the UI, but its implementation is dependent on whether we can do the backend changes - or not.

We'd like to ask for guidance on which files to look at to start making changes. Personally, I've been toying around trying to change how Nbgrader manages the course_id, but with so many uses of the variable in so many different files, it is not easy to just go and edit the code. Any insight would be appreciated :)

dsblank commented 6 years ago

Rather than an external service, you can build on JupyterHub's services. See for example these two services:

https://github.com/BrynMawrCollege/jupyterhub

You can make sure that they are logged in, and then perform the various functions (retrieving class rosters for the teacher, etc.). Sounds like a great idea!

sivasquez commented 6 years ago

@dsblank Thanks for the feedback! We are planning to work on this ASAP, although I haven't worked with services before. Is there any guide on how to make good services? I'm thinking that for our solution (I'm working with @FranLucchini on this), we'll need to be able to pass parameters (current user id, course id, student ids in case of a grader) to the service and receive responses from it dynamically, and those responses will affect what the UI shows.

dsblank commented 6 years ago

The main docs are here: https://jupyterhub.readthedocs.io/en/stable/reference/services.html

It is just a regular web endpoint, so you should be able to do whatever you want. You will be able to check to see if the user is logged in, and get their user_id, for example.
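For reference, registering such a service with the Hub is a matter of adding an entry to `c.JupyterHub.services` in `jupyterhub_config.py`. A rough sketch, where the service name, port, and script path are all placeholders:

```python
# Hypothetical values: the name, port, and script path are placeholders.
roster_service = {
    "name": "roster",
    "url": "http://127.0.0.1:10101",
    "command": ["python3", "/srv/services/roster.py"],
}

# In jupyterhub_config.py, where `c` is provided by JupyterHub:
#     c.JupyterHub.services = [roster_service]
services = [roster_service]
```

As far as I know, a Hub-managed service like this is proxied under `/services/<name>/` and receives an API token through its environment, which it can use to ask the Hub who the current user is.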

fishfree commented 6 years ago

@perllaghu I think integration with LTI is very necessary, for there are already many ID providers, especially in Learning Management Systems.

perllaghu commented 6 years ago

@fishfree - remember that NBGrader itself has no need to know about LTI... however it would be good if it could work in a JupyterHub environment driven by LTI connections.

perllaghu commented 6 years ago

I was thinking about this over the weekend.... and I think there are three distinct, but inter-related, parts that need to be resolved:

  1. The formgrader and assignments functions need to be able to cope with multiple courses (and, consequently, multiple roles)
  2. Abstract the exchange functionality and move towards an API-based system (such as hubshare.)
  3. Acknowledge LTI as a potential access method for a Notebooks/Jupyterhub service

Taking the LTI issue first: this should not be something for NBGrader to concern itself with, but for some pre-start hook to manage. Its impact on NBGrader is that users, courses, and roles (students vs. instructors) are not known at the time the jupyterhub service is created, so they need to be managed in the pre-start hook area.
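One way this could look, assuming JupyterHub's `Spawner.pre_spawn_hook` is the hook in question, is a function that looks the user up and passes the result to the notebook through environment variables. The `lookup` callable and the `NBG_*` variable names below are invented for illustration:

```python
from typing import Callable, Dict


def make_pre_spawn_hook(lookup: Callable[[str], Dict[str, str]]):
    """Build a hook that records a user's course and role before spawn.

    `lookup` is a hypothetical callable (e.g. backed by LTI launch data)
    returning {"course_id": ..., "role": ...} for a username.
    """

    def pre_spawn_hook(spawner) -> None:
        info = lookup(spawner.user.name)
        # NBG_* names are made up; the notebook image would have to read them.
        spawner.environment["NBG_COURSE_ID"] = info["course_id"]
        spawner.environment["NBG_ROLE"] = info["role"]

    return pre_spawn_hook


# In jupyterhub_config.py this would be wired up roughly as:
#     c.Spawner.pre_spawn_hook = make_pre_spawn_hook(my_lookup)
```

The point is that the course and role are resolved per user at spawn time, not when the hub itself is configured.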

The first issue, of multiple courses, seems to have a solution of extending the Jupyterhub User model so one can use that to get a list of courses for each user.

This leaves us with modifying the exchange, and the use of hubshare (https://github.com/jupyterhub/hubshare) - what is the state of this work?

What other issues are there that I've missed?

moschlar commented 6 years ago

Environment

- Special Helm chart values to make it work:
```yaml
...
  hub: 
    extraConfig: 
      environment: |-
        import string
        safe_chars = set(string.ascii_lowercase + string.digits)
        c.KubeSpawner.environment = {}
        c.KubeSpawner.environment["NB_UID"] = lambda self: str(1000 + int(self.user.id))
        c.KubeSpawner.environment["NB_USER"] = lambda self: ''.join([s if s in safe_chars else '-' for s in self.user.name.lower()])
      priv: "c.KubeSpawner.privileged = True"
      uid: "c.KubeSpawner.uid = 0"
...
  singleuser: 
    cmd: "start-singleuser.sh"
    image: 
      name: ".../singleuser"
      tag: "..."
    storage: 
      dynamic: 
        storageClass: "nfs-client"
      extraVolumeMounts: 
        - 
          mountPath: "/srv/nbgrader/exchange"
          name: "exchange"
      extraVolumes: 
        - 
          name: "exchange"
          persistentVolumeClaim: 
            claimName: "exchange"
      homeMountPath: "/home"
```

Aims

perllaghu commented 6 years ago

@moschlar - you have an environment very similar to ours... except we have multiple "single-user" notebooks, and an external service that identifies the user & which notebook they're wanting to use.

How are you managing persistent storage for your users?

moschlar commented 6 years ago

@perllaghu

> How are you managing persistent storage for your users?

We have central NFS storage (NetApp appliance) which exports a volume (with sec=sys) for the Kubernetes cluster. And then we use nfs-client-provisioner to provision directories for each PersistentVolumeClaim. Seems to work so far!

perllaghu commented 6 years ago

@moschlar Very similar to us - and we're at >14,000 users, and I've seen >150 simultaneously

We've squashed the NFS permissions, so all files on disk are owned by jovyan and mount just the user's space into the notebook docker image (seemed easier than mucking about with UIDs in notebooks, and the security issue is fairly low)

Our problem is the NFS mounting for the exchange [for multiple courses] - it's just not proving feasible, hence the comment above about needing an exchange service to manage that.

moschlar commented 6 years ago

Today was the first lecture and we had ~160 simultaneous users (should have been ~300, if the wifi had worked :grimacing:).

We currently only have one course so that problem didn't come up yet...

But while reading through the docs and source code, I made a mental note that we could maybe solve it using the Spawner options form, allowing the students to select a course while spawning their Notebook server. Did you consider that?
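Roughly, I imagine something like this, using the Spawner's `options_form` and `options_from_form` hooks. The course list and the `course_id` field name are placeholders, and I haven't tested it against a real Hub:

```python
from html import escape
from typing import Dict, List

COURSES = ["cs101", "cs102"]  # placeholder course ids


def render_options_form(courses: List[str]) -> str:
    """Return the HTML fragment shown to the user before spawning."""
    options = "\n".join(
        f'  <option value="{escape(c, quote=True)}">{escape(c)}</option>'
        for c in courses
    )
    return (
        '<label for="course_id">Course</label>\n'
        f'<select id="course_id" name="course_id">\n{options}\n</select>'
    )


def options_from_form(formdata: Dict[str, List[str]]) -> Dict[str, str]:
    """Turn the submitted form (field name -> list of values) into options."""
    values = formdata.get("course_id") or [""]
    chosen = values[0]
    if chosen not in COURSES:
        raise ValueError(f"unknown course: {chosen!r}")
    return {"course_id": chosen}
```

As I understand it, whatever `options_from_form` returns is then available on the spawner as `user_options`, so the chosen course could be used to pick the exchange mount.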

perllaghu commented 6 years ago

That's how we did it... and when I have time, I'll find the bits of code we have :) (though note that our code is in an external front-end, which tells the hub what to spawn, and for whom)

perllaghu commented 6 years ago

@moschlar - this is a cut-down version of the options dict we pass to the kubespawner in JupyterHub.

```python
import json

import requests
from django.conf import settings  # assumed: the standard Django settings object

# Users are Django users; Container is a table of available notebooks
# (its import is omitted here, as in our original snippet).
# Both functions are methods on the user object.


def jupyter_start(self, container_pk):
    # Look up the requested notebook by its Container primary key
    containers = Container.objects.all()
    container = containers.get(pk=container_pk)

    # Define volume mount from NFS mount
    mounts = [{
        'name': 'home',
        'hostPath': '/disk/remote{}'.format(self.user_directory),
        'mountPath': '/home/jovyan'
    }]

    # Set up the options dictionary
    user_options = {
        # list of constraints
        'image': container.service_name,  # name of the image
        'labels': {
            'username': self.username,
            'container': container.display_name.lower().replace(' ', '_'),
            'type': 'naas'
        },
        'volumes': mounts
    }
    # Ask the Hub to start this user's server with the options above
    return requests.post(
        'http://{}/hub/api/users/{}/server'.format(settings.JUPYTER_URL, self.username),
        headers={'Authorization': 'token ' + self.jupyter_token},
        data=json.dumps(user_options),
        verify=False,
    )


def jupyter_shutdown(self):
    # Ask the Hub to stop this user's server
    return requests.delete(
        'http://{}/hub/api/users/{}/server'.format(settings.JUPYTER_URL, self.username),
        headers={'Authorization': 'token ' + self.jupyter_token},
        verify=False,
    )
```

alphadevuser commented 5 years ago

Hi. Does nbgrader support multiple courses now? I am interested in integrating nbgrader with JupyterHub such that a single installation of JupyterHub can support multiple courses and multiple instructors.

sigurdurb commented 5 years ago

@alphadevuser One way of having multiple instructors is just to allow them access to the service where you are running the course from. As for multiple courses for each student, I am already working on #1040. Let me know if you are going to work on it; I have not taken the time to refactor and test it in some time, and fresh eyes are always welcome.

alphadevuser commented 5 years ago

Hi @sigurdurb! Thanks for the response. I am working on a personal project, and unfortunately won't be able to dedicate time to this codebase. I really appreciate the effort you are putting in. Hopefully, I will be able to contribute soon.

Question on #1040: This doesn't seem to support the functionality to create courses on the fly. I mean you can add a new course only when starting the jupyterhub server. Once the server is up, one can't create courses. Am I right?

perllaghu commented 5 years ago

There's a workshop coming up in Edinburgh soon.... and I'm hoping we can address this.

It ties in with using LTI connections, with department-/campus-wide jupyterhub installations, and with a couple of other issues

rkdarst commented 5 years ago

Aalto University environment

I'm late, but here's what we have...

Multiple courses

Multiple users

Multiple instructors

All of this already works and many courses have used it. Some improvements could make things smoother, but really most work we do is local integration. If formgrader would support multiple courses, it could be designed so that you don't have to select the course at spawn-time... though this may still be needed to select the right container image.

Student and instructor UI (visible options depend on who is logged in): image

jhamrick commented 5 years ago

Hi all, I wanted to give a quick update on this issue in light of what we've been working on at the Edinburgh hackathon! We made the decision to make nbgrader more compatible with multiple courses, with the following changes:

Note that even with all these changes, we are still making the following assumptions:

I plan to address these at some point which will make the setup for multiple classes even smoother, but probably not immediately. But I hope that the changes we've made so far will make having multiple classes a much nicer experience than it is currently!

Huge thanks to @sigurdurb @BertR and @damianavila who worked on this at the hackathon!! 🥇

meeseeksmachine commented 3 years ago

This issue has been mentioned on Jupyter Community Forum. There might be relevant details there:

https://discourse.jupyter.org/t/jupyter-nbgrader-hackathon-topics/1040/1