parmentelat / nbhosting

nginx + django + docker architecture to host notebooks embedded from open-edx hosted MOOCs

Foreword

Important Notices

Jupyter notebook hosting architecture

This git repo contains a collection of utilities that together make up the architecture behind nbhosting.inria.fr, which is designed as a notebook-serving infrastructure.

Use case : MOOCs

The first use case is hosting notebooks in the context of MOOCs. See e.g. on fun-mooc.fr:

The m@gistère service also uses this same infrastructure to add notebooks to their Moodle-based LMS.

In the classroom

In addition to this "silent" mode, nbhosting can also be used in standalone mode in the classroom; to that end, it offers a few features that provide a thin navigation/structuring layer on top of notebook-oriented content.


Open-edX teacher side

As far as the fun-mooc/edx mode is concerned, on the edx side the teacher creates a block typed as ipython notebook. Note that the present repo does not cover the code for the edx extension that supports this type of block (ref?); that extension is readily available at this point (jan. 2017) at fun-mooc.fr; see below for enabling it on a new course.


Open-edX student side

With these settings in place, here's what a student would see:


How does it work?

In a nutshell:

Two additional features allow a student to:


Miscellaneous

Enabling New ipython notebook

Before you can, as a teacher, add your first notebook-backed content to your edx course, you need to enable that extension. To do that, go to Studio, then to your course's Settings → Advanced, and add ipython to the Advanced Module List setting, as illustrated below:

Workflow / how to publish

The workflow is entirely based on git: a course is defined from a git repo, typically remote (github, gitlab, ...) and public. To publish a new version of your notebooks, you need to push them to that reference repo, and then instruct nbhosting to pull the changes:

If you set a given course in autopull mode, nbhosting will perform this pull operation on its own every 5 minutes.
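As a rough illustration of what a periodic pull amounts to, here is a minimal Python sketch. It is not nbhosting's actual implementation: the names `pull_is_due`, `pull_course` and `autopull_once` are hypothetical, and the only facts taken from the text above are the use of a git pull and the 5-minute period.

```python
import subprocess
from datetime import datetime, timedelta

# Period quoted in the text above; everything else here is an assumption.
AUTOPULL_PERIOD = timedelta(minutes=5)


def pull_is_due(last_pull, now, period=AUTOPULL_PERIOD):
    """Return True if the course was never pulled, or not within `period`."""
    return last_pull is None or now - last_pull >= period


def pull_course(repo_dir):
    """Run `git pull` inside the course's local clone (hypothetical location)."""
    subprocess.run(["git", "-C", str(repo_dir), "pull"], check=True)


def autopull_once(repo_dir, last_pull, now=None, pull=pull_course):
    """Pull if due; return the timestamp of the most recent pull."""
    now = now or datetime.now()
    if pull_is_due(last_pull, now):
        pull(repo_dir)
        return now
    return last_pull
```

A scheduler would call `autopull_once` regularly for each course in autopull mode; the `pull` argument is injectable so that the logic can be exercised without a real git repo.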

Container image

Each course is deployed based on a specific image; for customization, create a file named nbhosting/Dockerfile in your course repo. Note that some magic recipes need to be applied in your image for proper deployment, so you should start from either the nbhosting/minimal-notebook or nbhosting/scipy-notebook image; see the beginning of the code for our Python MOOC for an example.

That image can then be rebuilt from the website. The new image will be deployed incrementally, essentially as running containers get phased out when detected as inactive; this means it can take a day or two before all the students can see the upgrade.

Notebook metadata

Each notebook is displayed with a label and a version number, as in the example above. To tweak them, use your notebook's metadata and set these two items:

Statistics

Some usage statistics are available, for visually inspecting data like:

Staff

You can declare some people as staff; nbhosting uses this only to discard accesses made by these people when putting stats together. A convenience button also lets you trash all the working files of people declared as staff, which comes in handy to make sure that staff always see the latest pushed version.

To declare somebody as staff, you need to locate that person's hash, as exposed by edx.
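The effect of the staff list on statistics can be pictured with a small Python sketch. The event structure (a dict with a `student` key holding the edx hash) and the function name `discard_staff` are assumptions made for illustration only; they do not reflect how nbhosting stores its data.

```python
def discard_staff(events, staff_hashes):
    """Keep only the events that were not produced by staff members.

    `events` is a hypothetical list of dicts, each with a "student" key
    holding the hash exposed by edx; `staff_hashes` lists the staff hashes.
    """
    staff = set(staff_hashes)
    return [event for event in events if event["student"] not in staff]


# Hypothetical access log: two students and one staff member.
events = [
    {"student": "a1b2", "notebook": "w1/intro"},
    {"student": "ffff", "notebook": "w1/intro"},
    {"student": "c3d4", "notebook": "w2/loops"},
]
student_events = discard_staff(events, staff_hashes=["ffff"])
```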

Jupytext

Text formats are much easier to manage under git than the historical ipynb format; for that reason, nbhosting provides full and transparent support for notebooks saved in a text format, at least for the formats known to jupytext as py:percent, py:light, markdown and md:myst.
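To give an idea of why such formats are git-friendly, here is what a small notebook looks like in py:percent: a plain Python script in which each cell starts with a `# %%` marker line. The `split_cells` helper is hypothetical and deliberately simplified; it is not the parser used by jupytext or nbhosting, and only serves to show where the cell boundaries are.

```python
# A notebook in the py:percent format: every cell starts with "# %%",
# and markdown cells are stored as comments.
NOTEBOOK_SOURCE = '''\
# %% [markdown]
# # A first notebook
# Markdown cells are stored as comments.

# %%
x = 6 * 7

# %%
print(x)
'''


def split_cells(source):
    """Split py:percent text into (cell_type, cell_source) pairs.

    Simplified sketch: only "# %%" and "# %% [markdown]" markers are handled.
    """
    cells = []
    for line in source.splitlines():
        if line.startswith("# %%"):
            cell_type = "markdown" if "[markdown]" in line else "code"
            cells.append((cell_type, []))
        elif cells:
            cells[-1][1].append(line)
    return [(kind, "\n".join(lines).strip()) for kind, lines in cells]


cells = split_cells(NOTEBOOK_SOURCE)
```

Because the file is ordinary text with no embedded outputs, a change to one cell shows up as a small, readable diff.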


Dataflow - nbhosting side

Here's the general principle of how things work

silent mode (in an iframe, behind a MOOC system)

Note that notebookLazyCopy used to be named ipythonExercice, which is still supported for backward compatibility.

classroom mode

The classroom mode takes a similar approach, but with a URL that mentions notebookGitRepo/ instead of notebookLazyCopy/. The behaviour is mostly the same, except for the policy used to create notebooks in the student space: when the visited notebook is missing there, notebookGitRepo triggers a git clone operation, instead of copying notebooks individually.
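The difference between the two policies can be sketched in Python as follows. This is an illustration under assumptions, not nbhosting's code: the function names `lazy_copy` and `git_repo`, their parameters, and the directory layout are all hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def lazy_copy(course_dir, student_dir, notebook):
    """notebookLazyCopy-like policy: copy this one notebook if it is missing."""
    target = Path(student_dir) / notebook
    if not target.exists():
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy(Path(course_dir) / notebook, target)
    return target


def git_repo(course_url, student_dir, notebook):
    """notebookGitRepo-like policy: clone the whole course if the notebook is missing.

    Assumes `student_dir` does not exist yet or is empty, as `git clone` requires.
    """
    target = Path(student_dir) / notebook
    if not target.exists():
        subprocess.run(["git", "clone", course_url, str(student_dir)], check=True)
    return target
```

In both cases an existing notebook in the student space is left untouched, so student edits are never overwritten by a later visit; only the way the missing file gets created differs.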

The advantage of this mode is that students can later use the jupyterlab git extension to accurately manage their local repo, i.e. drop or commit local changes, pull any updates from the master repo, and so on.

An experimental feature called 'pull-students' deals with changes made in the master course: it allows these changes to be pulled automatically into the student's repo.

summary

As a summary:

TODO

See Issues on github for an up-to-date status.