jupyterhub / zero-to-jupyterhub-k8s

Helm Chart & Documentation for deploying JupyterHub on Kubernetes
https://zero-to-jupyterhub.readthedocs.io

Offer of UC Davis student help for JupyterHub/nbgrader #1556

Closed willingc closed 4 years ago

willingc commented 4 years ago

Surfacing the following as a new issue.

We have a Kubernetes-based JupyterHub deployment at UC Davis (on bare metal) and would like to get nbgrader running on the system for instructors to use. I have a team of four CS seniors who will spend about 40 collective hours a week for 5 months with the goal of developing a community-usable solution for this issue. They need some help getting oriented: understanding the current state of affairs, what the issues are, and what solution paths people have considered. Any suggestions on how to move forward? They start this week :)

Originally posted by @moorepants in https://github.com/jupyterhub/zero-to-jupyterhub-k8s/issues/174#issuecomment-574978242

willingc commented 4 years ago

@minrk @GeorgianaElena @betatim It would be great to take Jason up on this generous offer from @moorepants. Do you think this is something you could help get them started on? It would be great for those who use nbgrader. cc/ @jhamrick

BertR commented 4 years ago

Hi @willingc, we (@perllaghu, @lzach and myself) are running a similar setup with nbgrader at The University of Edinburgh. Most of our work has been on making the "Exchange" functionality of nbgrader pluggable, so that assignments can be sent over the network instead of via a local filesystem. Happy to talk about our approach and ideas on how we could take this forward.

consideRatio commented 4 years ago

I'd be happy to review ideas on implementation strategies in conjunction with a cloud-based setup!

Thinking about it now, an idea pops up, inspired by the nice results of @GeorgianaElena's and others' work on jupyterhub/traefik-proxy and by JupyterHub's pluggable architecture. JupyterHub works with different authenticators, spawners, and proxies - all pluggable.

Would it perhaps make sense to make nbgrader pluggable wherever it would normally work directly against a filesystem? Then a REST API, object storage, etc. could be integrated closely, with relatively little risk of breaking things unless something quite extensive changes.
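
To make that concrete, here is a rough sketch (purely hypothetical names, not an existing nbgrader API) of what a pluggable storage backend for the exchange could look like, selected via configuration the same way authenticators and spawners are:

    # Hypothetical sketch only -- class and trait names are invented, not an existing nbgrader API.
    import requests
    from traitlets import Unicode
    from traitlets.config import LoggingConfigurable


    class ExchangeBackend(LoggingConfigurable):
        """Abstract storage operations the exchange would need."""

        def push(self, local_path, remote_key):
            raise NotImplementedError

        def pull(self, remote_key, local_path):
            raise NotImplementedError


    class HTTPExchangeBackend(ExchangeBackend):
        """Backend talking to a REST service instead of a shared directory."""

        service_url = Unicode("http://nbexchange.example.org",
                              help="Base URL of the exchange service").tag(config=True)

        def push(self, local_path, remote_key):
            # upload a single file to the (hypothetical) exchange service
            with open(local_path, "rb") as f:
                r = requests.post(f"{self.service_url}/files/{remote_key}", data=f)
            r.raise_for_status()

The backend would then be picked in config with something like c.Exchange.backend_class = HTTPExchangeBackend, analogous to c.JupyterHub.spawner_class - again hypothetical configuration, just to illustrate the shape of the idea.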

Ah anyhow, I'm curious to explore the solution space!

perllaghu commented 4 years ago

@consideRatio There's the API we had back in the summer... before nbgrader added feedback into the loop: https://github.com/jupyter/nbgrader/issues/1130#issuecomment-498600471

There's an MR awaiting approval for the pluggable exchange: https://github.com/jupyter/nbgrader/pull/1238 (it's reporting that it needs a couple of tweaks)

There's an action on us to move our private code into a public space - however, we want to wait until the requirements for the pluggable module base have been finalised before we do that (and we need to unravel some of the context-specific code we have in there)

.... but yes - making the existing file-system exchange work as part of the ecosystem would also be very sensible (as would modifying the exchange tests to exercise the exchange as an API)

moorepants commented 4 years ago

Thanks @willingc for making this issue. The students will jump in on the conversation soon. I was traveling this week and will get them looped in now that I'm back.

rkevin-arch commented 4 years ago

Hi everyone, sorry for the late response! I'm part of the team of 4 ( @rkevin-arch, @lxylxy123456, @aalmanza1998, @Lawrence37) working on integrating nbgrader into JupyterHub for our senior design project. We're still playing around with the code, getting a JupyterHub+k8s setup in a Vagrant environment, installing nbgrader, etc. Just from what we understand, we have two possible solutions:

  1. Writing an API that handles distributing notebooks to students and collecting them, kind of like hubshare. The goal would be to finish the development of hubshare and integrate it into the setup.
  2. I noticed that JupyterHub requests persistent storage for all the users using a PersistentVolumeClaim to k8s. We're wondering if it's possible for an instructor container to issue additional PVC requests and mount a part of the student's storage onto theirs during the distribution/collection process. We don't have too much experience with k8s, so we aren't sure how feasible it is, but this would leverage the existing setup.

Please let us know how feasible these ideas are, and we will start implementing as soon as we get a better idea of the setup and how to proceed. Thanks!

betatim commented 4 years ago

It would be great for people to pull together the scattered knowledge/expertise/snippets around this and contribute to the documentation of Z2JH as well as code to nbgrader. I don't know if there is someone who has an overview of how it could all fit together, what the issues are, and who is working on what.

I don't know much about nbgrader deployments (on a kubernetes based hub). The point about the exchange mechanism is the one I have in my head as "first thing you'd have to tackle". My (not terribly informed) knowledge about the current exchange mechanism is that you need a (globally?) writeable directory. Getting this is tricky on a kubernetes deployment and seems like something you'd want to avoid if your course is bigger than ~30 students (aka you don't know all of them personally) as it sounds like a security nightmare :-/ This means it is great to see people working on creating a different exchange mechanism.

From my perspective, using something like nbgitpuller as the distribution mechanism is ideal. Or some other "copy stuff over" script installed as a postStart hook in the user pods (a sketch follows below). I don't know how this fits with the plans for a new exchange mechanism.
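
As a concrete (untested) sketch of the postStart idea, assuming the single-user image has the gitpuller CLI from nbgitpuller installed, and with a placeholder repository URL:

    # Z2JH hub.extraConfig sketch: refresh course material in each user pod at startup.
    # The repository URL, branch and target directory are placeholders; this assumes the
    # gitpuller CLI (shipped with nbgitpuller) is available in the single-user image.
    c.KubeSpawner.lifecycle_hooks = {
        "postStart": {
            "exec": {
                "command": [
                    "sh", "-c",
                    "gitpuller https://github.com/example-org/course-material master materials",
                ]
            }
        }
    }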

For autograding there are a few home-made solutions out there that let students submit their notebook and get a grade back. The key challenges/features here are executing the notebook not with the privileges of the teacher but as some "anonymous user" (you never know who accidentally puts !rm -rf / in their notebook), and not allowing students to exfiltrate the solutions while still letting them run arbitrary code.

Manual grading is another task with a question mark.

Notebook authoring is probably best done by using the current nbgrader UI.

TL;DR: there are lots of areas that could need work. Especially depending on what security concerns you have. A minimal version would be to use as much of nbgrader-as-is and solve the exchange mechanism problem.


A workaround for getting a shared writeable directory is providing a PVC that can be mounted as ReadWriteMany into all of the user pods. This is easy to do on Google Cloud by using their Filestore product; I've set this up for clients and it generally seems to "just work". An alternative is to use an NFS provisioner and run your own NFS server; I've never tried this but I think @dirkcgrunwald at UC Boulder has (at least he was asking questions about firewall rules and such related to this). @yuvipanda might have also attempted/done this.
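
For illustration, mounting such a pre-created ReadWriteMany claim into every user pod is only a few lines of spawner configuration. The claim name and mount path below are placeholders, and in a z2jh deployment you would typically express the same thing under singleuser.storage.extraVolumes / extraVolumeMounts in config.yaml rather than overriding the spawner's volume list directly:

    # Sketch: mount one shared ReadWriteMany PVC (created separately, e.g. backed by
    # Filestore or an NFS provisioner) into every user pod. Names and paths are placeholders.
    c.KubeSpawner.volumes = [
        {
            "name": "nbgrader-exchange",
            "persistentVolumeClaim": {"claimName": "nbgrader-exchange"},
        }
    ]
    c.KubeSpawner.volume_mounts = [
        {"name": "nbgrader-exchange", "mountPath": "/srv/nbgrader/exchange"},
    ]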

consideRatio commented 4 years ago

Setting up Z2JH for nbgrader trials

I recommend minikube for working on a z2jh setup; kind has not been as easy to work with as I had hoped. The documentation is still focused on kind, though.

Kubernetes storage challenges

Kubernetes storage access modes

@rkevin-arch you wrote in point 2 about requesting storage from k8s etc. Various kinds of storage can be mounted for any pod, on an individual basis. But what are the requirements on the storage? A key challenge is that storage often cannot be mounted simultaneously for many users, and when it can, it may have to be read-only for those users. The biggest challenge is mounting storage that many users can all read from and write to.

See this section for more information; it is essential reading if you choose to use nbgrader as it is: https://kubernetes.io/docs/concepts/storage/persistent-volumes/#access-modes

Also, if you opt for NFS, this issue contains a lot of relevant past discussion: https://github.com/jupyterhub/zero-to-jupyterhub-k8s/issues/421

Filesystem permissions within storage

Another challenge when working with attached storage is that, by default in a z2jh deployment, all users have the username jovyan and user ID 1000. So if the same storage is mounted for two different users, they would read/write to it as the same user at the filesystem level.

I would not recommend relying on filesystem user/group IDs to determine access; I think it would end up being too complicated to develop and maintain together with z2jh.

Helpful mechanisms

User dependent customization

Attaching various kinds of storage based on information about the user is very plausible. Here is a sequence of events that collects information about the user and makes adjustments based on it.

  1. When a user logs in to JupyterHub, one can request additional information, such as what groups the user belongs to according to the identity system. I've used this to determine whether the user has access to GPUs, for example. The request for a certain kind of information is often called a scope, and the response for a scope is often called a claim.
  2. With information about the user from the auth system, we can also augment it with a lookup somewhere else, by writing some extra code in a JupyterHub hook that runs whenever a user has logged in.
  3. With that information available, the user can be presented with a choice of environment to start up. It is possible to make the available options depend on all the information already collected about the user. These are often referred to as profile options, and we configure them with KubeSpawner. KubeSpawner is the JupyterHub spawner of a z2jh deployment; spawners are responsible for starting up the user servers. KubeSpawner does this by creating a Kubernetes pod and some other resources, and you can influence this in detail. (See the sketch after this list.)
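
A minimal sketch of steps 2-3: the instructor list and image tags below are placeholders standing in for whatever claims or lookups you actually use, and KubeSpawner accepts a callable for profile_list so the options can be computed per user.

    # Sketch: vary the spawn profiles per user. INSTRUCTORS is a placeholder for your
    # real source of truth (auth claims, an external lookup, a config file, ...).
    INSTRUCTORS = {"instructor1", "instructor2"}

    def dynamic_profiles(spawner):
        profiles = [
            {
                "display_name": "Student environment",
                "default": True,
                "kubespawner_override": {"image": "example/notebook-student:latest"},
            }
        ]
        if spawner.user.name in INSTRUCTORS:
            profiles.append(
                {
                    "display_name": "Instructor environment (formgrader enabled)",
                    "kubespawner_override": {"image": "example/notebook-instructor:latest"},
                }
            )
        return profiles

    c.KubeSpawner.profile_list = dynamic_profiles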

Extensible system

A lot of things can be extended - for example the classic Jupyter server, the classic Jupyter Notebook UI, and the JupyterLab UI.

consideRatio commented 4 years ago

Action point idea - nbgrader interaction mapping

Map out how nbgrader's users interact with a filesystem.

What permissions should various users have to various storage areas? If you get an overview of all the interactions and permissions, you can see more clearly what needs to be made available to each user, and then work out the structure of the content and the permissions within that structure (one possible starting point is sketched below). This is relevant regardless of whether you end up using a mounted filesystem, a REST API, interactions with object storage, etc.
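
One possible starting point for such a map, based on how the default filesystem exchange behaves (treat it as a sketch to check against the source, not an authoritative matrix):

    # Rough permission map to refine while reading the code: who needs what access to
    # which area, independent of whether the area is a filesystem, REST API or object store.
    ACCESS = {
        "course source/ and release/ dirs": {"instructor": "read/write", "student": "none"},
        "exchange outbound (released assignments)": {"instructor": "write", "student": "read"},
        "exchange inbound (submissions)": {"instructor": "read", "student": "write own submissions"},
        "exchange feedback": {"instructor": "write", "student": "read own feedback"},
        "student home directory": {"instructor": "none", "student": "read/write"},
    }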

jgwerner commented 4 years ago

@consideRatio @betatim @willingc we got this to work at IllumiDesk albeit it doesn't use the default exchange mechanism to interchange ipynb's. We would love to share our code to demonstrate how we got this to work with JupyterHub (we are hosted on AWS so it's somewhat opinionated).

Before we share anything we would just like to confirm where to share this information to avoid unnecessary chatter.

We are definitely standing on the shoulders of giants ;-)

dirkcgrunwald commented 4 years ago

I would be happy to document what we did for an NFS setup on GKE. There were some non-obvious design choices I wish I had known about.

We are using NFS to share collected nbgrader files, but not to collect the students' work. We don't have a coherent way of doing that other than our LMS or GitHub Classroom.

We use another tool for grading non-notebook assignments (https://inginious.readthedocs.io/en/latest/), which works well but doesn't have a command-line method to submit that also updates grades in the LMS.

betatim commented 4 years ago

Before we share anything we would just like to confirm where to share this information to avoid unnecessary chatter.

I think a good place to share is the forum as a dedicated post. It provides pretty good editing (links, images, tables, bold/italic, sections, etc), receives a fair bit of traffic and doesn't require review before it is public. Then I'd link to those posts from https://zero-to-jupyterhub.readthedocs.io/en/latest/community/index.html to make them easy to discover from the Z2JH guide.

There has been some hesitation about adding new material directly to the Z2JH guide: partly to not increase its size too much, partly because sections require constant maintenance (which requires expertise in that area), and partly because doing so involves reviewing work, which tends to be a bottleneck. So to get started and iterate towards something that multiple parties agree is "a good way of doing this", I'd create a forum post (you can even make it "wiki editable" so others can directly edit the first post in a topic).

rkevin-arch commented 4 years ago

Hi all,

tl;dr: We've just been busy trying to learn k8s. In terms of solutions, we're probably going to go with the first option (hubshare-like API for notebook distribution/collection). We have a working testing setup where we have a private fork of nbgrader and we can test it in z2jh, but it's kind of glitchy.

Reply to betatim: I agree that the notebook distribution / collection is the main issue here. I did a bunch more research and it looks like the plan 2 I had (use PVCs to mount part of the student volumes) might not be the way to go. More details in the reply below.

Also, our team will try getting nbgrader to function first before worrying about running student code in containers / pods to prevent malicious student code from destroying the instructor's home folder or stealing other students' solutions. I've given it some thought, but the easy way out (setuid to another user, or chroot) requires the instructor to be root, which is a pretty stupid idea. We'll look into this after nbgrader functions at all.

Reply to consideRatio: Thanks for recommending minikube! This is so much better than our original Vagrant setup. Also thanks for the link! I originally thought about using volumes that have one writer and many readers if synchronization is an issue, so the instructor can create a volume to distribute the notebook and mount it read-only on students' containers, and each student would create a submission volume that the instructor would mount read-only. But looking at the link you sent, it seems that not all volume types support access modes other than ReadWriteOnce, so it'd probably be better to develop a setup that works for everyone.

I'll look into the hooks. I still don't fully understand the entire system other than a high level overview, but if we do implement a hubshare-like API, then it's probably a good idea to use hooks upon login to query whether a new assignment is available.

Also yes, we are reading the nbgrader source code to try to determine what filesystem accesses are occurring, and maybe abstract them into one interface. I'll post updates here along the way.

Updates on our testing setup: we're currently making a testing setup that installs nbgrader inside the z2jh environment. We're building a custom Docker image that installs nbgrader from our testing repo, and specifying it under singleuser.image.name in config.yaml. It works, but with many weird issues. The helm install command occasionally hangs, then gives a cryptic error saying Error: transport is closing. We've moved over to Helm 3, which still occasionally has this problem. The pod that pulls down the containers occasionally seems stuck downloading jupyterhub/k8s-singleuser-sample:0.8.2, where pulling layer 484c6d5fc38a just hangs forever. It only happens occasionally, and our best bet is to minikube delete and try again.

In addition, we occasionally get weird 500 errors like this one if we make updates and do a helm upgrade without fully tearing down the minikube setup and starting fresh:

    Traceback (most recent call last):
      File "/usr/local/lib/python3.6/dist-packages/tornado/gen.py", line 589, in error_callback
        future.result()
      File "/usr/local/lib/python3.6/dist-packages/jupyterhub/handlers/base.py", line 636, in finish_user_spawn
        await spawn_future
      File "/usr/local/lib/python3.6/dist-packages/jupyterhub/user.py", line 489, in spawn
        raise e
      File "/usr/local/lib/python3.6/dist-packages/jupyterhub/user.py", line 409, in spawn
        url = await gen.with_timeout(timedelta(seconds=spawner.start_timeout), f)
      File "/usr/local/lib/python3.6/dist-packages/kubespawner/spawner.py", line 1636, in _start
        events = self.events
      File "/usr/local/lib/python3.6/dist-packages/kubespawner/spawner.py", line 1491, in events
        for event in self.event_reflector.events:
      File "/usr/local/lib/python3.6/dist-packages/kubespawner/spawner.py", line 72, in events
        key=lambda x: x.last_timestamp,
    TypeError: '<' not supported between instances of 'datetime.datetime' and 'NoneType'

It's probably worth submitting an actual issue / bug report, but I'll do that once I understand why this is happening and can reproduce it reliably. We're still looking into the issue. For now, we're mostly just reading the nbgrader codebase and trying to understand what goes where.

Thanks for all of your patience!

consideRatio commented 4 years ago

@rkevin-arch I suggest that you build on 0.9.0-beta.3 instead of 0.8.2; it is a lot more mature and reliable to use, with the latest JupyterHub 1.1.0 etc. I don't think 0.8.2 is fully ready for Helm 3, but I'm quite confident 0.9.0-beta.3 should be fine with it. Also, 0.9.0-beta.3 is tested on the newer k8s versions you may be using, while 0.8.2 isn't.

Regarding the traceback you posted, I can confidently say it is due to a modern version of k8s that the bundled kubespawner is incompatible with, and that it is fixed in 0.9.0-beta.3, where we use a more modern version of kubespawner! No need to report that.

I don't know about the transport is closing issue.

consideRatio commented 4 years ago

Regarding learning k8s etc., I've spent a lot of time learning it myself. These are the videos I recommend my colleagues watch:

About K8s (increasing complexity):

  1. https://www.youtube.com/watch?v=4ht22ReBjno
  2. https://www.youtube.com/watch?v=QJ4fODH6DXI
  3. https://www.youtube.com/watch?v=90kZRyPcRZw (I :heart: this)

About helm:

  1. https://www.youtube.com/watch?v=9cwjtN3gkD4

I also suggest using the official k8s documentation - the Concepts section is excellent! https://kubernetes.io/docs/concepts/

And, if you have an issue, this flowchart can be really useful: https://learnk8s.io/troubleshooting-deployments

rkevin-arch commented 4 years ago

Awesome, thanks! We'll use 0.9.0-beta.3 instead.

For the transport is closing issue, I believe helm is just timing out because the pod that should pull down the Docker image hangs forever due to a weird download issue. I tried pulling that image (jupyterhub/k8s-singleuser-sample:0.8.2) on my host system and it hung weirdly as well, with the 484c6d5fc38a1 layer downloading forever (it downloads around 3MB and hangs). For some reason I can't reproduce it now. Oh well.

Also, thanks for all the youtube links! I'll definitely be watching them this weekend.

Lawrence37 commented 4 years ago

We have decided to go down the hubshare route. We're looking through the nbgrader source code to locate filesystem interactions. The plan is to abstract them and implement classes for the existing exchange mechanism and for a hubshare (or similar) service.

I see part of this task has been worked on in jupyter/nbgrader#1238. Thanks @perllaghu for the link! I would very much like to build on the pluggable exchange and hubshare because it's likely more sustainable and universal than creating our own solution.

We are in the early stages of development, so there's room for flexibility. If anyone has comments, tips, or concerns regarding our approach, please share them with us!

moorepants commented 4 years ago

One tip: open pull requests for atomic changes (small, self contained). Open these often and early (even in the "work in progress" stage), so that you can get feedback and the CI tests will run.

perllaghu commented 4 years ago

@Lawrence37 - here's some documentation I put together to show how the assignment_list and formgrader components call the exchange (the formatting is shonky, sorry) :

Exchange API

A simplistic overview

Assignments are created, generated, released, fetched, submitted, collected, graded. Then feedback can be generated, released, and fetched.

The exchange is responsible for receiving released assignments, allowing those assignments to be fetched, accepting submissions, and allowing those submissions to be collected. It also allows feedback to be transferred.

In doing this, the exchange is the authoritative place to get a list of what's what.

Defined directories

CourseDirectory defines the directories (and their defaults) for each nbgrader step - by default source, release, submitted, autograded and feedback.

Also, taken from the nbgrader help::

The nbgrader application is a system for assigning and grading notebooks.
Each subcommand of this program corresponds to a different step in the
grading process. In order to facilitate the grading pipeline, nbgrader
places some constraints on how the assignments must be structured. By
default, the directory structure for the assignments must look like this:

    {nbgrader_step}/{student_id}/{assignment_id}/{notebook_id}.ipynb

where 'nbgrader_step' is the step in the nbgrader pipeline, 'student_id'
is the ID of the student, 'assignment_id' is the name of the assignment,
and 'notebook_id' is the name of the notebook (excluding the extension).

Calling exchange classes

Exchange functions are called three ways:

  1. From the command line - eg: nbgrader release_assignment assignment1.
  2. From formgrader server_extension, which generally calls the methods defined in nbgrader/apps/{foo}app.py.
  3. From the assignment_list server_extension, which generally calls the methods directly.

The classes

The nbgrader exchange uses the following classes::

Exchange
ExchangeError
ExchangeCollect
ExchangeFetch
ExchangeFetchAssignment
ExchangeFetchFeedback
ExchangeList
ExchangeRelease
ExchangeReleaseAssignment
ExchangeReleaseFeedback
ExchangeSubmit

Exchange

Base class. Contains some required configuration parameters and elements - the prominent ones include path_includes_course and coursedir.

This class defines the following methods, which are expected to be subclassed:

init_src() Define the location files are copied from

init_dest() Define the location files are copied to

copy_files() Actually copy the files.

The class also defines a convenience method, which may be subclassed::

def start(self):
    if sys.platform == 'win32':
        self.fail("Sorry, the exchange is not available on Windows.")
    if not self.coursedir.groupshared:
        # This just makes sure that directory is o+rwx.  In group shared
        # case, it is up to admins to ensure that instructors can write
        # there.
        self.ensure_root()
    self.set_timestamp()
    self.init_src()
    self.init_dest()
    self.copy_files()

You may want to subclass this, as self.root being a directory only makes sense in a file-based exchange.
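
For example, a non-filesystem release might override the three methods above and skip the directory handling entirely. A rough sketch (the service URL and endpoint are invented, and the import path may differ between nbgrader versions):

    # Hypothetical sketch -- the HTTP endpoint is made up, not an existing service API.
    import os
    import requests
    from nbgrader.exchange import ExchangeReleaseAssignment  # path may vary by nbgrader version

    class HTTPExchangeReleaseAssignment(ExchangeReleaseAssignment):
        service_url = "http://nbexchange.example.org"  # would normally be a configurable trait

        def init_src(self):
            # where the generated assignment lives on the instructor's disk
            self.src_path = self.coursedir.format_path(
                self.coursedir.release_directory, ".", self.coursedir.assignment_id)

        def init_dest(self):
            # nothing to prepare locally -- the "destination" is the remote service
            self.dest_path = None

        def copy_files(self):
            # upload each file instead of copying it into a shared directory
            for root, _, files in os.walk(self.src_path):
                for name in files:
                    path = os.path.join(root, name)
                    with open(path, "rb") as f:
                        requests.post(
                            f"{self.service_url}/assignment/{self.coursedir.course_id}"
                            f"/{self.coursedir.assignment_id}",
                            files={"file": (os.path.relpath(path, self.src_path), f)},
                        ).raise_for_status()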

ExchangeError

Does nothing in the default exchange, but available for use

ExchangeCollect

Fetches [all] submissions for a specified assignment from the exchange and puts them in the [instructor's] home space.

The exchange is called thus::

    self.coursedir.assignment_id = assignment_id
    collect = ExchangeCollect(
        coursedir=self.coursedir,
        authenticator=self.authenticator,
        parent=self)
    try:
        collect.start()
    except ExchangeError:
        self.fail("nbgrader collect failed")

returns.... nothing

Expected behaviours

ExchangeFetch

(Deprecated, use ExchangeFetchAssignment)

ExchangeFetchAssignment

Gets the named assignment and puts the files in the user's home space.

The nbgrader server_extension calls it thus::

with self.get_assignment_dir_config() as config:
    try:
        config = self.load_config()
        config.CourseDirectory.course_id = course_id
        config.CourseDirectory.assignment_id = assignment_id

        coursedir = CourseDirectory(config=config)
        authenticator = Authenticator(config=config)
        fetch = ExchangeFetchAssignment(
            coursedir=coursedir,
            authenticator=authenticator,
            config=config)
        fetch.start()
    .....

Returns.... nothing

Expected behaviours

The expected destination for files is {self.assignment_dir}/{self.coursedir.assignment_id}; however, if self.path_includes_course is True, then the location should be {self.assignment_dir}/{self.coursedir.course_id}/{self.coursedir.assignment_id} (see the helper sketched below).
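
Expressed as a tiny helper (the function name is just for illustration):

    import os

    def fetch_destination(assignment_dir, course_id, assignment_id, path_includes_course):
        """Where ExchangeFetchAssignment is expected to place files, per the rule above."""
        if path_includes_course:
            return os.path.join(assignment_dir, course_id, assignment_id)
        return os.path.join(assignment_dir, assignment_id)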

self.coursedir.ignore is described as a::

List of file names or file globs.
Upon copying directories recursively, matching files and
directories will be ignored with a debug message.

This should be honoured.

In the default exchange, existing files are not replaced.

ExchangeFetchFeedback

This copies feedback from the exchange into the student's home space.

The nbgrader server_extension calls it thus::

with self.get_assignment_dir_config() as config:
    try:
        config = self.load_config()
        config.CourseDirectory.course_id = course_id
        config.CourseDirectory.assignment_id = assignment_id

        coursedir = CourseDirectory(config=config)
        authenticator = Authenticator(config=config)
        fetch = ExchangeFetchFeedback(
            coursedir=coursedir,
            authenticator=authenticator,
            config=config)
        fetch.start()
    .....

returns.... nothing

Expected behaviours

When writing your own Exchange

ExchangeList

This class is responsible for determining what assignments are available to the user.

It has three flags to define various modes of operation:

self.remove=True If this flag is set, the assignment files (as defined below) are removed from the exchange.

self.inbound=True or self.cached=True These both refer to submitted assignments. The assignment_list plugin sets config.ExchangeList.cached = True when it queries for submitted notebooks.

neither This lists released (and thus fetchable) assignments.

Note that CourseDirectory and Authenticator are defined when the server_extension assignment_list calls the lister::

with self.get_assignment_dir_config() as config:
    try:
        if course_id:
            config.CourseDirectory.course_id = course_id

        coursedir = CourseDirectory(config=config)
        authenticator = Authenticator(config=config)
        lister = ExchangeList(
            coursedir=coursedir,
            authenticator=authenticator,
            config=config)
        assignments = lister.start()
    ....

returns a List of Dicts - eg::

[
    {'course_id': 'course_2', 'assignment_id': 'car c2', 'status': 'released', 'path': '/tmp/exchange/course_2/outbound/car c2', 'notebooks': [{'notebook_id': 'Assignment', 'path': '/tmp/exchange/course_2/outbound/car c2/Assignment.ipynb'}]},
    {'course_id': 'course_2', 'assignment_id': 'tree c2', 'status': 'released', 'path': '/tmp/exchange/course_2/outbound/tree c2', 'notebooks': [{'notebook_id': 'Assignment', 'path': '/tmp/exchange/course_2/outbound/tree c2/Assignment.ipynb'}]}
]

The format and structure of this data are discussed in ExchangeList Data Return structure_ below.

Note

This gets called TWICE by the assignment_list server_extension - once for released assignments, and again for submitted assignments.

ExchangeRelease

(Deprecated, use ExchangeReleaseAssignment)

ExchangeReleaseAssignment

This should copy the assignment from the release location (normally {self.coursedir.release_directory}/{self.coursedir.assignment_id}) into the exchange service.

The class should check that the assignment exists (look in {self.coursedir.release_directory}/{self.coursedir.assignment_id}) before actually copying it.

The exchange is called thus::

release = ExchangeReleaseAssignment(
    coursedir=self.coursedir,
    authenticator=self.authenticator,
    parent=self)
try:
    release.start()
except ExchangeError:
    self.fail("nbgrader release_assignment failed")

returns.... nothing

ExchangeReleaseFeedback

This should copy all the feedback for the current assignment to the exchange.

Feedback is generated by the Instructor. From GenerateFeedbackApp::

Create HTML feedback for students after all the grading is finished.
This takes a single parameter, which is the assignment ID, and then (by
default) looks at the following directory structure:

    autograded/*/{assignment_id}/*.ipynb

from which it generates feedback in the corresponding directories
according to:

    feedback/{student_id}/{assignment_id}/{notebook_id}.html

The exchange is called thus::

release_feedback = ExchangeReleaseFeedback(
    coursedir=self.coursedir,
    authenticator=self.authenticator,
    parent=self)
try:
    release_feedback.start()
except ExchangeError:
    self.fail("nbgrader release_feedback failed")

returns..... nothing

ExchangeSubmit

This should copy the assignment from the user's work space, and make it available for instructors to collect.

The exchange is called thus::

with self.get_assignment_dir_config() as config:
    try:
        config = self.load_config()
        config.CourseDirectory.course_id = course_id
        config.CourseDirectory.assignment_id = assignment_id
        coursedir = CourseDirectory(config=config)
        authenticator = Authenticator(config=config)
        submit = ExchangeSubmit(
            coursedir=coursedir,
            authenticator=authenticator,
            config=config)
        submit.start()
    .....

The source for files to be submitted needs to match that in ExchangeFetchAssignment.

returns.... nothing

When writing your own Exchange

ExchangeList Data Return structure

As mentioned in the ExchangeList_ class documentation above, this data is returned as a List of Dicts.

The format of the Dicts varies depending on the type of assignments being listed.

Removed

Returns a list of assignments formatted as below (whether they are released or submitted), but with the status set to removed

Released & Submitted

  1. The first step is to loop through a list of assignments (let's call each one a path) and get some basic data.
  2. We then add status and path information:
    if self.inbound or self.cached:
        info['status'] = 'submitted'
        info['path'] = path  # ie, where it is in the exchange
    elif os.path.exists(assignment_dir):
        info['status'] = 'fetched'
        info['path'] = os.path.abspath(assignment_dir)  # ie, where it in on the students home space.
    else:
        info['status'] = 'released'
        info['path'] = path # again, where it is in the exchange

    if self.remove:
        info['status'] = 'removed'
        # Note, no path - it's been deleted.

(assignment_dir is the directory in the student's home space, so it needs to take self.path_includes_course into account)

  3. Next, loop through all the notebooks in the path and get some basic data::
       nb_info = {'notebook_id': /name, less extension/, 'path': /path_to_file/}
  4. If info['status'] != 'submitted', that's all the data we have::
        info['notebooks'].append(nb_info)
     else, add *feedback* details for *this* notebook::
        nb_info['has_local_feedback'] = _has_local_feedback()
        nb_info['has_exchange_feedback'] = _has_exchange_feedback()
        if nb_info['has_local_feedback']:
            nb_info['local_feedback_path'] = _local_feedback_path()
        if nb_info['has_local_feedback'] and nb_info['has_exchange_feedback']:
            nb_info['feedback_updated'] = _exchange_feedback_checksum() !=
                    _local_feedback_checksum()
        info['notebooks'].append(nb_info)
  5. Having looped through all notebooks:

    If info['status'] == 'submitted', add feedback notes to the top-level assignment record::

        info['has_local_feedback'] = _any_local_feedback()
        info['has_exchange_feedback'] = _any_exchange_feedback()
        info['feedback_updated'] = _any_feedback_updated()
        if info['has_local_feedback']:
            info['local_feedback_path'] = os.path.join(
                assignment_dir, 'feedback', info['timestamp'])
        else:
            info['local_feedback_path'] = None

rkevin-arch commented 4 years ago

Hey everyone, a quick update: our team has pretty much split into two parts. Some of our team members are working on an exchange based on https://github.com/jupyter/nbgrader/pull/1238 that talks to a JupyterHub service, and others (like me) are working on writing a JupyterHub service that implements the API described in https://github.com/jupyter/nbgrader/issues/659. We're using Tornado and SQLAlchemy for the API, since those are already used by the Jupyter community in many places. We have some very basic functionality finished, but we are having trouble testing it in a Kubernetes cluster.

The problem we're having is about running a JupyterHub service with a public URL in k8s. I've been digging through the Helm chart to see how the culler is implemented as a service, and I tried to replicate that, but I'm not sure what to specify as the URL. We tried the standard http://127.0.0.1:10101, which works with regular JupyterHub, but on Z2JH it gives a 503 error. Looking at the logs, I think the issue is that the proxy and the hub are two different pods: running the JupyterHub service runs it in the hub, while the proxy still thinks the right IP to connect to is 127.0.0.1, which is not the hub. I also tried using something like 'http://%s:10101' % c.JupyterHub.hub_connect_ip, but Tornado refuses to listen on that IP (Cannot assign requested address). I can change it to listen on 0.0.0.0, but requests to it just hang because it can't access JupyterHub's API (I confirmed this since the @authenticated decorator is what's making it hang, and printing out the JUPYTERHUB_API_URL environment variable gives me something like http://10.96.192.15:8081/hub/api, and curling that URL gives me a timeout inside the hub container). In the meantime, visiting the API from the website through the proxy just gives me 404s that I can't trace back to the original issue. If anyone has experience running JupyterHub services with public URLs in Z2JH, or has any idea how to make this work, input is greatly appreciated.

On an unrelated note, does anyone have suggestions on how to deal with persistent storage? For now, we're just putting everything in the working directory; in the z2jh case, that's the same place jupyterhub.sqlite goes (the hub's persistent volume). Would this be a good place to put it? Is it better practice to request another piece of storage for the nbgrader data so it doesn't mix with the hub's data? Also, is it even a good idea to run the JupyterHub service inside the hub pod, or should we start up another pod instead? (In that case it won't be a hub-managed service; it will run standalone from the hub.) Thanks!

lzach commented 4 years ago

Hi Kevin, I'm the one who wrote jupyter/nbgrader#1238. We're still testing the new interface and I might push some minor changes as we find issues on our side, so you might want to check the pull request for changes every now and then. If you run into any problems, feel free to send me a message.

manics commented 4 years ago

@rkevin-arch Sorry if I've missed it in this thread. Do you have your z2jh config and custom images (is it a Z2JH fork?) in a public repo that we can look at? We can probably make some general suggestions, but if we can see it we can hopefully give you more targeted advice.

rkevin-arch commented 4 years ago

Sorry, it's not in a public repo yet since we don't have the main functionality finished yet. For the Z2JH service setup, I can't even get the simple whoami service (https://github.com/jupyterhub/jupyterhub/tree/master/examples/service-whoami, which is tested to work on regular jupyterhub) to work on Z2JH. Here's my config:

# https://zero-to-jupyterhub.readthedocs.io/en/latest/reference/reference.html
proxy:
  secretToken: "<REDACTED>"
  https:
    enabled: false
singleuser:
  memory:
    limit: 512M
    guarantee: 128M
auth:
  admin:
    users:
      - rkevin
hub:
  image:
    name: k8s-hub-testing
    tag: '0.0.1'
  #services:
  #  whoami:
  #    url: 'http://127.0.0.1:10101'
  #    command: python3 /etc/jupyterhub/whoami.py
  extraConfig:
    test.py: |
      c.JupyterHub.services.append({
        'name': 'whoami',
        #'url': 'http://127.0.0.1:10101',
        'url': 'http://%s:10101'%c.JupyterHub.hub_connect_ip,
        'command': ['python3', '/etc/jupyterhub/whoami.py']
      })

Here's a Dockerfile that will get built into k8s-hub-testing:0.0.1:

FROM jupyterhub/k8s-hub:0.9.0-beta.3.n027.hb7da682
COPY whoami.py /etc/jupyterhub/

The whoami.py has not been changed. Our testing script:

eval $(minikube docker-env)
cd hub-testing-image
docker build -t k8s-hub-testing:0.0.1 .
cd ..
eval $(minikube docker-env -u)
echo "This will take a while with no output, please be patient..."
helm install jhub jupyterhub/jupyterhub -f config.yaml --version 0.9.0-beta.3
minikube service list

moorepants commented 4 years ago

@rkevin-arch I just wanted to comment on your statement "Sorry, it's not in a public repo yet since we don't have the main functionality finished yet." I would like you all to work in public repos starting now, so that you can get help effectively. It's best to get into the same practices and workflows as nbgrader, jupyterhub, etc.

consideRatio commented 4 years ago

To iterate faster, you can actually do this without building your docker image again after changing the whoami.py file, by mounting it in k8s.

I'm on mobile; I aim to give an example of this later.

rkevin-arch commented 4 years ago

Re: consideRatio, is there a way to restart the service managed by the hub without doing a helm upgrade or helm uninstall / install? If not, mounting the file inside k8s won't save much time, since the docker build finishes within a second (it's just copying one directory). If so, how can we do that?

Re: moorepants, our repo is a big mess, with lots of testing setups and throwaway scripts littered around. We'll either clean up a bit and make it public, or start a new public repo and start developing there.

moorepants commented 4 years ago

It is ok if it is a mess. The only thing you don't want public in this case are any private credentials. I recommend committing what you have (sans credentials) and working openly and collaboratively on github.

consideRatio commented 4 years ago

@rkevin-arch ah, I understand - then let's keep it this way; that option is more complicated!

I understand - once you update the Docker image, you may want to update JupyterHub! You could do this:

  1. Values to be passed to helm when running helm upgrade, for example from a file that we often refer to as config.yaml.

    hub:
      image:
        # always re-pull the image even if it has the same
        # tag as already exists since before on the node
        pullPolicy: Always
  2. A terminal command to delete the jupyterhub pod running the container. Since this pod was created by a k8s ReplicaSet, which in turn was created by a Deployment, it will be automatically recreated if it is deleted.

    # ps: the --selector has a short form of -l, and component=hub is a pod label and label value
    kubectl delete pod --namespace my-jhub-namespace --selector component=hub

lxylxy123456 commented 4 years ago

Hi, we have open-sourced the service side of our project in the ngshare repo. Please feel free to take a look and give us some feedback.

The database specification is almost complete, using SQLAlchemy.

There are two versions of the server: vserver, a simple server using Flask, and ngshare, which will be a JupyterHub service. Currently most APIs are still to be implemented.

lxylxy123456 commented 4 years ago

We have basically completed the development of the API logic and a standalone backend server (vserver). Would you like to try it by following https://github.com/lxylxy123456/ngshare#installation-and-setup and provide any comments and suggestions?

We also wrote the plan for the entire project structure in the same markdown file.

rkevin-arch commented 4 years ago

Hi,

I have been trying to make a JupyterHub managed service exposed to the proxy pod in a Z2JH setup. There are some very weird pod has unbound immediate PersistentVolumeClaims errors. I believe the hub pod keeps crashing (that's the only explanation for the occasional weird 404 errors, "Service Unavailable" messages, and the "MinimumReplicasAvailable" transition time always being less than a minute on the minikube dashboard). The service does work when the hub pod happens to be online, and the persistent storage also seems to work when I examine things using kubectl exec -it hub-* bash.

I have edited the Helm chart as follows: https://github.com/rkevin-arch/zero-to-jupyterhub-k8s/commit/e34a61038867fa8cf67096ffd13721b20bd32cd0 and our current testing setup is here: https://github.com/lxylxy123456/ngshare/tree/master/testing/minikube. Does anyone know what could have caused this issue? kubectl logs hub-* only shows regular web traffic, except for the occasional Cannot connect to managed service ngshare at http://10.97.94.52:10101. kubectl describe pod hub* gives me:

Events:
  Type     Reason            Age                 From               Message
  ----     ------            ----                ----               -------
  Warning  FailedScheduling  21m                 default-scheduler  error while running "VolumeBinding" filter plugin for pod "hub-747587d5b6-ln86t": pod has unbound immediate PersistentVolumeClaims
  Normal   Scheduled         21m                 default-scheduler  Successfully assigned default/hub-747587d5b6-ln86t to minikube
  Warning  FailedScheduling  21m                 default-scheduler  AssumePod failed: pod b7aee470-6a45-470b-bee9-90763716abba is in the cache, so can't be assumed
  Warning  Unhealthy         21m                 kubelet, minikube  Readiness probe failed: Get http://172.17.0.4:8081/hub/health: dial tcp 172.17.0.4:8081: connect: connection refused
  Warning  BackOff           21m                 kubelet, minikube  Back-off restarting failed container
  Normal   Pulled            21m (x3 over 21m)   kubelet, minikube  Container image "hub-testing:0.0.1" already present on machine
  Normal   Created           21m (x3 over 21m)   kubelet, minikube  Created container hub
  Normal   Started           21m (x3 over 21m)   kubelet, minikube  Started container hub
  Warning  Unhealthy         93s (x56 over 17m)  kubelet, minikube  Readiness probe failed: Get http://172.17.0.4:8081/hub/health: net/http: request canceled (Client.Timeout exceeded while awaiting headers)

Surprisingly, kubectl get pvc thinks everything's normal, even during the brief period when the hub is down:

NAME              STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
claim-pip         Bound    pvc-9cb6827f-b042-44e4-938f-c810ded02b9b   10Gi       RWO            standard       23m
hub-db-dir        Bound    pvc-34d155f8-75c8-4aa4-8216-b18627a4d9b1   1Gi        RWO            standard       23m
hub-ngshare-dir   Bound    pvc-4901ef29-1b07-4211-abad-5144e1ec86e7   1Gi        RWO            standard       23m

I'm not sure what's going on. Any ideas?

consideRatio commented 4 years ago

"Service Unavailable" happens when accessing the k8s Service representing the hub pod (there is only one, though). If the hub pod isn't considered ready, the k8s Service will return "Service Unavailable".

Hmmm, running on minikube, what is your config.yaml passed to helm upgrade?

consideRatio commented 4 years ago

If some logic made the hub pod unresponsive for a while, it would fail to respond to /hub/health, and would show Service unavailable btw.

Can a JH service running on the hub pod that gets stuck in a loop (or awaits a network call and times out after 1 min) cause the hub to become unresponsive?

rkevin-arch commented 4 years ago

Not sure. We have modified the Helm chart, and that's probably causing the issue, but I'm very new to Helm so I'm not sure what changes could have caused the problem. The config.yaml is here, although we have changed the Helm chart to add our custom stuff here.

betatim commented 4 years ago

Is the log message Cannot connect to managed service ngshare at http://10.97.94.52:10101 from the hub process?

I would investigate in the direction that Erik suggested: is your hub running properly or is it getting stuck somewhere. For example what does JupyterHub do when it can't start a managed service, does a managed service run as a separate process, things like that. You are looking for a reason that the hub stops responding "I am happy" on the health endpoint (which is why kubernetes ends up telling you the service is unavailable).

ps. this thread is getting very long and is starting to cover various topics. What do people think of moving the discussion to the forum and starting separate threads for each topic? For example, debugging why adding a service makes the hub not-ready is a general thing that others in the forum might have experience with or be interested in reading about.

consideRatio commented 4 years ago

I started watching https://github.com/lxylxy123456/ngshare

manics commented 4 years ago

I think splitting into topics is a good idea. There are several other issues that could come out of this, for instance whether something can be done in the Z2JH chart to make it easier to integrate applications such as this without modifying the chart.

manics commented 4 years ago

Can a JH service running on the hub pod that gets stuck in a loop (or awaits a network call and times out after 1 min) cause the hub to become unresponsive?

Yes it can! See:

rkevin-arch commented 4 years ago

Re: betatim, that log message is coming from the proxy. Inside a regular singleuser image we can still talk to the hub and the service, which is why I thought the hub is still working and the proxy is just not seeing it for some reason.

Re: manics, I'm actually surprised that's the case, since I thought JH starts the service as a completely separate process and will also automatically restart it if it dies. I'll definitely look more into that.

rkevin-arch commented 4 years ago

I think I have finally figured out the issue after pulling my hair out, and it's kind of complicated. The problem is that starting the service as a JupyterHub managed service means JupyterHub itself will occasionally poll the service to make sure it's up (https://github.com/jupyterhub/jupyterhub/blob/0427f8090fed143d7bf2b75cf6c35a3acea19557/jupyterhub/app.py#L1932). However, requests to the service are routed through the proxy.

If I make the URL for the managed service http://127.0.0.1:10101, then the hub will see the service, but the proxy will proxy the request to localhost, which is the proxy, not the hub (they're two different pods). Therefore, requests to it will fail.

If I make the URL for the managed service 'http://%s:10101'%os.environ['HUB_SERVICE_HOST'], then the proxy can route requests correctly, and the service will work for a little bit. However, for some reason, the hub cannot access its own IP address inside the pod itself. Connecting to http://10.111.133.52:10101 in the hub will result in a timeout despite the pod's address being 10.111.133.52, and all singleuser pods and the proxy pod can see it perfectly. The hub thinks the service is dead because it's not responding, and despite using tornado, it is actually hanging, just like https://github.com/jupyterhub/jupyterhub/issues/2928#issuecomment-591987199 (thanks for the link BTW! Would've never figured it out without it).

This log snippet demonstrates this is actually the case:

[I 2020-03-01 08:52:57.141 JupyterHub log:174] 200 GET /hub/health (@172.17.0.1) 1.43ms
[I 2020-03-01 08:53:07.144 JupyterHub log:174] 200 GET /hub/health (@172.17.0.1) 1.82ms
[W 2020-03-01 08:53:38.732 JupyterHub app:1903] Cannot connect to managed service ngshare at http://10.108.93.19:10101
[I 2020-03-01 08:53:38.736 JupyterHub log:174] 200 GET /hub/health (@172.17.0.1) 1.48ms
[I 2020-03-01 08:53:38.737 JupyterHub log:174] 200 GET /hub/health (@172.17.0.1) 2.51ms
[I 2020-03-01 08:53:38.738 JupyterHub log:174] 200 GET /hub/health (@172.17.0.1) 2.75ms
[I 2020-03-01 08:53:38.742 JupyterHub proxy:320] Checking routes
[I 2020-03-01 08:53:52.203 JupyterHub log:174] 200 GET /hub/health (@172.17.0.1) 0.47ms

Between 08:53:08 and 08:53:38, it's likely hanging at the check, and not responding to requests to the health endpoint. I'll play around with either removing that check or finding a workaround and ask it to check localhost rather than the URL intended for the proxy.

Thanks so much for the links and replies!

ryanlovett commented 4 years ago

@rkevin-arch Could you create a Kubernetes Service instead of the URLs you've crafted?

perllaghu commented 4 years ago

Here's where we are.... I still need to tidy a few things (like make the tests pass again) & update docs a bit more.... but we're getting there https://github.com/edina/nbexchange

rkevin-arch commented 4 years ago

Hi all, hope you're safe during this pandemic. Sorry for the silence for quite a while, we're busy with finals, but rest assured we're making some decent progress.

Re: ryanlovett, the main reason I didn't originally do this is that JupyterHub only supports spawning managed services as subprocesses, not as Kubernetes pods. If you use an external service, then you have to manage API tokens, URLs, and that kind of thing.

However, that's what I've been working on in the past few weeks. I've modified JupyterHub to spawn managed services as k8s pods. You can see the repo here, and here's a screenshot of it working (you can see the jupyter-service-ngshare pod spawned by the hub on startup, and the service being accessible through the proxy).

[screenshot: the jupyter-service-ngshare pod running, with the service reachable through the proxy]

If anyone can look over the code I have and offer some suggestions, that'd be great. If you all think this is a good thing to have in upstream JupyterHub, I can open a separate issue and work on it there. This is just a proof of concept, with some things hardcoded (like the path of the PVC), but I'll implement it properly and make a pull request if you'd like.

rkevin-arch commented 4 years ago

Hi all,

It's been a while since we last heard from all of you. How are things going? Hopefully you all are safe in your homes.

Has anyone taken a look at https://github.com/rkevin-arch/kubespawner_service_jupyterhub yet? It's a modified version of JupyterHub to allow spawning managed services as k8s pods and not just subprocesses. I'd like to get some feedback on this, and potentially create a separate pull request to JupyterHub itself to allow pluggable services. I'm not sure what the best way to implement it is (currently I added the ability for the inbuilt Service class to use kubespawner, but should I refactor it so it can allow any pluggable spawner? What about configuring PVCs for k8s? etc), so if someone can comment on it that'd be great. Thanks!

Our nbgrader + ngshare setup is pretty much 100% working. We can create assignments, release them, download them as another user and submit them, the instructor can collect them and release feedback, and the student can download the feedback. All of this should work with multiple courses, multiple instructors and multiple students. We're just in the process of writing more test cases and squashing bugs.

manics commented 4 years ago

@rkevin-arch Sorry, I missed your previous message asking for feedback. Did you consider running nbgrader as an external Jupyter service? For example create an independent Helm chart for nbgrader, with configuration to connect to the JupyterHub API with a secret token? This is similar to how BinderHub integrates with JupyterHub, and reduces the coupling between the projects which in the longer term should make maintenance easier. I think it might also make it easier to use with other deployments such as The Littlest JupyterHub.

If you've already considered this then apologies, however I'd be interested to see your conclusions as there may be things we could improve in Jupyterhub or Z2JH.
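
For reference, registering an already-deployed nbgrader/ngshare service as an external JupyterHub service is just configuration on the hub side. A sketch, where the in-cluster URL and the environment variable holding the shared token are assumptions:

    # Sketch: register ngshare as an *external* JupyterHub service. The cluster-internal
    # URL and NGSHARE_API_TOKEN are placeholders; the same token has to be handed to the
    # ngshare deployment as well (e.g. via a Kubernetes Secret).
    import os

    c.JupyterHub.services = [
        {
            "name": "ngshare",
            "url": "http://ngshare.jhub.svc.cluster.local:10101",
            "api_token": os.environ["NGSHARE_API_TOKEN"],
        }
    ]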

rkevin-arch commented 4 years ago

We've thought about it, and the main downside would be the extra configuration and having to keep track of yet another API token. The main reason we went with this method is that I don't think we're the first people to want JupyterHub managed services to run on k8s, and we probably won't be the last. The https://github.com/rkevin-arch/kubespawner_service_jupyterhub repo just allows spawning of JupyterHub managed services as pods and isn't aimed specifically at ngshare/nbgrader.

That said, your point about reducing coupling is a valid concern, since if I make a pull request and get it merged into JupyterHub, it will add a kubespawner dependency for those who may not need it. Maybe the best way is to allow service spawning to use user-defined classes, just like notebook spawners and authenticators, but I'm not sure if that's a good idea.

consideRatio commented 4 years ago

Our nbgrader + ngshare setup is pretty much 100% working.

Nice work! :tada:

@rkevin-arch given my current understanding, I'm currently leaning towards thinking that the use case of kubespawner_service_jupyterhub is too specific to be sustainable to maintain.

Perhaps less is more in this case, I'm not sure. But I'd suggest considering the option of only maintaining a Dockerfile exposing ngshare, plus docs on how to register it as an external service in JupyterHub, while leaving the deployment itself out of scope. The next level of ambition could be to also maintain deployment instructions, which could for example be an ngshare-statefulset.yaml if it needs storage and is stateful, or an ngshare-deployment.yaml + ngshare-service.yaml if ngshare is a service that doesn't need persistent storage.

Was ngshare possible to run as a local JupyterHub service also?

rkevin-arch commented 4 years ago

If ngshare is to run as a managed service, I can run ngshare inside the hub pod as a subprocess, but I'd have to modify the helm chart to add a separate PVC to that pod, along with exposing more ports on the hub pod to accommodate that. I feel like that's a more intrusive method than modifying JupyterHub to support spawning managed services as pods.

You have a point on just making it a k8s deployment rather than a simple pod. In that case, ngshare will just not be a managed service, and be deployed as a service manually. I can work on that if you feel like that's a better way to move forward.

consideRatio commented 4 years ago

@rkevin-arch I'll think out loud by writing down the options I'm evaluating, to make sure what I think makes some sense.

  1. Fork the z2jh Helm chart to customize anything.
  2. Configure the z2jh Helm chart, but have it create a custom PVC.
  3. Develop and maintain kubeservicespawner, rebuild the hub image, configure the z2jh Helm chart to use it instead of the default image, and develop a Dockerfile for the ngshare pod to be created.
  4. Develop an ngshare Dockerfile and potentially also maintain an ngshare-statefulset.yaml file to kubectl apply -f ngshare-statefulset.yaml -n jupyterhub, configure the z2jh Helm chart with an external service, and kubectl patch the statefulset with the secret or similar.

    If you want to streamline this further, you could create a Helm chart that others can install alongside z2jh, or create an opinionated meta Helm chart that has a requirements.yaml file and a single additional resource, the ngshare-statefulset.yaml. I think that would be overkill though.

Yeah, I think going for something like option 4 would make the project most sustainable. I think it would make it easier to onboard others to the project, maintain its current functionality, and develop it further.

rkevin-arch commented 4 years ago

Cool. I'll try to get that working during the weekend. Thanks for the feedback!