Closed willingc closed 4 years ago
@minrk @GeorgianaElena @betatim This generous offer from @moorepants would be great to take Jason up on. Do you think that this is something that you could help get them started on? It would be great for those that use nbgrader. cc/ @jhamrick
Hi @willingc , we (@perllaghu, @lzach and myself) are running a similar setup with nbgrader at The University of Edinburgh. Most of our work has been on making the "Exchange" functionality of nbgrader pluggable, so that assignments can be sent over the network instead of via a local filesystem. Happy to talk about our approach and ideas on how we could take this forward.
I'd be happy to review ideas on implementation strategies in conjunction with a cloud based setup!
Thinking about it now, an idea pops up inspired by seeing nice results of @GeorgianaElena and others work on jupyterhub/traefik-proxy in conjunction with JupyterHubs nice pluggable architecture. JupyterHub works with different authenticators, spawners, and proxies - all pluggable.
Would it perhaps make sense to make nbgrader pluggable wherever it would normally work directly against a filesystem? Then a REST API, Object storage, etc could potentially be used with close integration with relatively little risk of breaking things unless something quite extensive changes.
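To make the idea concrete, here is a minimal sketch (all names here are hypothetical, not nbgrader's actual API) of what such a pluggable exchange interface could look like: the calling code only ever talks to the abstract interface, so a filesystem, REST, or object-storage backend can be swapped in without touching it.

```python
from abc import ABC, abstractmethod

class ExchangeBackend(ABC):
    """Hypothetical pluggable exchange interface: each backend
    (filesystem, REST API, object storage, ...) implements the
    same three steps."""

    @abstractmethod
    def init_src(self):
        """Work out where files come from."""

    @abstractmethod
    def init_dest(self):
        """Work out where files go."""

    @abstractmethod
    def copy_files(self):
        """Move the files between the two locations."""

    def start(self):
        # Driver code never touches a filesystem directly, so
        # swapping backends cannot break the calling code.
        self.init_src()
        self.init_dest()
        self.copy_files()

class InMemoryExchange(ExchangeBackend):
    """Toy backend used here only to show the plug-in shape."""
    def __init__(self):
        self.store = {}

    def init_src(self):
        self.src = {"assignment1.ipynb": b"{}"}

    def init_dest(self):
        self.dest = self.store

    def copy_files(self):
        self.dest.update(self.src)
```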
Ah anyhow, I'm curious to explore the solution space!
@consideRatio
There's the API we had back in the summer... before nbgrader added feedback into the loop: https://github.com/jupyter/nbgrader/issues/1130#issuecomment-498600471
There's an MR awaiting approval for the pluggable exchange: https://github.com/jupyter/nbgrader/pull/1238 (it's reporting it needs a couple of tweaks)
There's an action with us to move our private code into a public space - however we want to wait until the requirements for the pluggable module base have been finalised before we do that (and we need to unravel some of the context-specific code we have in there)
.... but yes - making the existing file-system exchange work as part of the ecosystem would also be very sensible (and modifying the exchange tests to test the exchange as an API)
Thanks @willingc for making this issue. The students will jump in on the conversation soon. I was traveling this week and will get them looped in now that I'm back.
Hi everyone, sorry for the late response! I'm part of the team of 4 ( @rkevin-arch, @lxylxy123456, @aalmanza1998, @Lawrence37) working on integrating nbgrader into JupyterHub for our senior design project. We're still playing around with the code, getting a JupyterHub+k8s setup in a Vagrant environment, installing nbgrader, etc. Just from what we understand, we have two possible solutions:
Please let us know how feasible these ideas are, and we will start implementing as soon as we get a better idea of the setup and how to proceed. Thanks!
It would be great for people to pull together the scattered knowledge/expertise/snippets around this and contribute to the documentation of Z2JH as well as code to nbgrader. I don't know if there is someone who has an overview of how it could all fit together, what the issues are, who is working on what.
I don't know much about nbgrader deployments (on a kubernetes based hub). The point about the exchange mechanism is the one I have in my head as "first thing you'd have to tackle". My (not terribly informed) knowledge about the current exchange mechanism is that you need a (globally?) writeable directory. Getting this is tricky on a kubernetes deployment and seems like something you'd want to avoid if your course is bigger than ~30 students (aka you don't know all of them personally) as it sounds like a security nightmare :-/ This means it is great to see people working on creating a different exchange mechanism.
From my perspective using something like nbgitpuller as the distribution mechanism is ideal. Or some other "copy stuff over" script installed as a `postStart` hook in the user pods. I don't know how this fits with the plans for a new exchange mechanism.
For autograding there are a few home made solutions out there that let students submit their notebook and get a grade back. The key challenges/features here are not executing the notebook with the privileges of the teacher but some "anonymous user" (you never know who accidentally puts `!rm -rf /` in their notebook) and not allowing students to exfiltrate the solutions while allowing them to run arbitrary code.
Manual grading is another task with a question mark.
Notebook authoring is probably best done by using the current nbgrader UI.
TL;DR: there are lots of areas that could need work. Especially depending on what security concerns you have. A minimal version would be to use as much of nbgrader-as-is and solve the exchange mechanism problem.
A workaround to getting a shared writeable directory is providing a PVC that can be mounted as "ReadManyWriteMany" to all of the users pods. This is easy to do on Google Cloud by using their Filestore product, I've set this up for clients and it generally seems to "just work". An alternative is to use a NFS provisioner and run your own NFS, I've never tried this but I think @dirkcgrunwald at UC Boulder has (at least he was asking questions about firewall rules and such related to this). @yuvipanda might have also attempted/done this.
I recommend `minikube` for you to work on a z2jh setup. `kind` has not been as easy to work with as I hoped. The documentation is still focused on `kind`, though.
@rkevin-arch you wrote in point 2 about requesting storage from k8s etc. Various kinds of storage can be mounted for any pod, on an individual basis. But what are the requirements of the storage? A key challenge is that storage can often not be mounted simultaneously for many users, and when it can, it may have to be read-only for the users. The biggest challenge is mounting storage that many users can all read and write to.
See this section for more information, this is essential understanding if choosing to use nbgrader as it is: https://kubernetes.io/docs/concepts/storage/persistent-volumes/#access-modes
Also, if opting for use of NFS, this issue contains a lot of relevant past discussion: https://github.com/jupyterhub/zero-to-jupyterhub-k8s/issues/421
Another challenge working with storage attached to drives etc, is that by default on a z2jh deployment, all users will have the user name of jovyan and user ID of 1000. So, if the same storage is mounted for two different users, they would read/write to it as the same user on a filesystem level.
I would not recommend relying on filesystem user IDs to determine access etc; I think it would end up being too complicated to develop and maintain together with z2jh.
Attaching various kinds of storage based on information about the user is very plausible. Here is a sequence of events that collects information about the user and makes adjustments based on it.
A lot of things can be extended, for example on the classical jupyter server, the classical jupyter notebook UI, the jupyterlab UI.
Map out how nbgrader's users interacts with a filesystem.
What permissions should various users have to various storage areas? If you map out all interactions and permissions, you can see more clearly what needs to be made available to each user, and then work out the structure of the content and the permissions within that structure. This is relevant no matter whether you end up using a mounted filesystem, a REST API, interactions with object storage, etc.
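As a sketch of this mapping exercise, one could start from a simple role-to-area permission matrix. The role and area names below are illustrative, not nbgrader's:

```python
# Hypothetical permission matrix mapping roles to the storage areas
# they need. "r" = read, "w" = write; names are illustrative only.
PERMISSIONS = {
    "instructor": {"source": "rw", "release": "rw",
                   "submitted": "rw", "feedback": "rw"},
    "student":    {"release": "r", "submitted": "w", "feedback": "r"},
}

def can(role, area, mode):
    """Return True if `role` may access `area` with `mode` ('r' or 'w')."""
    return mode in PERMISSIONS.get(role, {}).get(area, "")
```

A matrix like this works the same whether the "areas" end up being mounted directories, REST endpoints, or object-storage prefixes.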
@consideRatio @betatim @willingc we got this to work at IllumiDesk, although it doesn't use the default exchange mechanism to interchange ipynb's. We would love to share our code to demonstrate how we got this to work with JupyterHub (we are hosted on AWS so it's somewhat opinionated).
Before we share anything we would just like to confirm where to share this information to avoid unnecessary chatter.
We are definitely standing on the shoulders of giants ;-)
I would be happy to document what we did for an NFS setup on GKE. There were some non-obvious design choices I wish I had known about.
We are using NFS to share collected nbgrader files but not to collect the students work. We don't have a coherent way of doing that other than our LMS or Github Classroom.
We use another tool for grading non-notebook assignments ( https://inginious.readthedocs.io/en/latest/ ) which works well but doesn't have a command-line method to submit that then also updates grades in the LMS.
Before we share anything we would just like to confirm where to share this information to avoid unnecessary chatter.
I think a good place to share is the forum as a dedicated post. It provides pretty good editing (links, images, tables, bold/italic, sections, etc), receives a fair bit of traffic and doesn't require review before it is public. Then I'd link to those posts from https://zero-to-jupyterhub.readthedocs.io/en/latest/community/index.html to make them easy to discover from the Z2JH guide.
There has been some hesitation with adding new material directly to the Z2JH guide. Partly to not increase its size too much, partly because sections require constant maintenance which requires expertise in that area and partly because doing so involves reviewing work which tends to be a bottleneck. So to get started and iterate towards something that multiple parties agree is "a good way of doing this" I'd create a forum post (you can even make it "wiki editable" so others can directly edit the first post in a topic).
Hi all,
tl;dr: We've just been busy trying to learn k8s. In terms of solutions, we're probably going to go with the first option (hubshare-like API for notebook distribution/collection). We have a working testing setup where we have a private fork of nbgrader and we can test it in z2jh, but it's kind of glitchy.
Reply to betatim: I agree that the notebook distribution / collection is the main issue here. I did a bunch more research and it looks like the plan 2 I had (use PVCs to mount part of the student volumes) might not be the way to go. More details in the reply below.
Also, our team will try getting nbgrader to function first before worrying about running student code in containers / pods to prevent malicious student code from destroying the instructor's home folder or stealing other students' solutions. I've given it some thought, but the easy way out (setuid to another user, or chroot) requires the instructor to be root, which is a pretty stupid idea. We'll look into this after nbgrader functions at all.
Reply to consideRatio: Thanks for recommending `minikube`! This is so much better than our original Vagrant setup. Also thanks for the link! I originally thought about using volumes that have one writer and many readers if synchronization is an issue: the instructor creates a volume to distribute the notebook and mounts it read-only on students' containers, and each student creates a submission volume that the instructor mounts read-only. But looking at the link you sent, it seems not all volume types support access modes other than ReadWriteOnce, so it'd probably be better to develop a setup that works for everyone.
I'll look into the hooks. I still don't fully understand the entire system other than a high level overview, but if we do implement a hubshare-like API, then it's probably a good idea to use hooks upon login to query whether a new assignment is available.
Also yes, we are reading the nbgrader source code to try and determine what filesystem accesses are occurring, and maybe abstract them into one interface. I'll post updates here along the way.
Updates on our testing setup: we're currently making a testing setup that installs nbgrader inside the z2jh environment. We're making a custom Docker image that installs nbgrader from our testing repo, and specifying it under `singleuser.image.name` in `config.yaml`. It works, but with many weird issues. The `helm install` command occasionally hangs, then gives a cryptic error saying `Error: transport is closing`. We've moved over to Helm 3, which still occasionally has this problem. The pod that's pulling down the containers occasionally seems stuck on downloading `jupyterhub/k8s-singleuser-sample:0.8.2`, where pulling down layer 484c6d5fc38a just hangs forever. It only happens occasionally, and our best bet is to `minikube delete` and try again.
In addition, we are occasionally getting weird 500 errors like this one if we make updates and do a `helm upgrade` without fully tearing down the minikube setup and starting fresh:
Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/tornado/gen.py", line 589, in error_callback
future.result()
File "/usr/local/lib/python3.6/dist-packages/jupyterhub/handlers/base.py", line 636, in finish_user_spawn
await spawn_future
File "/usr/local/lib/python3.6/dist-packages/jupyterhub/user.py", line 489, in spawn
raise e
File "/usr/local/lib/python3.6/dist-packages/jupyterhub/user.py", line 409, in spawn
url = await gen.with_timeout(timedelta(seconds=spawner.start_timeout), f)
File "/usr/local/lib/python3.6/dist-packages/kubespawner/spawner.py", line 1636, in _start
events = self.events
File "/usr/local/lib/python3.6/dist-packages/kubespawner/spawner.py", line 1491, in events
for event in self.event_reflector.events:
File "/usr/local/lib/python3.6/dist-packages/kubespawner/spawner.py", line 72, in events
key=lambda x: x.last_timestamp,
TypeError: '<' not supported between instances of 'datetime.datetime' and 'NoneType'
It's probably worth submitting an actual issue / bug report, but I'll do that once I understand why this is happening and can reproduce it reliably. We're still looking into the issue. For now, we're mostly just reading the nbgrader codebase and trying to understand what goes where.
Thanks for all of your patience!
@rkevin-arch I suggest that you build on 0.9.0-beta.3 instead of 0.8.2; it is a lot more mature and reliable to use with the latest JupyterHub 1.1.0 etc. I don't think 0.8.2 is fully ready for Helm 3, but I'm quite confident 0.9.0-beta.3 should be fine with it. Also, 0.9.0-beta.3 is tested on newer k8s versions that you may be using, while 0.8.2 isn't.
Regarding the issue you have printed, I can confidently say it is due to a modern version of k8s that kubespawner is incompatible with, and that it is fixed in 0.9.0-beta.3 where we use a more modern version of kubespawner! No need to report that.
I don't know about the transport is closing issue.
Regarding learning k8s etc, I've spent a lot of time learning myself. These are videos I recommend my colleagues to watch:
About K8s (increasing complexity):
About helm:
I also suggest the use of the official k8s documentation, in the Concepts section, it is excellent! https://kubernetes.io/docs/concepts/
And, if you run into an issue, this flowchart can be really useful: https://learnk8s.io/troubleshooting-deployments
Awesome, thanks! We'll use 0.9.0-beta.3 instead.
For the transport is closing issue, I believe helm is just timing out because the pod that should pull down the docker image hangs forever because of a weird download issue. I tried pulling that image (`jupyterhub/k8s-singleuser-sample:0.8.2`) on my host system and it hung weirdly as well, with the 484c6d5fc38a1 layer downloading forever (it downloads around 3MB and hangs). For some reason I can't reproduce it now. Oh well.
Also, thanks for all the youtube links! I'll definitely be watching them this weekend.
We have decided to go down the hubshare route. We're looking through the nbgrader source code to locate filesystem interactions. The plan is to abstract them and implement classes for the existing exchange mechanism and for a hubshare (or similar) service.
I see part of this task has been worked on in jupyter/nbgrader#1238. Thanks @perllaghu for the link! I would very much like to build on the pluggable exchange and hubshare because it's likely more sustainable and universal than creating our own solution.
We are in the early stages of development, so there's room for flexibility. If anyone has comments, tips, or concerns regarding our approach, please share them with us!
One tip: open pull requests for atomic changes (small, self contained). Open these often and early (even in the "work in progress" stage), so that you can get feedback and the CI tests will run.
@Lawrence37 - here's some documentation I put together to show how the `assignment_list` and `formgrader` components call the exchange (the formatting is shonky, sorry):
Assignments are `created`, `generated`, `released`, `fetched`, `submitted`, `collected`, and `graded`. Then `feedback` can be `generated`, `released`, and `fetched`.
The exchange is responsible for receiving released assignments, allowing those assignments to be fetched, accepting submissions, and allowing those submissions to be collected. It also allows feedback to be transferred.
In doing this, the exchange is the authoritative place to get a list of what's what.
`CourseDirectory` defines the following directories (and their defaults):

- `source_directory` - Where new assignments that are created by instructors are put (`source`)
- `release_directory` - Where assignments that have been processed for release are copied to (`release`)
- `submitted_directory` - Where student submissions are copied to, when an instructor collects (`submitted`)
- `autograded_directory` - Where student submissions are copied to, having been autograded (`autograded`)
- `feedback_directory` - Where feedback is copied to, when instructors generate feedback (`feedback`)

Also, taken from the nbgrader help::
The nbgrader application is a system for assigning and grading notebooks.
Each subcommand of this program corresponds to a different step in the
grading process. In order to facilitate the grading pipeline, nbgrader
places some constraints on how the assignments must be structured. By
default, the directory structure for the assignments must look like this:
{nbgrader_step}/{student_id}/{assignment_id}/{notebook_id}.ipynb
where 'nbgrader_step' is the step in the nbgrader pipeline, 'student_id'
is the ID of the student, 'assignment_id' is the name of the assignment,
and 'notebook_id' is the name of the notebook (excluding the extension).
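That layout can be expressed as a one-line path helper (an illustrative function, not part of nbgrader):

```python
import os

def nbgrader_path(nbgrader_step, student_id, assignment_id, notebook_id):
    """Build the on-disk location of a notebook following nbgrader's
    default {nbgrader_step}/{student_id}/{assignment_id}/{notebook_id}.ipynb
    layout quoted above."""
    return os.path.join(nbgrader_step, student_id, assignment_id,
                        notebook_id + ".ipynb")
```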
Exchange functions are called three ways:

- From the command line, e.g. `nbgrader release_assignment assignment1`.
- From the applications in `nbgrader/apps/{foo}app.py`.
- From the notebook server extensions (as in the examples below).

The nbgrader exchange uses the following classes::
Exchange
ExchangeError
ExchangeCollect
ExchangeFetch
ExchangeFetchAssignment
ExchangeFetchFeedback
ExchangeList
ExchangeRelease
ExchangeReleaseAssignment
ExchangeReleaseFeedback
ExchangeSubmit
Exchange
Base class. Contains some required configuration parameters and elements - the prominent ones include `path_includes_course` and `coursedir`.
This class defines the following methods, which are expected to be overridden by subclasses:
- `init_src()` - Define the location files are copied from
- `init_dest()` - Define the location files are copied to
- `copy_files()` - Actually copy the files.
The class also defines a convenience method, which may be subclassed::
def start(self):
if sys.platform == 'win32':
        self.fail("Sorry, the exchange is not available on Windows.")
if not self.coursedir.groupshared:
# This just makes sure that directory is o+rwx. In group shared
# case, it is up to admins to ensure that instructors can write
# there.
self.ensure_root()
self.set_timestamp()
self.init_src()
self.init_dest()
self.copy_files()
You may want to subclass this, as `self.root` as a directory only makes sense in a file-based exchange.
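For illustration, here is a standalone toy that follows the same `init_src()`/`init_dest()`/`copy_files()` split using temporary directories. It is not a real `Exchange` subclass and deliberately skips the coursedir, timestamp, and platform checks shown above:

```python
import os
import shutil
import tempfile

class TempDirExchange:
    """Standalone sketch of the init_src/init_dest/copy_files
    pipeline; not a real nbgrader Exchange subclass."""

    def init_src(self):
        # Simulate a release directory containing one notebook.
        self.src = tempfile.mkdtemp(prefix="release-")
        with open(os.path.join(self.src, "assignment1.ipynb"), "w") as f:
            f.write("{}")

    def init_dest(self):
        # Simulate the exchange's outbound directory.
        self.dest = tempfile.mkdtemp(prefix="outbound-")

    def copy_files(self):
        for name in os.listdir(self.src):
            shutil.copy(os.path.join(self.src, name), self.dest)

    def start(self):
        self.init_src()
        self.init_dest()
        self.copy_files()
```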
ExchangeError
Does nothing in the default exchange, but available for use
ExchangeCollect
Fetches [all] submissions for a specified assignment from the exchange and puts them in the [instructors] home space.
The exchange is called thus::
self.coursedir.assignment_id = assignment_id
collect = ExchangeCollect(
coursedir=self.coursedir,
authenticator=self.authenticator,
parent=self)
try:
collect.start()
except ExchangeError:
self.fail("nbgrader collect failed")
returns.... nothing

- Collected files are put in `{self.coursedir.submitted_directory}/{student_id}/{self.coursedir.assignment_id}`.
- `collect.update` is a flag to indicate whether collected files should be replaced if a later submission is available. There is an assumption this defaults to `True`.
ExchangeFetch
(Deprecated, use `ExchangeFetchAssignment`)
ExchangeFetchAssignment
Gets the named assignment & puts the files in the users home space.
The nbgrader server_extension calls it thus::
with self.get_assignment_dir_config() as config:
try:
config = self.load_config()
config.CourseDirectory.course_id = course_id
config.CourseDirectory.assignment_id = assignment_id
coursedir = CourseDirectory(config=config)
authenticator = Authenticator(config=config)
fetch = ExchangeFetchAssignment(
coursedir=coursedir,
authenticator=authenticator,
config=config)
fetch.start()
.....
Returns.... nothing

The expected destination for files is `{self.assignment_dir}/{self.coursedir.assignment_id}`; however, if `self.path_includes_course` is `True`, then the location should be `{self.assignment_dir}/{self.coursedir.course_id}/{self.coursedir.assignment_id}`
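The destination rule above can be sketched as a small helper (an illustrative function, not nbgrader API):

```python
import os

def fetch_destination(assignment_dir, course_id, assignment_id,
                      path_includes_course=False):
    """Compute where fetched assignment files should land, per the
    rule described above (illustrative helper only)."""
    if path_includes_course:
        # {assignment_dir}/{course_id}/{assignment_id}
        return os.path.join(assignment_dir, course_id, assignment_id)
    # {assignment_dir}/{assignment_id}
    return os.path.join(assignment_dir, assignment_id)
```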
`self.coursedir.ignore` is described as a::
List of file names or file globs.
Upon copying directories recursively, matching files and
directories will be ignored with a debug message.
This should be honoured.
In the default exchange, existing files are not replaced.
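Honouring those ignore globs could look roughly like this sketch using `fnmatch`; per the description above, the real exchange applies the filter while copying directories recursively:

```python
import fnmatch
import os

def filter_ignored(names, ignore_globs):
    """Drop any file/directory names matching coursedir.ignore-style
    globs (sketch only; matches against the basename)."""
    return [n for n in names
            if not any(fnmatch.fnmatch(os.path.basename(n), g)
                       for g in ignore_globs)]
```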
ExchangeFetchFeedback
This copies feedback from the exchange into the students home space.
The nbgrader server_extension calls it thus::
with self.get_assignment_dir_config() as config:
try:
config = self.load_config()
config.CourseDirectory.course_id = course_id
config.CourseDirectory.assignment_id = assignment_id
coursedir = CourseDirectory(config=config)
authenticator = Authenticator(config=config)
fetch = ExchangeFetchFeedback(
coursedir=coursedir,
authenticator=authenticator,
config=config)
fetch.start()
.....
returns.... nothing
- Feedback is put in a `feedback` directory in whichever directory `ExchangeFetchAssignment` deposited files.
- Specifically, it goes into a `feedback/{timestamp}` directory, where `timestamp` is the timestamp from the `timestamp.txt` file generated during the submission.

ExchangeList
This class is responsible for determining what assignments are available to the user.
It has three flags to define various modes of operation:

- `self.remove=True` - If this flag is set, the assignment files (as defined below) are removed from the exchange.
- `self.inbound=True` or `self.cached=True` - These both refer to submitted assignments. The `assignment_list` plugin sets `config.ExchangeList.cached = True` when it queries for submitted notebooks.
- neither - This lists released (and thus fetched) assignments.
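The three modes amount to a simple dispatch, sketched here (illustrative only, not the real class logic):

```python
def list_mode(remove=False, inbound=False, cached=False):
    """Return which kind of assignments an ExchangeList-style call
    would report, given the three flags described above."""
    if remove:
        return "removed"      # files are deleted from the exchange
    if inbound or cached:
        return "submitted"    # submitted assignments
    return "released"         # released (and thus fetched) assignments
```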
Note that `CourseDirectory` and `Authenticator` are defined when the server extension `assignment_list` calls the lister::
with self.get_assignment_dir_config() as config:
try:
if course_id:
config.CourseDirectory.course_id = course_id
coursedir = CourseDirectory(config=config)
authenticator = Authenticator(config=config)
lister = ExchangeList(
coursedir=coursedir,
authenticator=authenticator,
config=config)
assignments = lister.start()
....
returns a List of Dicts - eg::
[
{'course_id': 'course_2', 'assignment_id': 'car c2', 'status': 'released', 'path': '/tmp/exchange/course_2/outbound/car c2', 'notebooks': [{'notebook_id': 'Assignment', 'path': '/tmp/exchange/course_2/outbound/car c2/Assignment.ipynb'}]},
{'course_id': 'course_2', 'assignment_id': 'tree c2', 'status': 'released', 'path': '/tmp/exchange/course_2/outbound/tree c2', 'notebooks': [{'notebook_id': 'Assignment', 'path': '/tmp/exchange/course_2/outbound/tree c2/Assignment.ipynb'}]}
]
The format and structure of this data is discussed in the ExchangeList Data Return structure section below.
This gets called TWICE by the `assignment_list` server extension - once for released assignments, and again for submitted assignments.
ExchangeRelease
(Deprecated, use `ExchangeReleaseAssignment`)
ExchangeReleaseAssignment
This should take the assignment from the release location (normally `{self.coursedir.release_directory}/{self.coursedir.assignment_id}`) and copy it into the exchange service.
The class should check that the assignment exists (look in `{self.coursedir.release_directory}/{self.coursedir.assignment_id}`) before actually copying.
The exchange is called thus::
release = ExchangeReleaseAssignment(
coursedir=self.coursedir,
authenticator=self.authenticator,
parent=self)
try:
release.start()
except ExchangeError:
self.fail("nbgrader release_assignment failed")
returns.... nothing
ExchangeReleaseFeedback
This should copy all the feedback for the current assignment to the exchange.
Feedback is generated by the Instructor. From GenerateFeedbackApp
::
Create HTML feedback for students after all the grading is finished.
This takes a single parameter, which is the assignment ID, and then (by
default) looks at the following directory structure:
autograded/*/{assignment_id}/*.ipynb
from which it generates feedback in the corresponding directories
according to:
feedback/{student_id}/{assignment_id}/{notebook_id}.html
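That feedback layout can be expressed as a small path helper (illustrative, not nbgrader code):

```python
import os

def feedback_path(student_id, assignment_id, notebook_id):
    """Location of generated HTML feedback, per the
    feedback/{student_id}/{assignment_id}/{notebook_id}.html
    layout quoted above."""
    return os.path.join("feedback", student_id, assignment_id,
                        notebook_id + ".html")
```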
The exchange is called thus::
release_feedback = ExchangeReleaseFeedback(
coursedir=self.coursedir,
authenticator=self.authenticator,
parent=self)
try:
release_feedback.start()
except ExchangeError:
self.fail("nbgrader release_feedback failed")
returns..... nothing
ExchangeSubmit
This should copy the assignment from the user's work space, and make it available for instructors to collect.
The exchange is called thus::
with self.get_assignment_dir_config() as config:
try:
config = self.load_config()
config.CourseDirectory.course_id = course_id
config.CourseDirectory.assignment_id = assignment_id
coursedir = CourseDirectory(config=config)
authenticator = Authenticator(config=config)
submit = ExchangeSubmit(
coursedir=coursedir,
authenticator=authenticator,
config=config)
submit.start()
.....
The source for files to be submitted needs to match that in `ExchangeFetchAssignment`.

returns.... nothing

- The exchange expects a `timestamp.txt` file to be in the submission, containing the timestamp of that submission. The creation of this file is the responsibility of this class.
- The submission needs to record the `student_id`, as well as the `course_id` & `assignment_id`.
- The default exchange copies submissions to both an `inbound` and a `cache` store. This may be significant considering `ExchangeList`.
ExchangeList Data Return structure
As mentioned in the `ExchangeList` class documentation above, this data is returned as a List of Dicts.
The format of the Dicts vary depending on the type of assignments being listed.
With `self.remove`, it returns a list of assignments formatted as below (whether they are `released` or `submitted`), but with the status set to `removed`.

Otherwise, it loops over each assignment found (`path`) and gets some basic data:

- `released`: `{course_id: xxxx, assignment_id: yyyy}`
- `submitted`: `{course_id: xxxx, assignment_id: yyyy, student_id: aaaa, timestamp: ISO 8601}`

It then adds `status` and `path` information::

if self.inbound or self.cached:
info['status'] = 'submitted'
info['path'] = path # ie, where it is in the exchange
elif os.path.exists(assignment_dir):
info['status'] = 'fetched'
info['path'] = os.path.abspath(assignment_dir) # ie, where it in on the students home space.
else:
info['status'] = 'released'
info['path'] = path # again, where it is in the exchange
if self.remove:
info['status'] = 'removed'
# Note, no path - it's been deleted.
(`assignment_dir` is the directory in the student's home space, so it needs to take into account `self.path_includes_course`.)

It then loops over each notebook in the assignment (`path`), and gets some basic data::

nb_info = {'notebook_id': /name, less extension/, 'path': /path_to_file/}

If `info['status'] != 'submitted'`, that's all the data we have::

info['notebooks'].append(nb_info)
else, add *feedback* details for *this* notebook::
nb_info['has_local_feedback'] = _has_local_feedback()
nb_info['has_exchange_feedback'] = _has_exchange_feedback()
if nb_info['has_local_feedback']:
nb_info['local_feedback_path'] = _local_feedback_path()
if nb_info['has_local_feedback'] and nb_info['has_exchange_feedback']:
nb_info['feedback_updated'] = _exchange_feedback_checksum() !=
_local_feedback_checksum()
info['notebooks'].append(nb_info)
Having looped through all notebooks, if `info['status'] == 'submitted'`, add feedback notes to the top-level assignment record::
info['has_local_feedback'] = _any_local_feedback()
info['has_exchange_feedback'] = _any_exchange_feedback()
info['feedback_updated'] = _any_feedback_updated()
if info['has_local_feedback']:
info['local_feedback_path'] = os.path.join(
assignment_dir, 'feedback', info['timestamp'])
else:
info['local_feedback_path'] = None
Hey everyone, a quick update: Our team has pretty much split into two parts. Some of our team members are working on an exchange based on https://github.com/jupyter/nbgrader/pull/1238 to talk to a JupyterHub service, and others (like me) are working on writing a JupyterHub service that implements the API described in https://github.com/jupyter/nbgrader/issues/659. We're using Tornado and SQLAlchemy for the API, since those are used by the Jupyter community in many places already. We have some very basic functionality finished, but we are having trouble testing in a Kubernetes cluster.
The problem we're having is about running a JupyterHub service with a public URL in k8s. I've been digging through the Helm chart to see how the culler is implemented as a service, and I tried to replicate that, but I'm not sure what to specify as the URL. We tried the standard `http://127.0.0.1:10101`, which works with regular JupyterHub, but on Z2JH it gives a 503 error. Looking at the logs, I think the issue is that the proxy and the hub are two different pods. Running the JupyterHub service will run it in the hub, while the proxy still thinks the right IP to connect to is `127.0.0.1`, which is not the hub. I also tried using something like `'http://%s:10101' % c.JupyterHub.hub_connect_ip`, but Tornado refuses to listen on that IP (`Cannot assign requested address`). I can change it to listen on 0.0.0.0, but requests to it just hang because it can't access JupyterHub's API (I confirmed this since the `@authenticated` decorator is what's making it hang; printing out the `JUPYTERHUB_API_URL` environment variable gives me something like `http://10.96.192.15:8081/hub/api`, and `curl`ing that URL gives me a timeout inside the hub container). In the meantime, visiting the API from the website through the proxy just gives me 404s that I can't trace back to the original issue. If anyone has experience running JupyterHub services with public URLs in Z2JH, or has any idea how to make this work, input is greatly appreciated.
On an unrelated note, does anyone have suggestions on how to deal with persistent storage? For now, we're just putting things in the working directory. For the z2jh case, we're just putting them in the same place where `jupyterhub.sqlite` is put (the hub's persistent volume). Would this be a good place to put them? Is it better practice to request another piece of storage for the nbgrader stuff so it doesn't mix with the hub's data? Also, is it even a good idea to run the JupyterHub service inside the hub pod, or should we start up another pod instead? (In that case it won't be a hub managed service; it will run standalone from the hub.) Thanks!
Hi Kevin, I'm the one who wrote jupyter/nbgrader#1238. We're still testing the new interface and I might push some minor changes as we find issues on our side, so you might want to check the pull request for changes every now and then. If you run into any problems, feel free to send me a message.
@rkevin-arch Sorry if I've missed it in this thread. Do you have your z2jh config and custom images (is it a Z2JH fork?) in a public repo that we can look at? We can probably make some general suggestions, but if we can see it we can hopefully give you more targeted advice.
Sorry, it's not in a public repo yet since we don't have the main functionality finished yet. For the Z2JH service setup, I can't even get the simple whoami service (https://github.com/jupyterhub/jupyterhub/tree/master/examples/service-whoami, which is tested to work on regular jupyterhub) to work on Z2JH. Here's my config:
# https://zero-to-jupyterhub.readthedocs.io/en/latest/reference/reference.html
proxy:
  secretToken: "<REDACTED>"
  https:
    enabled: false
singleuser:
  memory:
    limit: 512M
    guarantee: 128M
auth:
  admin:
    users:
      - rkevin
hub:
  image:
    name: k8s-hub-testing
    tag: '0.0.1'
  #services:
  #  whoami:
  #    url: 'http://127.0.0.1:10101'
  #    command: python3 /etc/jupyterhub/whoami.py
  extraConfig:
    test.py: |
      c.JupyterHub.services.append({
          'name': 'whoami',
          #'url': 'http://127.0.0.1:10101',
          'url': 'http://%s:10101' % c.JupyterHub.hub_connect_ip,
          'command': ['python3', '/etc/jupyterhub/whoami.py']
      })
Here's a Dockerfile that will get built into k8s-hub-testing:0.0.1:
FROM jupyterhub/k8s-hub:0.9.0-beta.3.n027.hb7da682
COPY whoami.py /etc/jupyterhub/
The whoami.py
has not been changed.
Our testing script:
eval $(minikube docker-env)
cd hub-testing-image
docker build -t k8s-hub-testing:0.0.1 .
cd ..
eval $(minikube docker-env -u)
echo "This will take a while with no output, please be patient..."
helm install jhub jupyterhub/jupyterhub -f config.yaml --version 0.9.0-beta.3
minikube service list
@rkevin-arch I just wanted to comment on your statement "Sorry, it's not in a public repo yet since we don't have the main functionality finished yet." I would like you all to work in public repos starting now, so that you can effectively get help. It's best to get into the same practices and workflows as nbgrader, jupyterhub, etc.
To iterate faster, you can actually avoid rebuilding your docker image after every change to the whoami.py file by mounting it into the pod with k8s.
I'm on mobile; I aim to give an example of this later.
Re: consideRatio, is there a way to restart the service managed by the hub without doing a helm upgrade or helm uninstall / install? If not, mounting the file inside k8s won't save much time, since the docker build
finishes within a second (it's just copying one directory). If so, how can we do that?
Re: moorepants, our repo is a big mess, with lots of testing setups and throwaway scripts littered around. We'll either clean up a bit and make it public, or start a new public repo and start developing there.
It is ok if it is a mess. The only thing you don't want public in this case are any private credentials. I recommend committing what you have (sans credentials) and working openly and collaboratively on github.
@rkevin-arch ah, I understand, then let's keep it this way - the option is more complicated!
I understand: once you update the docker image, you may want to update JupyterHub with it! You could do this:
Values to be passed to helm
when running helm upgrade
, for example from a file that we often refer to as config.yaml
.
hub:
  image:
    # always re-pull the image even if it has the same
    # tag as already exists since before on the node
    pullPolicy: Always
A terminal command to delete the jupyterhub pod running the container. Since this pod was created by a k8s ReplicaSet, which in turn was created by a Deployment, it will be automatically recreated if it is deleted.
# ps: the --selector has a short form of -l, and component=hub is a pod label and label value
kubectl delete pod --namespace my-jhub-namespace --selector component=hub
Hi, we have open-sourced the service side of our project in the ngshare repo. Please feel free to take a look and give some feedback.
The database specification is almost complete using SQL Alchemy.
There are two versions of the server: vserver
is a simple server using Flask, and ngshare
, which will be a JupyterHub service. Currently, most of the APIs remain to be implemented.
We have basically completed the development for the API logic and a standalone backend server (vserver
). Would you like to try it following https://github.com/lxylxy123456/ngshare#installation-and-setup and provide any comments and suggestions?
We also wrote the plan for the entire project structure in the same markdown file.
Hi,
I have been trying to get a JupyterHub managed service exposed to the proxy pod in a Z2JH setup. I'm seeing some very weird pod has unbound immediate PersistentVolumeClaim
errors. I believe the hub pod keeps crashing (that's the only explanation for the occasional weird 404 errors, "Service Unavailable" messages, and the "MinimumReplicasAvailable" transition time always being less than a minute on the minikube dashboard). The service does work when the hub pod happens to be online, and the persistent storage also seems to work if I examine things using kubectl exec -it hub-* bash
.
I have edited the helm chart as follows: https://github.com/rkevin-arch/zero-to-jupyterhub-k8s/commit/e34a61038867fa8cf67096ffd13721b20bd32cd0 and our current testing setup is here: https://github.com/lxylxy123456/ngshare/tree/master/testing/minikube. Does anyone know what could be causing this issue? kubectl logs hub-*
only shows regular web traffic, except for the occasional Cannot connect to managed service ngshare at http://10.97.94.52:10101
. kubectl describe pod hub*
gives me:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 21m default-scheduler error while running "VolumeBinding" filter plugin for pod "hub-747587d5b6-ln86t": pod has unbound immediate PersistentVolumeClaims
Normal Scheduled 21m default-scheduler Successfully assigned default/hub-747587d5b6-ln86t to minikube
Warning FailedScheduling 21m default-scheduler AssumePod failed: pod b7aee470-6a45-470b-bee9-90763716abba is in the cache, so can't be assumed
Warning Unhealthy 21m kubelet, minikube Readiness probe failed: Get http://172.17.0.4:8081/hub/health: dial tcp 172.17.0.4:8081: connect: connection refused
Warning BackOff 21m kubelet, minikube Back-off restarting failed container
Normal Pulled 21m (x3 over 21m) kubelet, minikube Container image "hub-testing:0.0.1" already present on machine
Normal Created 21m (x3 over 21m) kubelet, minikube Created container hub
Normal Started 21m (x3 over 21m) kubelet, minikube Started container hub
Warning Unhealthy 93s (x56 over 17m) kubelet, minikube Readiness probe failed: Get http://172.17.0.4:8081/hub/health: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
Surprisingly, kubectl get pvc
thinks everything's normal, even during the brief period when the hub is down:
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
claim-pip Bound pvc-9cb6827f-b042-44e4-938f-c810ded02b9b 10Gi RWO standard 23m
hub-db-dir Bound pvc-34d155f8-75c8-4aa4-8216-b18627a4d9b1 1Gi RWO standard 23m
hub-ngshare-dir Bound pvc-4901ef29-1b07-4211-abad-5144e1ec86e7 1Gi RWO standard 23m
I'm not sure what's going on. Any ideas?
"Service Unavailable" happens when accessing the k8s Service representing the hub pods (there's only one pod, though). If the hub pod isn't considered ready, the k8s Service will return "Service Unavailable".
Hmmm, running on minikube, what is your config.yaml passed to helm upgrade?
If some logic made the hub pod unresponsive for a while, it would fail to respond to /hub/health, and would show Service unavailable btw.
Can a JH service running on the hub pod that gets stuck in a loop (or awaits a network call that times out after 1 min) cause the hub to become unresponsive?
Is the log message Cannot connect to managed service ngshare at http://10.97.94.52:10101
from the hub process?
I would investigate in the direction that Erik suggested: is your hub running properly or is it getting stuck somewhere. For example what does JupyterHub do when it can't start a managed service, does a managed service run as a separate process, things like that. You are looking for a reason that the hub stops responding "I am happy" on the health endpoint (which is why kubernetes ends up telling you the service is unavailable).
ps. this thread is getting very long and starting to cover various topics. What do people think of moving the discussion to the forum and starting separate threads for each topic? For example, debugging why adding a service makes the hub not-ready is a general thing that others in the forum might have experience with or be interested in reading about.
I started watching https://github.com/lxylxy123456/ngshare
I think splitting into topics is a good idea. There are several other issues that could come out of this, for instance can something be done in the Z2JH chart to make it easier to integrate applications such as this without modifying the chart.
Can a JH service running on the hub pod that gets stuck in a loop (or awaits a network call that times out after 1 min) cause the hub to become unresponsive?
Yes it can! See:
Re: betatim, that log message is coming from the proxy. Inside a regular singleuser image we can still talk to the hub and the service, which is why I thought the hub is still working and the proxy is just not seeing it for some reason.
Re: manics, I'm actually surprised that's the case, since I thought JH starts the service as a completely separate process and will also automatically restart it if it dies. I'll definitely look more into that.
I think I have finally figured out the issue after pulling my hair out, and it's kinda complicated. The problem is that starting the service as a JupyterHub managed service means JupyterHub itself will also occasionally poll the service to make sure it's up (https://github.com/jupyterhub/jupyterhub/blob/0427f8090fed143d7bf2b75cf6c35a3acea19557/jupyterhub/app.py#L1932). However, requests to the service are routed through the proxy.
If I make the URL for the managed service http://127.0.0.1:10101
, then the hub will see the service, but the proxy will proxy the request to localhost, which is the proxy, not the hub (they're two different pods). Therefore, requests to it will fail.
If I make the URL for the managed service 'http://%s:10101'%os.environ['HUB_SERVICE_HOST']
, then the proxy can route requests correctly, and the service will work for a little bit. However, for some reason, the hub cannot access its own IP address inside the pod itself. Connecting to http://10.111.133.52:10101
in the hub will result in a timeout despite the pod's address being 10.111.133.52
, and all singleuser pods and the proxy pod can see it perfectly. The hub thinks the service is dead because it's not responding, and despite using tornado, it is actually hanging, just like https://github.com/jupyterhub/jupyterhub/issues/2928#issuecomment-591987199 (thanks for the link BTW! Would've never figured it out without it).
This log snippet demonstrates this is actually the case:
[I 2020-03-01 08:52:57.141 JupyterHub log:174] 200 GET /hub/health (@172.17.0.1) 1.43ms
[I 2020-03-01 08:53:07.144 JupyterHub log:174] 200 GET /hub/health (@172.17.0.1) 1.82ms
[W 2020-03-01 08:53:38.732 JupyterHub app:1903] Cannot connect to managed service ngshare at http://10.108.93.19:10101
[I 2020-03-01 08:53:38.736 JupyterHub log:174] 200 GET /hub/health (@172.17.0.1) 1.48ms
[I 2020-03-01 08:53:38.737 JupyterHub log:174] 200 GET /hub/health (@172.17.0.1) 2.51ms
[I 2020-03-01 08:53:38.738 JupyterHub log:174] 200 GET /hub/health (@172.17.0.1) 2.75ms
[I 2020-03-01 08:53:38.742 JupyterHub proxy:320] Checking routes
[I 2020-03-01 08:53:52.203 JupyterHub log:174] 200 GET /hub/health (@172.17.0.1) 0.47ms
Between 08:53:08 and 08:53:38, it's likely hanging at the check, and not responding to requests to the health endpoint. I'll play around with either removing that check or finding a workaround and ask it to check localhost rather than the URL intended for the proxy.
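The failure mode diagnosed above, a blocking call starving the event loop so other handlers stall, can be demonstrated with a small sketch. This is plain asyncio, not JupyterHub code; the names are made up for illustration:

```python
# Minimal sketch (plain asyncio, not JupyterHub code) of the failure mode:
# a blocking call inside the event loop stalls every other handler,
# including a "/hub/health"-style endpoint.
import asyncio
import time

async def health_check(scheduled_at, delays):
    # Simulates the health handler: records how long it waited to run.
    delays.append(time.monotonic() - scheduled_at)

async def blocking_service_poll():
    # Simulates a service poll that blocks instead of awaiting:
    # nothing else on this event loop can run during the sleep.
    time.sleep(0.5)

async def main():
    delays = []
    task = asyncio.ensure_future(health_check(time.monotonic(), delays))
    await blocking_service_poll()  # blocks the loop for 0.5s
    await task                     # the health check finally runs
    return delays[0]

delay = asyncio.run(main())
print(f"health check stalled for {delay:.2f}s")
```

The health check cannot run until the blocking poll returns, which matches the 30-second gap in the log above: the readiness probe times out exactly while the hub is stuck connecting to the unreachable service URL.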
Thanks so much for the links and replies!
@rkevin-arch Could you create a Kubernetes Service instead of the URLs you've crafted?
Here's where we are.... I still need to tidy a few things (like make the tests pass again) & update docs a bit more.... but we're getting there https://github.com/edina/nbexchange
Hi all, hope you're safe during this pandemic. Sorry for the silence for quite a while, we're busy with finals, but rest assured we're making some decent progress.
Re: ryanlovett, the main reason I didn't originally do this is that JupyterHub only supports spawning managed services as subprocesses, not as Kubernetes pods. If you use an external service, then you have to manage API tokens, URLs, and that kind of thing.
However, that's what I've been working on in the past few weeks. I've modified JupyterHub to spawn managed services as k8s pods. You can see the repo here, and here's a screenshot of it working (you can see the jupyter-service-ngshare
pod spawned by the hub on startup, and the service being accessible through the proxy).
If anyone can look over the code I have and offer some suggestions, that'd be great. If all of you think this is a good thing to have in upstream JupyterHub, I can open a separate issue and work on it there. This is just a proof of concept, with some stuff hardcoded (like the path of the PVC), but I'll implement it properly and make a pull request if you'd like.
Hi all,
It's been a while since we last heard from all of you. How are things going? Hopefully you all are safe in your homes.
Has anyone taken a look at https://github.com/rkevin-arch/kubespawner_service_jupyterhub yet? It's a modified version of JupyterHub that allows spawning managed services as k8s pods and not just subprocesses. I'd like to get some feedback on this, and potentially create a separate pull request to JupyterHub itself to allow pluggable services. I'm not sure what the best way to implement it is (currently I added the ability for the built-in Service
class to use kubespawner, but should I refactor it to allow any pluggable spawner? What about configuring PVCs for k8s? etc.), so if someone can comment on it, that'd be great. Thanks!
Our nbgrader + ngshare setup is pretty much 100% working. We can create assignments, release them, download them as another user and submit them, the instructor can collect them and release feedback, and the student can download the feedback. All of this should work with multiple courses, multiple instructors and multiple students. We're just in the process of writing more test cases and squashing bugs.
@rkevin-arch Sorry, I missed your previous message asking for feedback. Did you consider running nbgrader as an external Jupyter service? For example create an independent Helm chart for nbgrader, with configuration to connect to the JupyterHub API with a secret token? This is similar to how BinderHub integrates with JupyterHub, and reduces the coupling between the projects which in the longer term should make maintenance easier. I think it might also make it easier to use with other deployments such as The Littlest JupyterHub.
If you've already considered this then apologies; however, I'd be interested to see your conclusions, as there may be things we could improve in JupyterHub or Z2JH.
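For reference, registering an external (rather than managed) service is just a JupyterHub config entry; here's a hedged sketch, where the URL and token are placeholders (the URL would point at a separately-deployed k8s Service, the token at a shared secret):

```python
# Hypothetical sketch: register ngshare as an *external* JupyterHub service.
# The URL and token are placeholders, not values from this thread.
c.JupyterHub.services.append({
    'name': 'ngshare',
    'url': 'http://ngshare:10101',  # assumed name of a separate k8s Service
    'api_token': '<REDACTED>',      # shared secret, e.g. from a k8s Secret
})
```

With no 'command' key, JupyterHub only proxies and authenticates the service; it does not start or monitor the process, which is what decouples the deployments.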
We've thought about it, and the main downside would be the extra configuration and having to keep track of yet another API token. The main reason we went with this method is that I don't think we're the first people to want JupyterHub managed services to run on k8s, and we probably won't be the last. The https://github.com/rkevin-arch/kubespawner_service_jupyterhub repo just allows spawning of JupyterHub managed services as pods and isn't aimed specifically at ngshare/nbgrader.
That said, your point about reducing coupling is a fair concern, since if I make a pull request and get it merged into JupyterHub, it will add a kubespawner dependency for those who may not need it. Maybe the best way is to allow service spawning to use user-defined classes, just like notebook spawners and authenticators, but I'm not sure if that's a good idea.
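Purely as a sketch of that idea (the second trait below does not exist in JupyterHub today and is invented here for discussion), pluggable service spawning could mirror how the user-server spawner is already configured:

```python
# Real, existing configuration: the class used to spawn *user* servers.
c.JupyterHub.spawner_class = 'kubespawner.KubeSpawner'

# HYPOTHETICAL trait, sketched for discussion only -- it does NOT exist
# in JupyterHub: an analogous class used to spawn managed services.
c.JupyterHub.service_spawner_class = 'kubespawner.KubeSpawner'
```

That shape would keep kubespawner an optional dependency, pulled in only by deployments that opt into it.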
Our nbgrader + ngshare setup is pretty much 100% working.
Nice work! :tada:
@rkevin-arch given my current understanding, I'm currently leaning towards thinking that the use case of kubespawner_service_jupyterhub is too specific to be sustainable to maintain.
Perhaps less is more in this case, I'm not sure. But I'd suggest considering the option to only maintain a Dockerfile exposing ngshare, and docs on how to register it as an external service in JupyterHub, while excluding responsibility for how to deploy it. The next level of ambition could be to also maintain deployment instructions, which could for example be to provide an ngshare-statefulset.yaml if it needs storage and is stateful, or an ngshare-deployment.yaml + ngshare-service.yaml if ngshare is a service that doesn't need persistent storage.
Is it also possible to run ngshare as a local JupyterHub service?
If ngshare is to run as a managed service, I can run it inside the hub pod as a subprocess, but I'd have to modify the helm chart to add a separate PVC to that pod, along with exposing more ports on the hub pod to accommodate it. I feel like that's more intrusive than modifying JupyterHub to support spawning managed services as pods.
You have a point about making it a k8s deployment rather than a simple pod. In that case, ngshare just won't be a managed service and will be deployed manually as a service. I can work on that if you feel that's a better way to move forward.
@rkevin-arch I'll think out loud by writing down the options I evaluate, to make sure what I think makes some sense.
Develop an ngshare Dockerfile and potentially also maintain an ngshare-statefulset.yaml file to kubectl apply -f ngshare-statefulset.yaml -n jupyterhub
, configure the z2jh helm chart with an external service, and kubectl patch
the statefulset with the secret or similar.
If you want to streamline this further, you could create a helm chart that others can install alongside z2jh, or create an opinionated meta helm chart that has a requirements.yaml file and a single additional resource, the ngshare-statefulset.yaml. I think that would be overkill though.
Yeah, I think going for something like 4 would make the project most sustainable. I think it could make it easier to onboard others to the project, maintain its current functionality, and develop it further.
Cool. I'll try to get that working during the weekend. Thanks for the feedback!
Surfacing the following as a new issue.
We have a kubernetes based JupyterHub deployment at UC Davis (on bare metal) and would like to get nbgrader running on the system for instructors to use. I have a team of four CS seniors that will spend about 40 collective hours a week for 5 months with the goal to develop a community usable solution for this issue. They need some help getting oriented to understand what the current state of affairs is, what the issues are, and what solution paths have people considered. Any suggestions on how to move forward? They start this week :)
Originally posted by @moorepants in https://github.com/jupyterhub/zero-to-jupyterhub-k8s/issues/174#issuecomment-574978242