Create a JupyterHub "exchange" service to replace the exchange directory

jhamrick commented 7 years ago

I would eventually like to replace the nbgrader exchange directory with a more robust solution, namely, a JupyterHub "exchange" service that manages released assignments, submissions, and feedback. This has the drawback that people won't be able to use nbgrader's file management capabilities unless they are using JupyterHub. In practice, though, I don't think anyone is using nbgrader's file management capabilities without JupyterHub anyway (but if I am wrong about this, someone should correct me!). Here's how I imagine this working.

@lgpage @minrk @ellisonbg @willingc @dsblank I would appreciate any feedback you have on this proposal!

Permissions

All authentication will be handled by JupyterHub, which will tell the service who the current user is and what group(s) they are a part of. The exchange service will handle any number of courses on the same machine, and for each course, will require that there are two groups specified: one for instructors (who are allowed to release, fetch, submit, and collect assignments and return feedback) and one for students (who are allowed to fetch and submit assignments and download feedback). This would be configured something like this:

c.ExchangeApp.groups = {
    'course1': dict(instructors='instructors_course1', students='students_course1'),
    'course2': dict(instructors='instructors_course2', students='students_course2'),
    ...
}
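
As a rough illustration of how the group mapping above could drive a permission check, here is a minimal sketch. The action names and the `is_allowed` helper are made up for this example and are not part of nbgrader or JupyterHub; the role-to-action table just restates the lists in the paragraph above.

```python
# Hypothetical sketch: turn the proposed per-course groups into a permission check.
GROUPS = {
    "course1": dict(instructors="instructors_course1", students="students_course1"),
    "course2": dict(instructors="instructors_course2", students="students_course2"),
}

# Actions per role, following the proposal text (action names are assumptions).
ROLE_ACTIONS = {
    "instructors": {"release", "fetch", "submit", "collect", "return_feedback"},
    "students": {"fetch", "submit", "download_feedback"},
}


def is_allowed(course_id, user_groups, action):
    """Return True if one of the user's groups grants `action` in `course_id`."""
    course = GROUPS.get(course_id)
    if course is None:
        return False  # unknown course: deny by default
    for role, group_name in course.items():
        if group_name in user_groups and action in ROLE_ACTIONS[role]:
            return True
    return False
```

For example, `is_allowed("course1", {"students_course1"}, "release")` is false, because only the instructors group of a course may release.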

API

The exchange service will define a REST API that the nbgrader commands (release, fetch, submit, collect, etc.) can access.

/api/assignments

GET /api/assignments/<course_id> -- list all assignments for a course (students+instructors)

/api/assignment

GET /api/assignment/<course_id>/<assignment_id> -- download a copy of an assignment (students+instructors)
POST /api/assignment/<course_id>/<assignment_id> -- release an assignment (instructors only)

/api/submissions

GET /api/submissions/<course_id>/<assignment_id> -- list all submissions for an assignment from all students (instructors only)
GET /api/submissions/<course_id>/<assignment_id>/<student_id> -- list all submissions for an assignment from a particular student (instructors+students, though students are restricted to only viewing their own submissions)

/api/submission

POST /api/submission/<course_id>/<assignment_id>/<student_id> -- submit a copy of an assignment (students+instructors)
GET /api/submission/<course_id>/<assignment_id>/<student_id> -- download a student's submitted assignment (instructors only)

/api/feedback

POST /api/feedback/<course_id>/<assignment_id>/<student_id> -- upload feedback on a student's assignment (instructors only)
GET /api/feedback/<course_id>/<assignment_id>/<student_id> -- download feedback on a student's assignment (instructors+students, though students are restricted to only viewing their own feedback)
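
To make the URL layout concrete, here is a small sketch of how a client might build paths for the endpoints listed above. The `exchange_path` helper and the `ENDPOINT_ARGS` table are hypothetical names for illustration only; nothing like this exists in nbgrader as far as this proposal goes.

```python
from urllib.parse import quote

# Required path segments for each proposed endpoint (after /api/<resource>/).
ENDPOINT_ARGS = {
    "assignments": ("course_id",),
    "assignment": ("course_id", "assignment_id"),
    "submissions": ("course_id", "assignment_id"),  # student_id may be appended
    "submission": ("course_id", "assignment_id", "student_id"),
    "feedback": ("course_id", "assignment_id", "student_id"),
}


def exchange_path(resource, *parts):
    """Build the path for one of the proposed endpoints, escaping each segment."""
    required = ENDPOINT_ARGS[resource]
    # Only /api/submissions accepts one optional extra segment (the student id).
    max_parts = len(required) + (1 if resource == "submissions" else 0)
    if not len(required) <= len(parts) <= max_parts:
        raise ValueError("%s expects %s" % (resource, ", ".join(required)))
    return "/api/%s/%s" % (resource, "/".join(quote(p, safe="") for p in parts))
```

For instance, `exchange_path("submission", "course1", "ps1", "alice")` gives `/api/submission/course1/ps1/alice`.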

Exchange implementation

Under the hood, the exchange service will continue to store files directly on the filesystem, but they will all have the same permissions (read and write only for the user running the exchange service). I think this is a better option than doing it with a database because we don't really need any fancy relational features here and this also makes it easier for instructors to inspect files in the exchange manually. If someone feels strongly that a database should be used then I might be able to be convinced otherwise, though.

Regardless, I do want to implement some form of checksumming, because I have noticed that in the current implementation, when the system is under heavy load, submissions are occasionally incomplete or corrupted (e.g. missing timestamp.txt or something).
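
One possible shape for the checksumming, sketched under assumptions: the client records a SHA-256 digest per file at submit time, and the service recomputes the digests after the copy. The function names and the choice of SHA-256 are illustrative, not a decided design.

```python
import hashlib
import os


def checksum_manifest(directory):
    """Map each file's path (relative to `directory`) to its SHA-256 hex digest."""
    manifest = {}
    for root, _dirs, files in os.walk(directory):
        for name in files:
            path = os.path.join(root, name)
            digest = hashlib.sha256()
            with open(path, "rb") as f:
                # Read in chunks so large notebooks are not loaded at once.
                for chunk in iter(lambda: f.read(65536), b""):
                    digest.update(chunk)
            manifest[os.path.relpath(path, directory)] = digest.hexdigest()
    return manifest


def find_problems(directory, expected):
    """Return sorted paths that are missing, unexpected, or have changed content."""
    actual = checksum_manifest(directory)
    paths = set(expected) | set(actual)
    return sorted(p for p in paths if expected.get(p) != actual.get(p))
```

With this, a submission whose timestamp.txt went missing would show up as `["timestamp.txt"]` from `find_problems`, and the service could reject it rather than store an incomplete copy.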

Existing nbgrader apps

The existing nbgrader apps will be reworked to make requests to the exchange API rather than copying to and from the exchange directory.

One thing I am not quite sure of is how the command line apps get properly authenticated, because the authentication is normally happening in the browser, not the command line. I see two possible solutions:

One solution to this is to say that these commands can only be used through the server extension, and then have that extension pass the authentication information to the command line apps. This is probably the easiest but then it means you can't just run the commands from the command line anymore.
The other solution is to require somehow that users re-authenticate from the command line. I am not really sure how to do this in a general way that handles all the forms of authentication that JupyterHub uses. Maybe @minrk can weigh in on the feasibility of this, but from what I know about how this works it doesn't seem like a particularly feasible option to me?

ellisonbg commented 7 years ago

Did you see the hubshare spec we wrote?

jhamrick commented 7 years ago

No, I don't think so...

jhamrick commented 7 years ago

Here is a link to it: https://github.com/jupyterhub/hubshare/blob/master/specification.md

That will definitely be really nice, and will make some of this stuff unnecessary for sure. Do you know what the timeline is for that?

ellisonbg commented 7 years ago

Min and I worked through the design at the last decade meeting. As of right now there isn't anyone working on it, though. But it is one of our grant deliverables, so it will get done eventually. If you have time and want to work on it, feel free.

minrk commented 7 years ago

One thing I am not quite sure of is how the command line apps get properly authenticated

I'd use authentication tokens for this. HubAuth did recently get support for API tokens in the Authorization header, not just cookies. You can store these in a file once generated. I think we do need to have a page for requesting a new token in the Hub UI to complete the loop, though. It would look like:

request an API token (this is fiddly right now, but I'll add a page for it)
save that somewhere like ~/.nbgrader/token
CLI apps look for this file, use it in Authorization header. If not present, point to Hub page where they can get one.

Spawners could request and install this token at launch, to make it easy to do it from the single-user-server terminal.
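
The steps above could look roughly like this on the CLI side. This is only a sketch: the `auth_headers` helper is a made-up name, the file location is the one suggested above, and the `token <value>` form of the Authorization header is my understanding of what the Hub accepts, so check it against the JupyterHub docs.

```python
import os

# Location suggested above; treat it as a convention, not a fixed path.
TOKEN_PATH = os.path.join("~", ".nbgrader", "token")


def auth_headers(token_path=TOKEN_PATH):
    """Read the saved API token and return headers for requests to the Hub."""
    path = os.path.expanduser(token_path)
    try:
        with open(path) as f:
            token = f.read().strip()
    except FileNotFoundError:
        token = ""
    if not token:
        # Point the user at the Hub page where a token can be requested.
        raise RuntimeError(
            "No API token found at %s; request one from the Hub and save it there." % path
        )
    return {"Authorization": "token " + token}
```

A CLI app would then pass the returned dict as the headers of each request to the exchange service.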

ellisonbg commented 7 years ago

A page to get a token would be helpful!

minrk commented 7 years ago

Page for a token: https://github.com/jupyterhub/jupyterhub/pull/971

bbhopesh commented 6 years ago

Is someone already working on implementing this? I'd like to contribute if another person is needed for this task or some part of it.

jhamrick commented 6 years ago

@minrk would be the person to ask to see what the current status of HubShare is. All the development for that will happen at https://github.com/jupyterhub/hubshare so that is a good repo to subscribe to if you're interested in contributing to it!

perllaghu commented 6 years ago

I'm interested in this, however have two complications to throw into the mix:

In our environment, the jupyterhub service spawns notebooks onto different VMs in a swarm. This means that persistent storage is done via NFS (the Notebooks tree view does not use the ContentsManager plugin), so each notebook runs as the same user (and we use Docker labels to distinguish things for accounting)
We connect via LTI - so course ID & role details are not stored in the hub anywhere... so having a configuration dictionary is..... problematic.

(but I'm going to subscribe to hubshare too)

jhamrick commented 6 years ago

@perllaghu The idea with the HubShare service will be to alleviate the issue with your first point. The main idea is that HubShare will have control over some sort of file store for the exchange, with permissions determined based on JupyterHub users rather than local process users. This means you could definitely launch notebooks as all the same user (as long as the JupyterHub users are different) and HubShare would appropriately manage access to the exchange.

Is there a way you can programmatically get the course id and role through LTI? e.g. if you can get the course id through an environment variable, then I think that shouldn't end up being a problem.
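
For example, assuming the spawner exports the LTI launch values into the single-user server's environment, picking them up could be as simple as the sketch below. The variable names `EXCHANGE_COURSE_ID` and `EXCHANGE_ROLE` are hypothetical; nothing defines them today.

```python
import os


def course_context(environ=None):
    """Return (course_id, role) from the environment, or raise if either is unset."""
    environ = os.environ if environ is None else environ
    # Hypothetical variable names, assumed to be set by the spawner at launch.
    course_id = environ.get("EXCHANGE_COURSE_ID", "").strip()
    role = environ.get("EXCHANGE_ROLE", "").strip().lower()
    if not course_id or not role:
        raise ValueError("EXCHANGE_COURSE_ID and EXCHANGE_ROLE must both be set")
    return course_id, role
```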

perllaghu commented 6 years ago

Absolutely - this is the whole point of LTI: it does the authentication/authorisation part (basically OAuth), and gives you course, user, and role - so you know if the user is an instructor (gets FormGrader) or a student (gets Assignment)

perllaghu commented 5 years ago

Just to let people know - We've started a [currently private] version of this.....

Not using HubShare, as that's too generic, and doesn't have the authentication/authorisation stuff in there.

I hope to be able to persuade people to make it generic enough to work for our [kubernetes behind a proxy server] environment as well as more generic setups.

nthiery commented 4 years ago

Hi @perllaghu

Just to let people know - We've started a [currently private] version of this..... Not using HubShare, as that's too generic, and doesn't have the authentication/authorisation stuff in there.

I hope to be able to persuade people to make it generic enough to work for our [kubernetes behind a proxy server] environment as well as more generic setups.

Has there been progress on this front? We would be interested in this for multiple courses next spring, where it gets annoying to have to tweak the JupyterHub configuration for each new course.

Thanks in advance!

perllaghu commented 4 years ago

Yes there is..... I'm in the wrong place to give you the Pull Request number for this - but it's hopefully not far.

BertR commented 4 years ago

Hi @nthiery , this is the pull request: https://github.com/jupyter/nbgrader/pull/1238 @lzach has been working on the documentation on how to write an exchange plugin and we're also planning to push our own implementation to a public GitHub repository.

nthiery commented 4 years ago

Thanks for the quick feedback! I am interested in beta testing whenever this is out.

nthiery commented 4 years ago

Hi @BertR,

and we're also planning to push our own implementation to a public GitHub repository.

I am looking forward to it! Has there been progress on this side?

BertR commented 4 years ago

Yes! Today @perllaghu pushed our exchange to https://github.com/edina/nbexchange. It's very rough around the edges, but we will clean it up and add some examples of how it can be used.

nthiery commented 4 years ago

Ah ah! Will check this out today! Thank you.

perllaghu commented 4 years ago

Be delighted with any critique/observations...

nthiery commented 4 years ago

Thanks for offering help :-)

I am reading through the documentation and a bit confused:

On the exchange side, the user plugin service seems to be responsible for specifying the course id.
On the nbgrader side, it seems that the course id is specified in the request.

https://github.com/edina/nbexchange/blob/5f501ed9b4247b463afb55713a994c88de038682/nbexchange/plugin/submit.py#L53

Or is it meant for both to be possible, depending on use case? E.g. an exchange service dedicated to a given course vs an exchange service for multiple courses?

nthiery commented 4 years ago

Currently, the exchange service does not provide a means to share nbgrader's grade database among several instructors, right?

perllaghu commented 4 years ago

On the exchange side, the user plugin service seems to be responsible for specifying the course id.
On the nbgrader side, it seems that the course id is specified in the request.

https://github.com/edina/nbexchange/blob/5f501ed9b4247b463afb55713a994c88de038682/nbexchange/plugin/submit.py#L53

There are two different things going on here:

The user has a current course thing (defined by an LTI connection, jupyterhub config, or some other means)
The exchange can be asked for details on a course, any course. The exchange needs to check that the user making the request has access to the course being asked about - which may not be the current course.... this allows the exchange to handle users subscribed to multiple courses

perllaghu commented 4 years ago

Currently, the exchange service does not provide a means to share nbgrader's grade database among several instructors, right?

Correct - the instructor's nbgrader database (the one used by formgrader) is still the sqlite database in the instructor's home directory.

I can definitely see a piece of work to move the formgrader database to a central database - which would immediately allow multiple instructors to manage a single course.... but there are a whole raft of things to work through for that:

Where do released and generated notebooks live?
Where's the [auto]grading done?
Is there a distinction between Instructor and TeachingAssistant?

perllaghu commented 1 year ago

I believe we can close this.

In response to:

Currently, the exchange service does not provide a means to share nbgrader's grade database among several instructors, right?

We have the following solution:

Each course gets its own database in a central database server, and a directory on a central FileStore server.

When an instructor starts their notebook server in our system, the database URL is calculated & set for that course, and the directory in the central FileStore is mounted. We also set c.CourseDirectory.root to a path specific for that course.

Thus all instructors have access to the same database, and the same course files: source, release, submitted, autograded, feedback.

.... Oh, and we found it useful to set c.CourseDirectory.directory_structure = '{nbgrader_step}/{assignment_id}/{student_id}' - but that's just us....

[How 10 markers manage the 200 submissions is not in the solution: they all see the same dataset]
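
A rough sketch of how the per-course values described above might be computed. Everything here is assumed for illustration: the `course_settings` helper, the PostgreSQL URL scheme, and the host and mount names are placeholders, and `c.CourseDirectory.db_url` is my guess at the setting meant by "the database URL".

```python
import posixpath


def course_settings(course_id, db_server, filestore_root):
    """Compute per-course nbgrader settings (all inputs are placeholders)."""
    return {
        # One database per course on a central server; URL scheme is an assumption.
        "db_url": "postgresql://%s/%s" % (db_server, course_id),
        # One directory per course on the mounted FileStore.
        "root": posixpath.join(filestore_root, course_id),
        "directory_structure": "{nbgrader_step}/{assignment_id}/{student_id}",
    }


# In nbgrader_config.py, where the config loader provides `c`, one might write:
# settings = course_settings("course1", "db.example.org", "/mnt/filestore")
# c.CourseDirectory.db_url = settings["db_url"]
# c.CourseDirectory.root = settings["root"]
# c.CourseDirectory.directory_structure = settings["directory_structure"]
```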
