sagemathinc / cocalc

CoCalc: Collaborative Calculation in the Cloud
https://CoCalc.com

Add a new cell type to Jupyter notebooks that's an "encrypted cell" #5070

Closed williamstein closed 7 months ago

williamstein commented 3 years ago

This would be an extension to Jupyter, which would be meant to be used in the context of nbgrader and a cocalc course, at least initially. The problem it solves is providing a way for students to solve problems and get immediate feedback, but without students seeing the code that determines that feedback.

When the students use a notebook with an encrypted cell, they can't see the code in that cell, but they can evaluate the cell and see the output. Behind the scenes, the cocalc project would know the decryption key and be able to run the code, but the student would never have access to the key. A big part of the problem here is how the project gets access to the decryption key without the student being able to gain access to that key.

A solution that is not 100% bulletproof from a crypto point of view may be fine, given the application. That said, I think this can be done via our local_hub node.js process.

Basically, this could be a new primitive for Jupyter notebooks that you can think of as "cells that students can run but they can't see the code". nbgrader and our course management system would make use of it. Maybe something else would later too.

realtimclemans commented 3 years ago

I want this. I'm very annoyed with the current online training programs where quizzes are after a ton of information is presented. I want to present a piece of information, then hide it, then quiz on it, and be automatically graded.

jasongrout commented 3 years ago

We've thought about how to have notebook cells that the user could run but not modify, or run but not see. Once the notebook state is on the server side, this becomes much easier - just do not send the cell contents to the frontend. Instead, the cell has metadata telling the frontend that there is content in the cell, and it can request the server execute the cell contents, but it does not know what the cell contents are. This both hides the code from the user, and ensures that the user does not tamper with the code.
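A minimal sketch of that server-side idea, assuming the notebook is held as a plain nbformat-style dict. The metadata keys `hidden_source` and `has_hidden_source` and both function names are hypothetical, made up for illustration; they are not part of nbformat, Jupyter, or cocalc.

```python
import copy

# Hypothetical metadata keys; neither is part of nbformat or of cocalc.
HIDDEN_FLAG = "hidden_source"
PLACEHOLDER_FLAG = "has_hidden_source"


def redact_for_frontend(notebook):
    """Return a copy of the notebook with the source of hidden cells removed."""
    public = copy.deepcopy(notebook)
    for cell in public["cells"]:
        metadata = cell.setdefault("metadata", {})
        if metadata.pop(HIDDEN_FLAG, False):
            cell["source"] = ""  # the frontend never receives the code
            metadata[PLACEHOLDER_FLAG] = True  # but it knows the cell has content
    return public


def source_to_execute(notebook, cell_id):
    """Look up the real source on the server; the frontend only sends the id."""
    for cell in notebook["cells"]:
        if cell.get("id") == cell_id:
            return cell["source"]
    raise KeyError(cell_id)


# The server keeps the full notebook; only the redacted copy is sent out.
server_notebook = {
    "cells": [
        {"id": "a1", "cell_type": "code", "source": "x = 2 + 2", "metadata": {}},
        {"id": "b2", "cell_type": "code", "source": "assert x == 4",
         "metadata": {HIDDEN_FLAG: True}},
    ]
}
frontend_notebook = redact_for_frontend(server_notebook)
```

Because execution is requested by cell id and the source is looked up on the server, an edit made on the frontend cannot change what actually runs.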

jasongrout commented 3 years ago

Of course, the idea I posted above is harder if you have real-time collaboration between the frontend and backend that ensures that they have the same notebook data, since this introduces an asymmetry in the data each side holds.

williamstein commented 3 years ago

@jasongrout thanks. For us the problem is a bit different due to our security model. The user has full access to the filesystem of the project (=Docker container) where their Jupyter server is running. However, they (presumably?) don't have access to the contents of all memory of that server. Our model is that our central hubs connect to the Jupyter server (a node.js process), not the other way around, i.e., even being able to run arbitrary code in the project doesn't let you make an outgoing connection.


Central Hub ---------------------------------------->  Node.js Jupyter server (the local hub)

                                                       "heh, give me a secret key to unlock this notebook's content"

"here is the secret key for that notebook, which I'm
giving you because you're a special project in a course"

After the above happens, the Jupyter server would then be able to decrypt and run any code in the notebook. However, the user (who can see any file contents, processes, etc.) can't get at that decrypted code unless they can somehow access the internal memory of the node.js process. I think they can't access that memory because nothing in the container is running as root and the permissions of /proc are set up properly for us (or can be). There's a discussion here.

And just to be clear, I am worried about how me just implementing this might potentially conflict with or confuse "the Jupyter ecosystem". If nothing else, I'll try to make it very clear to users that this is a cocalc-specific extension, and what the caveats are...

williamstein commented 3 years ago

Also I want to be clear that this is also @haraldschilly 's idea, not mine. I was trying to do something different to solve this problem that was much more complicated and less capable (involving probably another docker container and process). This "encrypted cells" approach is clearly the most powerful by far if we can make it work. It'll also automatically work for any kernel (not just Python, say), which is important. And it gets the full state of that kernel, which is also important.

jasongrout commented 3 years ago

> The user has full access to the filesystem of the project (=Docker container) where their Jupyter server is running.

Oh, right. Yes, that is different.

Once that code is executed, will the user be able to look in the Jupyter execution history to see what was run? I.e., look in In[-2].

jasongrout commented 3 years ago

I believe you can execute code with the silent parameter to prevent the execution from being stored in the execution history. See the kernel protocol execute_request parameters for details.

jasongrout commented 3 years ago

It looks like you probably would want store_history to be false, but may want silent to stay at its default of false (you probably want output from the cell to be visible): https://jupyter-client.readthedocs.io/en/stable/messaging.html#execute
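For concreteness, here is a sketch of the `content` part of such an execute_request, using the field names from the messaging spec linked above. The helper name `hidden_execute_content` is hypothetical, and how cocalc's Jupyter server would actually build and send the message is not covered here.

```python
def hidden_execute_content(code):
    """Build execute_request content that runs code without recording it in history."""
    return {
        "code": code,
        "silent": False,         # keep output visible to the student
        "store_history": False,  # keep the code out of In[...] and the history
        "user_expressions": {},
        "allow_stdin": False,
        "stop_on_error": True,
    }


content = hidden_execute_content("print('correct!')")
```

With a jupyter_client `KernelClient`, the same flags can be passed as keyword arguments, e.g. `client.execute(code, silent=False, store_history=False)`.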

williamstein commented 3 years ago

> I.e., look in In[-2].

You probably just saved me a lot of pain, because I wouldn't have thought about that, some students would have, etc., etc. Thanks!

novoselt commented 7 months ago

This does not seem to be a popular request, and I think we should not deviate from "standard notebooks" without a really good reason. My prediction is that if this feature is implemented, it will get very little usage because people will not know about it. Hence let's focus on other things ;-)

williamstein commented 7 months ago

Agreed.