jupyterhub / nbgitpuller

Jupyter server extension to sync a git repository one-way to a local path
https://nbgitpuller.readthedocs.io
BSD 3-Clause "New" or "Revised" License
210 stars 86 forks source link

support running .nbgitpuller.script.{init,update} on start #291

Open jimdigriz opened 1 year ago

jimdigriz commented 1 year ago

Proposal for fixing https://github.com/jupyterhub/nbgitpuller/issues/122

.nbgitpuller.script.{init,update} placed at the top of the repo contains a list of shell commands to run when the project is checked out or refreshed respectively.

Let me know what you think, and if this has legs I can add some spit, polish and documentation to this.

welcome[bot] commented 1 year ago

Thanks for submitting your first pull request! You are awesome! :hugs:
If you haven't done so already, check out Jupyter's Code of Conduct. Also, please make sure you followed the pull request template, as this will help us review your contribution more quickly. welcome You can meet the other Jovyans by joining our Discourse forum. There is also a intro thread there where you can stop by and say Hi! :wave:
Welcome to the Jupyter community! :tada:

yuvipanda commented 1 year ago

Ah, this is an interesting idea with fun security implications (see autorun.inf) :D I would say we can probably do this in the following cases:

  1. It should be off by default, and the jupyterhub admin would need to turn it on (perhaps via traitlet / env var)
  2. The person making the link should also opt-in, with an extra param required in the URL
  3. Display a clear message that this is executing in the progress UI
  4. Don't make it a hidden file - I think it's important for end users to know this is happening too.

I think under these conditions, I would be thrilled to add this functionality!

jimdigriz commented 1 year ago

Sounds reasonable, I'll take a stab at this over the next few weeks when I find some time and get back to you. Thanks for the fast feedback and suggestions!

jimdigriz commented 1 year ago
  1. Don't make it a hidden file - I think it's important for end users to know this is happening too.

@yuvipanda Would you be willing to drop this requirement?

Asking as (for us) it is also important to not litter the project with files students do not need to interact with or be aware of; we already have .hidden directory containing various utilities that are run by our notebooks in our git projects.

In the case of a bad actor, this countermeasure would be trivially neutered with an rm "$0"; git --rewrite-history ... at the top of the script to hide themselves so not sure it actually adds any real value.

For visibility I think this would be covered by (3).

What do you think?

Just negotiating here, as if I don't ask... :)

yuvipanda commented 1 year ago

I think I'd like the default to not be hidden, but can be allowed to be hidden with a server-side setting via a traitlet. That way, admins can opt-in to it being hidden, but by default it's visible. I think AUTORUN.INF and similar things on Windows have scarred me enough that I don't want automatic code execution without full visibility :)

jimdigriz commented 1 year ago

I think I'd like the default to not be hidden, but can be allowed to be hidden with a server-side setting via a traitlet.

Works for Me(tm)

Thanks

jimdigriz commented 1 year ago

Trying to understand the actual risk here. I acutely understand the 'security' problem and the proposed solutions that could be applied to mitigate this, but struggling to understand how this ultimately applies to the environments it will be deployed to.

Understand, I am outlining my assumptions here and my reasoning, not dictating this is how this should be.

My understanding for the majority of JupyterHub deployments the user is not the administrator of the deployment. The user is choosing to use the service offered in the same way they choose to use an educational institution or company provided workstation. The user is not the decision maker on what runs on those environments. Right?

Should we also be considering extending these proposals to users attempting to run pip install ...?

My aim is to provide an option for a course author to use nbgitpuller for their deployment of JupyterHub where they need a bootstrapping script to be executed to get things working. For my platform, we have tried clear instructions, but for some reason people still keep tripping up and making a mess of cut'n'pasting. The only solution I see here is that the service solves this for them...somehow.

I find it hard to believe that a user is going to be generating nbgitpuller links and passing them around when they can also ask people to type git clone .... A bad actor could send a email with the link in it, but what is at risk here? The users home directory on a learning platform? Unencrypted SSH/GPG private keys? A map showing the location of the loot?

I think @yuvipanda's suggestion of placing this behind an administrator flag (disabled by default) is prudent. Asking the user to make a choice on a platform as a guest is peculiar, especially when those same users struggle with "please cut'n'paste verbatim...".

Did no one else suffer Windows 7 and those security popups? :)

Maybe what is needed here is to make the administrative flag behave more like c.Authenticator.username_pattern but holding a list of URI regexp's on when to allow use of the bootstrapping script otherwise it just fetches the repo as usual.

yuvipanda commented 5 months ago

I don't think it's necessary for the end user to make the choice here. In https://github.com/jupyterhub/nbgitpuller/pull/291#issuecomment-1478243302 I described this as being behind a traitlet flag, that can be enabled by JupyterHub admin (or end user, if they're using it without JupyterHub). I think that's the right place for this security setup. It should be opt-in as well.

So I'm happy to accept this with the following security requirements:

  1. It's disabled by default
  2. Requires explicit action by someone who can modify jupyter server config (so JupyterHub admin or end user) to enable this feature