jupyterhub / batchspawner

Custom Spawner for Jupyterhub to start servers in batch scheduled systems
BSD 3-Clause "New" or "Revised" License

Rethinking environment handling #160

Open rkdarst opened 5 years ago

rkdarst commented 5 years ago

I think we need to rethink environment handling:

Currently there is req_keepvars, which sets {{keepvars}} in the batch script to a comma-separated list of variable names, but it does not add those variables to the environment itself. The environment comes from .get_env(), which is inherited from JupyterHub, and JupyterHub supplies only what the single-user server needs.
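For reference, the two mechanisms look roughly like this today (a sketch; req_keepvars and {{keepvars}} are batchspawner's names, the variable values are made up):

# jupyterhub_config.py -- sketch of the current behaviour (example values)
c = get_config()  # provided by JupyterHub's config loader
c.SlurmSpawner.req_keepvars = 'MY_SITE_TOKEN,HTTP_PROXY'
# {{keepvars}} in the batch script template expands to that comma-separated
# list; a Slurm template might consume it as:
#   #SBATCH --export={{keepvars}}
# But req_keepvars adds nothing to the dict returned by Spawner.get_env(),
# which holds only what JupyterHub itself needs (JUPYTERHUB_API_TOKEN etc.).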

Problems:

I am working on a rewrite (started from #119 long ago) to improve things, but it has become clear that we need a new mental model of how the variables even work.

Here is what I propose:

What do you think?

mbmilligan commented 5 years ago

Right. I think we ended up getting these mental models muddled together when adding req_keepvars. What it actually does is very limited: for batch systems that support this, it names environment variables that should be propagated from the submission environment into the execution environment. Without additional code, that's only useful for getting the JHub API key into the server session. But there seem to be various people hacking in similar things to pass through other kinds of access tokens.
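The kind of hack being referred to usually looks something like this (a hypothetical illustration of the pattern, not code from any particular deployment): override get_env() to copy an extra token from the hub's environment into the server's.

import os
from batchspawner import SlurmSpawner

class TokenPassingSpawner(SlurmSpawner):
    def get_env(self):
        # Pass an extra site-specific token (hypothetical name) through to
        # the single-user server alongside what JupyterHub already sets.
        env = super().get_env()
        env['MY_SITE_TOKEN'] = os.environ.get('MY_SITE_TOKEN', '')
        return env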

I think the mental model you outlined makes sense, but can we get some contributors to throw in concrete use cases so we're not just talking abstractly? @cmd-ntrf ? @zonca ? @jupyterhub/designers ?

zonca commented 5 years ago

I think this is a good idea. I always had trouble getting the right environment into Comet; the last time I deployed, I worked around this by explicitly propagating the environment via ssh, see https://gist.github.com/zonca/55f7949983e56088186e99db53548ded#file-spawner-py-L55 (from https://zonca.github.io/2018/11/jupyterhub-supercomputer.html)
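The shape of that workaround is roughly the following (a hypothetical sketch, not the exact code from the gist): build the remote command so that it re-exports the spawner's environment on the far side of the ssh hop.

import shlex

def ssh_with_env(host, env, remote_cmd):
    # env is the dict from Spawner.get_env(); quote the values so they
    # survive the remote shell's parsing of the single ssh command string.
    exports = ' '.join(f'{k}={shlex.quote(v)}' for k, v in env.items())
    return ['ssh', host, f'export {exports}; {remote_cmd}']

# e.g. ssh_with_env('login-node', spawner.get_env(), 'jupyterhub-singleuser')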

rkdarst commented 5 years ago

I implemented this in the resurrected PR #119.

It's a bit annoying that this changes semantics yet again, but I think it's better to get it done before the release.

Next up: we could add all_env, the raw environment dictionary, to the subvars. Then one could do something like the following in the batch script (but we should not recommend this; storing the API token in the batch script is not secure on some batch systems!):

{% for key, value in all_env.items() %}
export {{key}}="{{value}}"
{% endfor %}
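On the Python side, exposing the dict would be a small change, roughly like this (a sketch of the idea, assuming batchspawner's get_req_subvars() hook; hypothetical subclass, not committed code):

from batchspawner import SlurmSpawner

class AllEnvSpawner(SlurmSpawner):
    def get_req_subvars(self):
        # Make the raw environment dict available to the template as all_env
        subvars = super().get_req_subvars()
        subvars['all_env'] = self.get_env()
        return subvars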

... maybe for security this just shouldn't be added, since it will only lead to people doing something like the Jinja loop above, or to similar mistakes such as leaking tokens via /proc/*/cmdline...