Closed blair-anson closed 1 year ago
Hi @blair-anson. We don't currently flow the user's home directory, only the working directory, which appears to be working. I don't think we can assume that just because KERNEL_WORKING_DIR has a value, that value is the user's HOME directory, so I think we'd need another vehicle to convey this information, and I notice that you created KERNEL_USER_HOME.
Although this would only apply to containerized kernels (docker, k8s), I wonder if we could honor KERNEL_USER_HOME such that the container launcher scripts do something like env['HOME'] = os.getenv('KERNEL_USER_HOME', '/home/jovyan')?
To address this in 2.6, you could modify either launch_kubernetes.py or kernel-pod.yaml.j2 to set HOME accordingly for your scenario.
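A minimal sketch of that idea, assuming a hypothetical helper named resolve_home inside the launcher script. KERNEL_USER_HOME is not honored by EG today, so both the helper and the fallback handling are illustrative rather than existing EG code:

```python
import os

DEFAULT_HOME = "/home/jovyan"  # default taken from the suggestion above


def resolve_home(environ=None):
    """Return the HOME value to set inside the kernel container.

    Hypothetical helper: KERNEL_USER_HOME is an assumed env, not one EG
    currently recognizes. An unset or empty value falls back to the default.
    """
    environ = os.environ if environ is None else environ
    return environ.get("KERNEL_USER_HOME") or DEFAULT_HOME


# Example: build the env handed to the container.
env = {"HOME": resolve_home({"KERNEL_USER_HOME": "/home/user1"})}
```

Unlike a bare os.getenv call, this treats an empty KERNEL_USER_HOME the same as an unset one, which seems the safer choice for a HOME value.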
Once you have that working, a contribution would be appreciated. It seems the semantics could also be that if KERNEL_USER_HOME is set, that implies a KERNEL_WORKING_DIR value (if not set), whereas we can't say the same in the other direction. And, in both cases, I believe we'd need to say that EG_MIRROR_WORKING_DIRS is required.
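Those semantics could be sketched roughly as follows. The function name resolve_dirs and the string parsing of EG_MIRROR_WORKING_DIRS are assumptions for illustration only, not existing EG behavior:

```python
def resolve_dirs(environ):
    """Apply the proposed semantics to a dict of envs.

    Hypothetical helper returning (home, working_dir); either may be None.
    - Nothing is honored unless EG_MIRROR_WORKING_DIRS is true.
    - KERNEL_USER_HOME implies KERNEL_WORKING_DIR when the latter is unset.
    - KERNEL_WORKING_DIR never implies KERNEL_USER_HOME.
    """
    mirror = environ.get("EG_MIRROR_WORKING_DIRS", "False").lower() == "true"
    if not mirror:
        return None, None
    home = environ.get("KERNEL_USER_HOME") or None
    working_dir = environ.get("KERNEL_WORKING_DIR") or home
    return home, working_dir
```

For example, with mirroring enabled and only KERNEL_USER_HOME=/home/user1 set, both values resolve to /home/user1, while setting only KERNEL_WORKING_DIR leaves the home value unset.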
Regarding the ENV_WHITELIST setting, this is really strange and I don't know where that would be coming from. It's not coming from anything in GitHub, so it must be configured within your EG deployment scripts. The env-whitelist (renamed in 3.0 to EG_CLIENT_ENVS) specifies the list of environment variable names that are allowed to flow from the client (Jupyter Lab). By also setting it the way you have in the kernelspec, the kernel will be launched with an empty PYTHONPATH, which is probably not what you intended. If you wanted the PYTHONPATH value to flow from the client (which, based on the spawner variables, is the intention), you'd either want to remove that entry (EG_ENV_WHITELIST will ensure it flows, assuming the client is actually setting it in the first place) or update your entry to:
"env": {
"PYTHONPATH": "${PYTHONPATH}"
}
Also note that to ensure this env flows from the Lab client, the Lab configuration should set JUPYTER_GATEWAY_ENV_WHITELIST, otherwise the gateway integration won't include PYTHONPATH in the kernel start request payload. (Note that this env/configuration option has been renamed to JUPYTER_GATEWAY_ALLOWED_ENVS in Jupyter Server 2.0.) What version of Lab are you using, and do you know if it's using Notebook or Jupyter Server as its web server?
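To illustrate the client-side filtering described above, here is a rough sketch of how an allow-list could decide which envs end up in the kernel start request. The function name envs_for_kernel_request is hypothetical and this is not the actual gateway client code; it only models the rule that KERNEL_-prefixed envs flow by default while anything else must be listed:

```python
def envs_for_kernel_request(client_env, allowed_envs):
    """Select the envs a gateway client would include when starting a kernel.

    Illustrative only: KERNEL_-prefixed names are always included; other
    names are included only if present in the comma-separated allow-list
    (the value of JUPYTER_GATEWAY_ENV_WHITELIST in this scenario).
    """
    allowed = {name.strip() for name in allowed_envs.split(",") if name.strip()}
    return {
        name: value
        for name, value in client_env.items()
        if name.startswith("KERNEL_") or name in allowed
    }
```

With an empty allow-list, a PYTHONPATH set in the Lab environment would be dropped from the payload, which matches the behavior described above.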
Hi @kevin-bates thank you for the comprehensive and very helpful response.
We don't currently flow the user's home directory, only the working directory
Ah that explains why I wasn't getting anywhere. No worries though, I can customise JEG as per your suggestions. Passing the environment variables was blocking me from progressing with the customisation but with your answer I should be able to progress. I'll write another comment here with my progress.
Regarding the ENV_WHITELIST setting, this is really strange and I don't know where that would be coming from - it's not coming from anything in github - so must be configured within your EG deployment scripts
Apologies, that came from me Googling for how to set custom env variables that don't begin with KERNEL_.
EG_ENV_WHITELIST was in various GitHub issues and some older documentation, so I thought I'd try it as the other settings weren't working.
KG_ENV_WHITELIST was from here:
https://jupyter-enterprise-gateway.readthedocs.io/en/v2.6.0/config-options.html
Also note that to ensure this env flows from the Lab client, the lab configuration should set JUPYTER_GATEWAY_ENV_WHITELIST, otherwise the gateway integration won't include PYTHONPATH in the kernel start request payload. (Note that this env/configuration option has been renamed to JUPYTER_GATEWAY_ALLOWED_ENVS in Jupyter Server 2.0.) What version of Lab are you using and do you know if it's using Notebook or Jupyter Server as its web server?
I am currently using JupyterLab v3.0.16 which I believe specifies these versions.
jupyterlab_server~=2.3
jupyter_server~=1.4
I actually run JupyterHub on the server with the JupyterLab version I specified above (which is why I created my own spawner), but to simplify things when trying out different configurations for JEG I run JupyterLab locally using a command like the one below. Again, this is JupyterLab v3.0.16, but I have also been experimenting with v3.4. Thanks for the warning about the change of env whitelist name; I will keep it in mind when I upgrade JupyterLab in the future.
KERNEL_UID=1001 \
KERNEL_GID=1001 \
KERNEL_USERNAME=user1 \
KERNEL_WORKING_DIR=/home/user1 \
KERNEL_VOLUMES="[{name: 'nfs-volume', nfs: {server: 'fs-xxxxx.efs.us-west-2.amazonaws.com', path: '/user1'}}]" \
KERNEL_VOLUME_MOUNTS="[{name: 'nfs-volume', mountPath: '/home/user1'}]" \
jupyter lab --gateway-url=https://xxxxxxx:8888 --GatewayClient.http_user=guest --GatewayClient.http_pwd=guest-password
I started on trying to pass a custom env variable to the kernel, and I am still stuck.
Instead of PYTHONPATH I defined BLAIRENV as a test, as that won't have any impact on Python and allows me to test just the environment variable passing.
I removed the env variable from the kernel spec kernel.json:
...
"env": {
}
...
In JEG I set these env variables. I originally tried just EG_ENV_WHITELIST, but when that did not work I also added EG_CLIENT_ENVS in case I was mistaken about the Jupyter Server version.
deployment.yaml
...
- name: EG_CLIENT_ENVS
  value: "BLAIRENV"
- name: EG_ENV_WHITELIST # renamed to EG_CLIENT_ENVS in JEG 3.0
  value: "BLAIRENV"
...
In the JupyterLab spawner on JupyterHub I have these env variables set
env['BLAIRENV'] = "catsndogs"
env['JUPYTER_GATEWAY_ENV_WHITELIST'] = "BLAIRENV"
env['JUPYTER_GATEWAY_ALLOWED_ENVS'] = "BLAIRENV"
However, after all that I still don't see BLAIRENV in the kernel. Any envs prefixed with KERNEL_ do get passed through, but BLAIRENV does not.
Is there some other configuration I should try?
Yeah, I just tried this and see the same thing (using KEVINENV=42). I can see it has flowed to the EG and is available to the kernel launch...
[D 2022-09-24 15:43:47.800 EnterpriseGatewayApp] BaseProcessProxy.launch_process() env: {'SHELL': '/bin/bash', 'KUBERNETES_SERVICE_PORT_HTTPS': '443', 'EG_MIRROR_WORKING_DIRS': 'False', 'KUBERNETES_SERVICE_PORT': '443', 'ENTERPRISE_GATEWAY_PORT_8877_TCP': 'tcp://10.43.139.10:8877', 'EG_NAMESPACE': 'enterprise-gateway', 'ENTERPRISE_GATEWAY_SERVICE_PORT_HTTP': '8888', 'HOSTNAME': 'enterprise-gateway-6c8749c669-qrdtt', 'LANGUAGE': 'en_US.UTF-8', 'EG_SHARED_NAMESPACE': 'False', 'EG_PORT': '8888', 'EG_LOG_LEVEL': 'DEBUG', 'JAVA_HOME': '/usr/lib/jvm/java-8-openjdk-amd64', 'ENTERPRISE_GATEWAY_SERVICE_PORT_RESPONSE': '8877', 'EG_KERNEL_WHITELIST': '"r_kubernetes","python_kubernetes","python_tf_kubernetes","python_tf_gpu_kubernetes","scala_kubernetes","spark_r_kubernetes","spark_python_kubernetes","spark_scala_kubernetes","spark_python_operator"', 'NB_UID': '1000', 'ENTERPRISE_GATEWAY_SERVICE_HOST': '10.43.139.10', 'PWD': '/usr/local/bin', 'ENTERPRISE_GATEWAY_PORT_8877_TCP_PROTO': 'tcp', 'EG_CULL_IDLE_TIMEOUT': '3600', 'EG_DEFAULT_KERNEL_NAME': 'python_kubernetes', 'ENTERPRISE_GATEWAY_PORT_8888_TCP_PORT': '8888', 'EG_ENABLE_TUNNELING': 'False', 'ENTERPRISE_GATEWAY_PORT_8888_TCP_ADDR': '10.43.139.10', 'EG_KERNEL_LAUNCH_TIMEOUT': '60', 'HOME': '/home/jovyan', 'LANG': 'en_US.UTF-8', 'KUBERNETES_PORT_443_TCP': 'tcp://10.43.0.1:443', 'ENTERPRISE_GATEWAY_PORT_8877_TCP_PORT': '8877', 'EG_LIST_KERNELS': 'True', 'EG_SSH_PORT': '2122', 'NB_GID': '100', 'EG_RESPONSE_PORT': '8877', 'ENTERPRISE_GATEWAY_PORT_8888_TCP': 'tcp://10.43.139.10:8888', 'KG_PORT': '8888', 'EG_CULL_CONNECTED': 'False', 'EG_PORT_RETRIES': '0', 'KG_IP': '0.0.0.0', 'ENTERPRISE_GATEWAY_PORT_8877_TCP_ADDR': '10.43.139.10', 'EG_CULL_INTERVAL': '60', 'EG_IP': '0.0.0.0', 'SHLVL': '0', 'CONDA_DIR': '/opt/conda', 'ENTERPRISE_GATEWAY_SERVICE_PORT': '8888', 'SPARK_HOME': '/opt/spark', 'KUBERNETES_PORT_443_TCP_PROTO': 'tcp', 'KG_PORT_RETRIES': '0', 'KUBERNETES_PORT_443_TCP_ADDR': '10.43.0.1', 'SPARK_VER': '2.4.6', 
'ENTERPRISE_GATEWAY_PORT': 'tcp://10.43.139.10:8888', 'NB_USER': 'jovyan', 'KUBERNETES_SERVICE_HOST': '10.43.0.1', 'ENTERPRISE_GATEWAY_PORT_8888_TCP_PROTO': 'tcp', 'LC_ALL': 'en_US.UTF-8', 'KUBERNETES_PORT': 'tcp://10.43.0.1:443', 'KUBERNETES_PORT_443_TCP_PORT': '443', 'PATH': '/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin', 'EG_ENV_WHITELIST': 'KEVINENV', 'EG_KERNEL_CLUSTER_ROLE': 'kernel-controller', 'DEBIAN_FRONTEND': 'noninteractive', 'KEVINENV': '42', 'KERNEL_LAUNCH_TIMEOUT': '40', 'KERNEL_USERNAME': 'jovyan', 'KERNEL_GATEWAY': '1', 'KERNEL_POD_NAME': 'jovyan-76aed5ed-02e7-468f-a3ee-84c154d444c9', 'KERNEL_SERVICE_ACCOUNT_NAME': 'default', 'KERNEL_NAMESPACE': 'jovyan-76aed5ed-02e7-468f-a3ee-84c154d444c9', 'KERNEL_IMAGE': 'elyra/kernel-py:2.6.0', 'KERNEL_EXECUTOR_IMAGE': 'elyra/kernel-py:2.6.0', 'KERNEL_UID': '1000', 'KERNEL_GID': '100', 'EG_MIN_PORT_RANGE_SIZE': '1000', 'EG_MAX_PORT_RANGE_RETRIES': '5', 'KERNEL_ID': '76aed5ed-02e7-468f-a3ee-84c154d444c9', 'KERNEL_LANGUAGE': 'python', 'EG_IMPERSONATION_ENABLED': 'False'}
but the launcher script and jinja template only set a fixed set of envs, and only based on the keyword set, which doesn't include anything that is not KERNEL_-prefixed.
I think the kubernetes launcher needs to post-process the env stanza of the generated k8s pod yaml and set the remaining envs, which in the k8s launch is lots of meaningless stuff (and I'm not sure whether some of it would cause side effects).
This seems like an issue we should try to fix for 3.0 GA. It only applies to Kubernetes since the other process-proxies (including docker) don't go through a template.
Sorry for the inconvenience. If you have a fixed set of envs you'd like to flow, I suppose you could extend the keywords in the launcher script and add entries for each env name in the jinja template, but that's a bit of a pain, and it might be easier to just extend the env stanza following the yaml's generation and implement the correct solution. Just remember to recognize any envs that might already be present.
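As a rough sketch of that post-processing step, assuming the rendered pod yaml has already been parsed into a plain dict: the helper name merge_envs and its parameters are hypothetical and not part of launch_kubernetes.py.

```python
def merge_envs(pod, extra_envs, container_index=0):
    """Append extra envs to a container's env stanza, keeping existing entries.

    `pod` is the pod manifest as a dict (e.g. parsed from the generated yaml).
    Names already present in the stanza are left untouched, so values set by
    the template win over the ones passed in here.
    """
    container = pod["spec"]["containers"][container_index]
    env_list = container.get("env") or []  # tolerate a missing or empty stanza
    container["env"] = env_list
    present = {entry["name"] for entry in env_list}
    for name, value in extra_envs.items():
        if name not in present:
            env_list.append({"name": name, "value": str(value)})
    return pod
```

Passing an explicit dict of extra envs, rather than the launcher's whole process environment, would be one way to keep the meaningless entries mentioned above out of the kernel pod.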
Is this something you'd like to contribute to our 3.0 release?
Ok, thank you for confirming it. I will take a look and see if I can get it working on my fork. If I do, I will see if I can contribute it to 3.0.
Hi @blair-anson. Given that we're close to our 3.0 GA release, I've started looking into this. I hope that's okay with you.
I want to make sure that this code, now that the env will truly find its way into the kernel pod, doesn't have any side-effects.
I should have a PR soon - currently looking at CI issues that appear unrelated. I suspect the current failures are due to a third-party dependency update since they don't reproduce in my current env. Here's the branch if you're interested.
Given that we're close to our 3.0 GA release, I've started looking into this. I hope that's okay with you.
That's perfectly understandable. Although I did implement a fix in my codebase I have not had time to test it properly, let alone look at the 3.0 codebase. Thank you for being so proactive.
Closed via #1164
Not sure if this is a bug, or a lack of understanding on my part.
I have user impersonation set up, so these commands all work as expected in the notebook...
However these commands still show jovyan as being the user instead of user1...
Also I configure a custom PYTHONPATH env variable but it is not visible in the kernel.
Configuration
values.yaml (so EG_MIRROR_WORKING_DIRS is set)
deployment.yaml: I also set a whitelist for a custom python path in spec.template.spec.containers.env
kernel.json: This may not be necessary, but I put a placeholder for the whitelisted env variable in the kernel spec
JupyterLab Spawner env variables