jupyter-server / enterprise_gateway

A lightweight, multi-tenant, scalable and secure gateway that enables Jupyter Notebooks to share resources across distributed clusters such as Apache Spark, Kubernetes and others.
https://jupyter-enterprise-gateway.readthedocs.io/en/latest/

Issue setting environment variables in remote kernels using KERNEL_ prefix #1192

Closed · debashis1982 closed this issue 1 year ago

debashis1982 commented 1 year ago

Description

We are using Enterprise Gateway (version 3.1.0) running on Kubernetes to spin up kernels. We want to set custom environment variables in kernels while they are being created so that they are available when the kernel is ready to use.

I have followed the Kernel Environment Variables section of the Jupyter Enterprise Gateway 3.1.0.dev0 documentation.

While I am able to override existing `KERNEL_`-prefixed env vars like `KERNEL_USERNAME` and `KERNEL_SPARK_CONTEXT_INIT_MODE`, I am unable to create a new environment variable like `KERNEL_USERID`. This is what my create-kernel API request body looks like:

```json
{ "kernel": { "name": "py_3.7" }, "env": { "KERNEL_USERID": "myuserid" } }
```

But when I check the environment variables in the newly created kernel, the variable is not there.
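
For reference, here is a minimal sketch of issuing that request with curl; the host/port and the `/api/kernels` endpoint are placeholders for whatever your client is already calling:

```bash
# Sketch: create a kernel via the gateway REST API with a custom
# KERNEL_-prefixed env var. Host, port, and endpoint are placeholders.
curl -X POST "http://localhost:8888/api/kernels" \
  -H "Content-Type: application/json" \
  -d '{ "kernel": { "name": "py_3.7" }, "env": { "KERNEL_USERID": "myuserid" } }'
```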

To make it work, I had to add the following to the kernel's kernel-pod.yaml.j2 file:

```yaml
- name: KERNEL_USERID
  value: "{{ kernel_userid }}"
```

Is this config necessary, or is there a way I can get it to work by just having the `KERNEL_`-prefixed environment variables set on the client side?
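
For reference, the stock template can be inspected inside the EG pod; a sketch, where the pod name is a placeholder and the path follows the stock `python_kubernetes` kernelspec layout:

```bash
# Sketch: dump the pod template shipped with the kernelspec.
# Replace <eg-pod> and the namespace with your deployment's values.
kubectl exec <eg-pod> -n enterprise-gateway -- \
  cat /usr/local/share/jupyter/kernels/python_kubernetes/scripts/kernel-pod.yaml.j2
```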

welcome[bot] commented 1 year ago

Thank you for opening your first issue in this project! Engagement like this is essential for open source projects! :hugs:
If you haven't done so already, check out Jupyter's Code of Conduct. Also, please try to follow the issue template as it helps other community members to contribute more effectively. You can meet the other Jovyans by joining our Discourse forum. There is also an intro thread there where you can stop by and say Hi! :wave:
Welcome to the Jupyter community! :tada:

kevin-bates commented 1 year ago

Hi @debashis1982 - as I expected, I cannot reproduce this issue, so I think we'll need to go through the exercise of determining exactly what you're running and doing.

Here's what I did:

  1. I deployed the EG helm chart directly from the release asset location:

     ```bash
     $ helm upgrade --install enterprise-gateway https://github.com/jupyter-server/enterprise_gateway/releases/download/v3.0.0/jupyter_enterprise_gateway_helm-3.0.0.tar.gz --namespace enterprise-gateway
     Release "enterprise-gateway" does not exist. Installing it now.
     NAME: enterprise-gateway
     LAST DEPLOYED: Wed Nov  9 11:53:55 2022
     NAMESPACE: enterprise-gateway
     STATUS: deployed
     REVISION: 1
     TEST SUITE: None
     ```

  2. I confirmed via the EG pod log that I'm running the expected version:

     ```bash
     $ kubectl logs -f deployment.apps/enterprise-gateway -n enterprise-gateway
     [I 2022-11-09 19:53:59.167 EnterpriseGatewayApp] Jupyter Enterprise Gateway 3.0.0 is available at http://0.0.0.0:8888
     ```

  3. On the server on which I'm going to run Jupyter Lab, I exported the desired `KERNEL_` env var and launched Lab pointing at the EG service port:

     ```bash
     $ export KERNEL_FLOW_TEST=42
     $ jupyter lab --debug --gateway-url=http://localhost:53433
     ```

  4. Started a kubernetes python kernel and grep'd the output of `!env` for `KERNEL_`. (Screenshot: cell output showing `KERNEL_FLOW_TEST=42` in the kernel's environment.)

As you can see, `KERNEL_FLOW_TEST` (and its value 42) has flowed from the Lab instance, through the EG server, and into the kernel pod instance.
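
The same check can be made from the cluster side; a sketch, where the pod name and namespace are placeholders:

```bash
# Sketch: verify the variable landed in the kernel pod itself.
kubectl exec <kernel-pod> -n <kernel-namespace> -- env | grep KERNEL_
```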

Could you please try these steps or provide the necessary information to better understand where the differences are?

debashis1982 commented 1 year ago

We are building Enterprise Gateway from source:

  1. We run `make dist` to build the dist packages.
  2. We build an EG docker image with a Dockerfile, a snippet of which looks like this:

     ```dockerfile
     ARG BASE_CONTAINER=jupyter/minimal-notebook:2022-01-24
     FROM $BASE_CONTAINER
     ...
     COPY dist/jupyter_enterprise_gateway-3.1.0.dev1-py3-none-any.whl /tmp/
     RUN pip install /tmp/jupyter_enterprise_gateway-3.1.0.dev1-py3-none-any.whl && \
         rm -f /tmp/jupyter_enterprise_gateway-3.1.0.dev1-py3-none-any.whl

     ADD dist/jupyter_enterprise_gateway_kernelspecs-3.1.0.dev1.tar.gz /usr/local/share/jupyter/kernels/
     ADD dist/jupyter_enterprise_gateway_kernel_image_files-3.1.0.dev1.tar.gz /usr/local/bin/
     ...
     COPY ./supervisord/supervisord-launch.sh /etc/supervisor/
     COPY ./supervisord/supervisord.conf /etc/supervisor/
     RUN mkdir -p /var/log/supervisor

     RUN chown root:root /etc/supervisor/supervisord-launch.sh && \
         chown root:root /etc/supervisor/supervisord.conf && \
         chmod 0755 /etc/supervisor/supervisord-launch.sh
     ...
     CMD /etc/supervisor/supervisord-launch.sh
     ```

As you can see, we are launching EG using supervisord; there are also some other critical processes that we run besides Enterprise Gateway.

  3. Then I refer to the docker image built by the previous process in my values.yaml (`etc/kubernetes/helm/enterprise-gateway/values.yaml`); see the sketch below for a command-line alternative.
  4. Then I helm install/upgrade it:

     ```bash
     helm upgrade jupyter-e-gw -f etc/kubernetes/helm/enterprise-gateway/values.yaml dist/jupyter_enterprise_gateway_helm-3.1.0.dev1.tgz
     ```

     ```
     Release "jupyter-e-gw" has been upgraded. Happy Helming!
     NAME: jupyter-e-gw
     LAST DEPLOYED: Tue Nov 8 16:54:25 2022
     NAMESPACE: default
     STATUS: deployed
     REVISION: 9
     TEST SUITE: None
     ```

That launches the EG pod, and in the EG logs I can see the version:

```
[D 221108 22:54:30 selector_events:59] Using selector: EpollSelector
[I 2022-11-08 22:54:30.418 EnterpriseGatewayApp] Jupyter Enterprise Gateway 3.1.0.dev1 is available at http://0.0.0.0:8888
```
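
As an aside on step 3, the image can also be overridden on the helm command line instead of editing values.yaml; a sketch, assuming the chart exposes an `image` value (check your chart's values.yaml for the exact key):

```bash
# Sketch: override the EG image at install time rather than in values.yaml.
# Registry, image name, and tag are placeholders.
helm upgrade jupyter-e-gw dist/jupyter_enterprise_gateway_helm-3.1.0.dev1.tgz \
  --set image=<registry>/<eg-image>:<tag>
```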

To test it, in a terminal:

1. I set a `KERNEL_`-prefixed env var:

   ```bash
   export KERNEL_VARIABLE=123
   ```

2. Point it to the gateway endpoint:

   ```bash
   export KG_URL=http://127.0.0.1:65349/
   ```

3. Launch the notebook:

   ```bash
   jupyter notebook --log-level DEBUG \
     --NotebookApp.session_manager_class=nb2kg.managers.SessionManager \
     --NotebookApp.kernel_manager_class=nb2kg.managers.RemoteKernelManager \
     --NotebookApp.kernel_spec_manager_class=nb2kg.managers.RemoteKernelSpecManager \
     --KernelSpecManager.ensure_native_kernel=False
   ```

4. It launches a notebook. When I choose a kernel for a notebook file, a kernel pod is launched (say the pod name is `e547c221-a331-4f5d-a3b4-f5422d59c0b9`) as expected, but if I print all env vars in it, I do not see the env var `KERNEL_VARIABLE`. Also, if I try to view the env vars in the pod directly:

   ```bash
   kubectl exec -it e547c221-a331-4f5d-a3b4-f5422d59c0b9 env | grep KERNEL_
   ```

   it returns the other `KERNEL_`-prefixed env vars, but not the `KERNEL_VARIABLE` I am expecting:

   ```
   KERNEL_NAMESPACE=default
   KERNEL_USERNAME=root
   KERNEL_NAME=py_3.7
   KERNEL_ID=e547c221-a331-4f5d-a3b4-f5422d59c0b9
   KERNEL_LANGUAGE=python
   KERNEL_SPARK_CONTEXT_INIT_MODE=none
   ```


Also, as I pointed out in the original issue, even if I try to create a kernel via an API call and set the environment variable in the request, it does not work.

In both cases it works only if I add that variable to the kernel pod YAML's env variable list.

kevin-bates commented 1 year ago

Ok - two immediate observations that should be updated.

  1. `jupyter notebook` should not be used. It can be used, but `jupyter_server` is where the focus is, and `jupyter notebook` is essentially deprecated; its v7 release will be a version of Jupyter Lab running on `jupyter_server`.
  2. You should definitely not be using the `nb2kg` server extension. That has been "obsolete" since Notebook v6. Please replace all of the `nb2kg` options with the single option `--gateway-url=http://127.0.0.1:65349/`, or change the name of `KG_URL` to `JUPYTER_GATEWAY_URL` (see the sketch below).

Here's the applicable section in the docs.
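
For example, the earlier launch command reduces to something like this sketch (same port as before; either the CLI option or the env var works):

```bash
# Sketch: launch against the gateway without nb2kg.
export JUPYTER_GATEWAY_URL=http://127.0.0.1:65349/
jupyter lab --debug
# ...or equivalently, via the CLI option:
jupyter lab --debug --gateway-url=http://127.0.0.1:65349/
```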

The build you're doing appears to follow our repo, so that looks okay, but I suspect the envs are not getting to the EG server from the notebook server.

All of the `KERNEL_` variables you list are generated by EG - so nothing of that set is coming from the client. (You probably shouldn't be running as root either - but that's your call.)
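
For instance, the kernel user can be supplied from the client side with another `KERNEL_` variable; a minimal sketch:

```bash
# Sketch: export a non-root kernel user before launching the client;
# this flows through to EG like any other KERNEL_-prefixed variable.
export KERNEL_USERNAME=${USER}
```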

kevin-bates commented 1 year ago

Another thing to confirm is that your code contains the changes from https://github.com/jupyter-server/enterprise_gateway/commit/4f4e6dee14eef2e1aefd24285d28ac16d1448c0c.

Could you run the command equivalent to the following and confirm the existence of the `extend_pod_env()` method?

```bash
$ kubectl exec pod/enterprise-gateway-9f8c6dd9-s4lkd -n enterprise-gateway -- cat /usr/local/share/jupyter/kernels/python_kubernetes/scripts/launch_kubernetes.py
```
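
If grep is available in the image (an assumption), a quicker check is to search for the method definition directly:

```bash
# Sketch: look for the method added by the commit referenced above.
kubectl exec pod/enterprise-gateway-9f8c6dd9-s4lkd -n enterprise-gateway -- \
  grep -n "def extend_pod_env" /usr/local/share/jupyter/kernels/python_kubernetes/scripts/launch_kubernetes.py
```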

I just went and looked at `nb2kg` and it does transfer the `KERNEL_` variables, so now I'm wondering if it's some kind of build issue. Do you `make clean` before building? Sometimes I've found stale build artifacts. Also, I notice you're on revision 9 of your helm deployment. I've found that subsequent "upgrades" don't work as expected, so it might be worthwhile removing the deployment and re-deploying it as revision 1.
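
As commands, those two suggestions look something like this sketch (names match those used earlier in the thread):

```bash
# Sketch: rebuild from a clean tree to rule out stale artifacts...
make clean dist
# ...and re-deploy the release fresh instead of upgrading revision 9.
helm uninstall jupyter-e-gw
helm install jupyter-e-gw -f etc/kubernetes/helm/enterprise-gateway/values.yaml \
  dist/jupyter_enterprise_gateway_helm-3.1.0.dev1.tgz
```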

debashis1982 commented 1 year ago

I don't see that method there. We deploy the kernelspecs and launch scripts through a different process, so it looks like we need to update the launch script and retry. I will do that and get back to you. Thank you!

debashis1982 commented 1 year ago

That was it! Updating the kernel launch script was all that was needed. Thank you for all your help, @kevin-bates. Really appreciate it.

kevin-bates commented 1 year ago

Awesome. Glad to have you moving forward!

Since you're developing with EG, it would be great to hear how we could help in that area, etc.