Support dynamically loading new kernel specs

Problem

Currently, we have to restart EG after installing a new kernel spec, which is inconvenient because other users cannot connect to the gateway while EG is restarting.

Proposed Solution

Only support installing new kernel specs. Changes to older kernel specs that are already in use would not be supported, to avoid problems with running kernels.

Hi @wuyueandrew - it is not typically necessary to restart EG (or any jupyter server) to use additional kernelspecs added after the server has started. When kernelspecs are requested via, for example, curl -X GET -i http://localhost:8888/api/kernelspecs, the server will hit the filesystem to locate any new kernel specifications, and return all that it finds.
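
To check from a client whether a newly added kernelspec is visible without a restart, the same request can be made from Python. This is a minimal sketch, not part of EG: the helper names are hypothetical, and it assumes the standard /api/kernelspecs response shape (a top-level "kernelspecs" object keyed by name) and an unauthenticated server at the URL used in the curl example.

```python
import json
from urllib.request import urlopen


def kernelspec_names(payload: dict) -> list[str]:
    """Return the sorted kernelspec names from an /api/kernelspecs response body."""
    return sorted(payload.get("kernelspecs", {}))


def fetch_kernelspec_names(base_url: str = "http://localhost:8888") -> list[str]:
    """GET /api/kernelspecs and return the names the server currently reports."""
    with urlopen(f"{base_url}/api/kernelspecs") as response:
        return kernelspec_names(json.load(response))
```

Calling fetch_kernelspec_names() before and after copying a kernelspec directory into place should show the new name on the second call when no whitelist is in effect.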

However, if this is a Kubernetes deployment, we are using the KernelSpecManager.whitelist configurable (set via env EG_KERNEL_WHITELIST), so changing the list of kernelspecs requires restarting and redeploying EG.

If this is not a containerized env, then you could get the dynamic behavior you need by either not setting KernelSpecManager.whitelist and getting the default (dynamic) behavior, or by configuring it within the jupyter_enterprise_gateway_config.py configuration file and setting EnterpriseGatewayApp.c.EnterpriseGatewayConfigMixin.dynamic_config_interval to a non-zero (and positive) value. In this case, updates to the config file are then detected and reflected in the subsequent kernelspec fetch request.
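
The general idea behind interval-based config detection can be sketched as polling the file's modification time and reloading on change. This is only an illustration of the technique under that assumption, not EG's actual implementation; ConfigFileWatcher and its one-name-per-line file format are hypothetical.

```python
from pathlib import Path


class ConfigFileWatcher:
    """Re-reads a file when its modification time changes (hypothetical helper)."""

    def __init__(self, path):
        self.path = Path(path)
        self._last_mtime = None  # nothing loaded yet
        self.lines = []

    def poll(self) -> bool:
        """Reload the file if it changed since the last poll; return True on reload."""
        mtime = self.path.stat().st_mtime_ns
        if mtime == self._last_mtime:
            return False
        self._last_mtime = mtime
        self.lines = self.path.read_text().splitlines()
        return True
```

A server would call poll() once per interval, which is why a change only takes effect after up to one interval has elapsed.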

If you're using a containerized environment, then, yes, we should introduce a way to NOT set the whitelist, so that users can add kernelspecs dynamically. Since the configuration file would live in the container, accessing it would require mounts and an adjustment to the start-enterprise-gateway.sh script to NOT configure the whitelist via command-line arguments, since CLI configurables are not privy to dynamic config detection.

If K8s, perhaps we could allow for the entry in values.yaml to be:

whitelist:
  - *

which will then set EG_KERNEL_WHITELIST=* and then perhaps in start-enterprise-gateway.sh we could detect a value of * to mean don't set the KernelSpecManager.whitelist entry?

How often are you creating kernelspecs? (Just curious)

Thanks Kevin, your answer is helpful; I'll try your solution later. Typically, my colleagues design a kernel spec (a Python env and some wheels) for a custom scenario, and the default kernels cover most scenarios. How frequently we create kernelspecs depends on our customers. So far, we don't create kernelspecs often.

Hi Kevin, thanks again. I tried your solution, but it did not solve the problem. I used Helm to deploy EG on K8s, so maybe I misunderstood. Is kernel.whitelist=* not supported yet, and is the only way of solving this to not deploy on K8s?

"Is kernel.whitelist=* not supported yet"

Correct. Recognizing * is not currently supported.

"the only way of solving this is to not deploy on K8s"

Because we're setting the CLI options in the start-enterprise-gateway.sh script, we'd need to move these settings into a configuration file, then we could use the dynamic configuration support. This would require two additional pieces of configuration.

1. The configuration file's location would need to be on a mounted volume so that it could be accessed from outside the pod and updated to include any additional kernel specs.
2. The location of the kernelspec directories (those under /usr/local/share/jupyter/kernels) also needs to be mounted to enable updates from outside the pod. (I'm assuming you're already doing this.)
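
For the mounted kernels directory, "adding a kernelspec" amounts to adding a subdirectory that contains a kernel.json file. The following is a rough sketch of that kind of filesystem scan, useful for checking what is on the mount; find_kernelspecs is a hypothetical helper and is simpler than the real discovery logic (for example, it does no name normalization and searches a single directory only).

```python
import json
from pathlib import Path


def find_kernelspecs(kernels_dir) -> dict[str, dict]:
    """Map each subdirectory name that contains a kernel.json to its parsed contents."""
    specs = {}
    for kernel_json in sorted(Path(kernels_dir).glob("*/kernel.json")):
        specs[kernel_json.parent.name] = json.loads(kernel_json.read_text())
    return specs
```

Running find_kernelspecs("/usr/local/share/jupyter/kernels") inside the pod after updating the volume from outside would confirm that the new directory is visible there.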

Using helm to redeploy EG with a new kernelspec will also not work because it restarts EG.

Also note that if we were to enable support for *, the following kernelspecs would be automatically exposed since we include all supported kernelspecs in the EG image:

dask_python_yarn_remote, spark_python_kubernetes, python_distributed, spark_python_operator, python_docker, spark_python_yarn_client, python_kubernetes, spark_python_yarn_cluster, python_tf_docker, spark_R_conductor_cluster, python_tf_gpu_docker, spark_R_kubernetes, python_tf_gpu_kubernetes, spark_R_yarn_client, python_tf_kubernetes, spark_R_yarn_cluster, R_docker, spark_scala_conductor_cluster, R_kubernetes, spark_scala_kubernetes, scala_docker, spark_scala_yarn_client, scala_kubernetes, spark_scala_yarn_cluster, spark_python_conductor_cluster

As a result, you'd probably want to remove a majority of these since they won't apply to your environment.

So, I see there being two ways forward to support dynamic kernelspecs on K8s:

1. Add support for kernel.whitelist = *. When specified, this will result in the EG_KERNEL_WHITELIST env == *, which will instruct start-enterprise-gateway.sh to NOT set the KernelSpecManager.whitelist configurable. However, access to /usr/local/share/jupyter/kernels must still be available outside the EG pod. This option, because it's using the default logic of taking whatever kernelspecs are present, would require that you remove whatever kernelspecs do not apply to your environment.
2. Move the current configuration of KernelSpecManager.whitelist from the CLI options to a configuration file and set dynamic_config_interval to a non-zero value. This requires two mount locations - one for the kernelspecs (same as option 1), and the other for the configuration file. This option, because it still uses a whitelist, can tolerate the existence of the unrelated kernelspecs.
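
The practical difference between the two options comes down to how an empty versus non-empty whitelist is applied to the kernelspecs found on disk. A small sketch of that filtering rule, assuming an empty whitelist means "no restriction" (exposed_kernelspecs is a hypothetical function, not EG code):

```python
def exposed_kernelspecs(found: set[str], whitelist: set[str]) -> set[str]:
    """Return the kernelspec names a client would see.

    An empty whitelist means "no restriction" (option 1); otherwise only
    names that are both present and whitelisted are exposed (option 2).
    """
    if not whitelist:
        return set(found)
    return found & whitelist
```

With option 1, every directory on the mount shows up, so unrelated kernelspecs must be removed; with option 2, they are simply filtered out until the whitelist in the config file is updated.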

In both cases, you would NOT use helm to redeploy EG in order to add a kernelspec. You would merely update the appropriate location(s) with the new kernelspec.

I think option 2 would be harder on an operator/admin than option 1, so I would lean toward implementing option 1. I can look into this later today or tomorrow if that option would work for you.

I appreciate your patient answer, very kind. My current solution is to edit values.yaml in Helm and add a new kernelspec item. EG is started by start-enterprise-gateway.sh, so it's not dynamic. Option 1 is easy to set up; maybe simply removing --KernelSpecManager.whitelist in start-enterprise-gateway.sh would achieve dynamic kernelspecs (I guess; I haven't checked the EG code). I'm not worried about unrelated kernelspecs, since there are several ways to avoid them (such as removing some PVC directories or ignoring them). Option 2 is dynamic, but the user still has to edit the conf file after adding a new kernelspec, and there is a dynamic_config_interval delay, so I think it's not convenient. Option 1 is probably better.

Hi @wuyueandrew. Please see #1131. The approach I took is for the operator to remove or comment out the whitelist: entries in the values.yaml file, leaving only whitelist:. This will set a value of null in the EG_KERNEL_WHITELIST, instructing the start-enterprise-gateway.sh script to not add the CLI option. With this approach, Docker operators can get equivalent results using -e EG_KERNEL_WHITELIST=null when deploying the EG image.
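
The decision the start script makes can be sketched as follows. This is an illustration in Python of the behavior described above, not the shell code from #1131; whitelist_cli_args is a hypothetical name, and only the literal value null is handled specially here.

```python
def whitelist_cli_args(env_value: str) -> list[str]:
    """Return the extra CLI arguments for a given EG_KERNEL_WHITELIST value."""
    if env_value.strip() == "null":
        # No whitelist option is passed, so every discovered kernelspec is served.
        return []
    return [f"--KernelSpecManager.whitelist={env_value}"]
```

Because no whitelist is passed in the null case, kernelspecs added to the kernels directory later are picked up on the next kernelspec request without redeploying.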

Thank you. I'm looking forward to V3.0.

jupyter-server / enterprise_gateway

support dynamic load new kernel spec #1127
