jupyter-server / enterprise_gateway

A lightweight, multi-tenant, scalable and secure gateway that enables Jupyter Notebooks to share resources across distributed clusters such as Apache Spark, Kubernetes and others.
https://jupyter-enterprise-gateway.readthedocs.io/en/latest/
Other
615 stars 221 forks source link

delete PATH env conflict with container #1225

Closed jeremythuon closed 1 year ago

jeremythuon commented 1 year ago

Hello

I found problem with custom container. Since pull https://github.com/jupyter-server/enterprise_gateway/pull/1164, Kubernetes container automatically inherits the variables from flow. With this, custom containers have their PATH env overwrite. To avoid this, you must remove the modification of this variable before the creation of the container.

I found where PATH is set : https://github.com/jupyter-server/enterprise_gateway/blob/7901422c1e99881f449d0cacba514af08e410c48/enterprise_gateway/services/kernels/handlers.py#L57-L63

I don't why it's needed, but it's more secure to delete it just for container

kevin-bates commented 1 year ago

Hi @jeremythuon - thanks for opening this PR. I agree this is an issue, but I think we should use a different resolution. The issue with this solution is that users can specify the env for their kernels via the kernel.json's env: stanza. As a result, we should not preclude PATH entries in that stanza.

Instead, I think you make a very valid statement in your opening description: "I don't why it's needed"

The purpose of the inherited_envs configurable is to specify which envs of the EG process to transfer to the kernel. I don't recall why we transferred PATH unconditionally, but, since practically all kernels are remote from the EG server, the EG process's PATH stands a good chance of not being correct relative to the target host - regardless of launch type. So, I think we should remove this code from the handlers.py file, initialize env to an empty dictionary, and let folks configure inherited_envs to include PATH if they need the EG's PATH inherited by the kernel process.

Just a note... It can be a little confusing but the references to kwargs["env"] in launch_process represent the accumulation of the env entries from the client, those inherited from the EG process, and those specified in the kernel.json, so had the kernel.json included a PATH entry, this PR would have removed that (arguably correct) env from the kernel's process, which we don't want.

Could you please explore the other approach to tackling this from the handlers.py file? I'd rather not introduce an env-based list of variables to deal with unless we find we have to. Thank you.