jupyter-server / enterprise_gateway

A lightweight, multi-tenant, scalable and secure gateway that enables Jupyter Notebooks to share resources across distributed clusters such as Apache Spark, Kubernetes and others.
https://jupyter-enterprise-gateway.readthedocs.io/en/latest/

Jupyter Kernel Restart does not release the RAM usage by the kernel running in JEG kubernetes cluster #1195

Open sharmasw opened 1 year ago

sharmasw commented 1 year ago

Description

We have JEG running in a Kubernetes cluster. When we spawn a pod to execute a Jupyter notebook, everything works well, but when the user restarts the kernel, the RAM used by the kernel is not released by the pod immediately. We either have to wait an indefinite amount of time for it to be released, or, if we keep using the kernel, it eventually runs out of memory and Kubernetes kills the pod.

Screenshots / Logs

Start of the kernel: (screenshot)

After executing some commands: (screenshot)

1st restart: (screenshot)

2nd restart immediately afterwards, without executing any code: (screenshot)

3rd restart without executing any code: (screenshot)

If we then wait for some indefinite time (about 4 minutes in this example), the memory is released: (screenshots)

Any clue or suggestion as to why this behavior occurs? We just want all the RAM used by the kernel to be released as soon as the restart action is performed.
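(For anyone trying to reproduce the observation, here is a minimal sketch, assuming the `kubernetes` Python client and a metrics-server in the cluster, that polls the kernel pod's reported memory so you can see how long it takes to drop after a restart; the pod and namespace names are placeholders, not values from this issue:)

```python
# Sketch: poll the kernel pod's memory usage via the metrics API.
# Requires metrics-server; pod/namespace names below are placeholders.
import time
from kubernetes import client, config

config.load_kube_config()  # use config.load_incluster_config() inside the cluster
metrics = client.CustomObjectsApi()

POD_NAME = "jovyan-<kernel-id>"   # placeholder kernel pod name
NAMESPACE = "<kernel-namespace>"  # placeholder kernel namespace

while True:
    usage = metrics.get_namespaced_custom_object(
        group="metrics.k8s.io",
        version="v1beta1",
        namespace=NAMESPACE,
        plural="pods",
        name=POD_NAME,
    )
    for container in usage["containers"]:
        print(container["name"], container["usage"]["memory"])
    time.sleep(30)  # watch how long after a restart the value actually drops
```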

Environment

Resource configuration

welcome[bot] commented 1 year ago

Thank you for opening your first issue in this project! Engagement like this is essential for open source projects! :hugs:
If you haven't done so already, check out Jupyter's Code of Conduct. Also, please try to follow the issue template as it helps other community members to contribute more effectively. You can meet the other Jovyans by joining our Discourse forum. There is also an intro thread there where you can stop by and say Hi! :wave:
Welcome to the Jupyter community! :tada:

kevin-bates commented 1 year ago

Hi @sharmasw - I'm not familiar with how/when resources are deallocated surrounding a pod's lifecycle. I guess the information you provide is not too surprising. When kernels are restarted, we retain the namespace and give the new pod the same name as the previous one (because the kernel_id is also preserved). I imagine this might be why k8s defers the cleanup you observe, and it might transfer the resources to the new pod provided it's on the same node as the previous one.

Can you share your resource configuration in case others want to look into this? Are these specified as limits or requests, and via envs, or just configured directly into the pod's launch script?
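(For reference, the effective requests/limits can also be read back from the running kernel pod. A minimal sketch assuming the `kubernetes` Python client, with placeholder pod/namespace names:)

```python
# Sketch: read back the requests/limits the kernel pod actually has.
# Pod and namespace names are placeholders.
from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

pod = v1.read_namespaced_pod(name="jovyan-<kernel-id>", namespace="<kernel-namespace>")
for c in pod.spec.containers:
    print(c.name, "requests:", c.resources.requests, "limits:", c.resources.limits)
```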

Does anyone else know how resources are deallocated in k8s? @lresende, @rahul26goyal

If we can make that determination, we can possibly update KubernetesProcessProxy.terminate_container_resources() to explicitly deallocate resources on shutdowns.
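(As an illustration only, not EG's current code: such a change might boil down to an immediate, zero-grace-period pod deletion along these lines, assuming the `kubernetes` Python client:)

```python
# Sketch (not EG's actual implementation): delete the kernel pod immediately,
# without waiting out the default termination grace period.
from kubernetes import client

def terminate_kernel_pod(pod_name: str, namespace: str) -> None:
    """Force-delete the kernel pod so its resources are reclaimed sooner."""
    v1 = client.CoreV1Api()
    v1.delete_namespaced_pod(
        name=pod_name,
        namespace=namespace,
        body=client.V1DeleteOptions(grace_period_seconds=0),
    )
```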

rahul26goyal commented 1 year ago

Hi @sharmasw, can you please share the kernel type? Is it a custom kernel? I see that the pod name has not changed across restarts, which is unlikely for the kernels EG supports today. As @kevin-bates mentioned, we kill the existing kernel pod and create a new one when you restart a kernel. Please correct me if I have misunderstood anything here.

kevin-bates commented 1 year ago

I see that the pod name has not changed across restarts which is unlikely for the kernels which EG supports today.

Pod names are preserved across restarts. By default, they are composed of kernel username and kernel id, both of which are static values in this context.
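(Roughly, the default naming behaves like the sketch below; this is an illustration of the idea, not EG's exact template:)

```python
# Illustration: a pod name built from kernel username + kernel id is identical
# before and after a restart, because both values are preserved.
def kernel_pod_name(kernel_username: str, kernel_id: str) -> str:
    # Kubernetes object names must be lowercase DNS-1123 labels.
    return f"{kernel_username}-{kernel_id}".lower()

print(kernel_pod_name("jovyan", "8d9f2f3c-1234-5678-9abc-def012345678"))
# -> jovyan-8d9f2f3c-1234-5678-9abc-def012345678 (same across restarts)
```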

sharmasw commented 1 year ago

If we can make that determination, we can possibly update KubernetesProcessProxy.terminate_container_resources() to explicitly deallocate resources on shutdowns.

Hi @kevin-bates, could you elaborate on what could actually be done to explicitly deallocate the resources? We looked into the Kubernetes Python library and did not find any documentation or function for deallocating unused resources from a given pod.

kevin-bates commented 1 year ago

Hi @sharmasw - well, I'm afraid you answered the question. If the API does not expose a means to deallocate resources sooner, I'm not sure there's much we can do. Had there been a way to address this via the API, we could introduce those calls into KubernetesProcessProxy.terminate_container_resources().

This behavior implies that resources may be indexed by pod name (and probably namespace) - which seems very odd. I just confirmed that the Docker container ID changes across restarts - so it's definitely a different instance.
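(The container ID check can be reproduced with something like the following sketch, assuming the `kubernetes` Python client and placeholder pod/namespace names:)

```python
# Sketch: compare container IDs before and after a kernel restart.
from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

pod = v1.read_namespaced_pod(name="jovyan-<kernel-id>", namespace="<kernel-namespace>")
for status in pod.status.container_statuses or []:
    print(status.name, status.container_id, "restarts:", status.restart_count)
```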

Can you share your resource configuration in case others want to look into this? Are these specified as limits or requests, and via envs, or just configured directly into the pod's launch script?

Since the pod name (and namespace) are the same, perhaps the resources are treated as high-water marks or something. (This is definitely the kind of thing that is difficult to track down without knowing the code or how the scheduler works, as it's probably not an ordinary use case.)
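(One way to probe the high-water-mark hypothesis is to compare current vs. peak memory as reported by the cgroup from inside the kernel container. A sketch covering both cgroup v1 and v2 file layouts, with no guarantee every file exists on a given node:)

```python
# Sketch: from inside the kernel container, print current and peak memory as
# seen by the cgroup. File names differ between cgroup v1 and v2, and the
# "peak" file may not exist on older kernels.
from pathlib import Path

CANDIDATES = {
    "current": ["/sys/fs/cgroup/memory.current",                      # cgroup v2
                "/sys/fs/cgroup/memory/memory.usage_in_bytes"],       # cgroup v1
    "peak":    ["/sys/fs/cgroup/memory.peak",                         # cgroup v2
                "/sys/fs/cgroup/memory/memory.max_usage_in_bytes"],   # cgroup v1
}

for label, paths in CANDIDATES.items():
    for path in paths:
        if Path(path).exists():
            print(label, Path(path).read_text().strip())
            break
```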