jupyter-server / enterprise_gateway

A lightweight, multi-tenant, scalable and secure gateway that enables Jupyter Notebooks to share resources across distributed clusters such as Apache Spark, Kubernetes and others.
https://jupyter-enterprise-gateway.readthedocs.io/en/latest/

Singularity Support? #777

Open cceyda opened 4 years ago

cceyda commented 4 years ago

Are there any plans to support Singularity containers? Maybe it is already possible to use Singularity containers with the Kubernetes integration. Has anyone tried this?

I'm currently trying to make a simple Singularity container launcher based on this info: https://jupyter-enterprise-gateway.readthedocs.io/en/latest/system-architecture.html#extending-enterprise-gateway

Still, I would prefer it if this were possible with the Kubernetes integration.

kevin-bates commented 4 years ago

This is the first I've heard about Singularity containers. It seems like we should support these given their use in research environments. However, unless someone can contribute this, I'm not sure it will happen.

Here are some steps that might be worth trying.

  1. Assuming Kubernetes is essentially agnostic to container type, then it seems like just creating an image for use in Kubernetes (via the KubernetesProcessProxy) should be doable. I would focus on creating your own custom image.
  2. Assuming this will be Python-based, I would then replace the image tag in the python_kubernetes/kernel.json file with the Singularity tag.
  3. You may need to tweak the kernel-pod.yaml file in python_kubernetes/scripts to deal with this being a singularity image, but I would hope that isn't necessary.
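As a rough sketch of step 2, the image swap could be scripted rather than edited by hand. This assumes the kernelspec stores the image under metadata.process_proxy.config.image_name, so check that against the kernel.json shipped with your version. The with_custom_image helper and the image name my-registry/my-singularity-kernel:dev are hypothetical and only there for illustration.

```python
import copy
import json


def with_custom_image(kernelspec: dict, image_name: str) -> dict:
    """Return a copy of a kernel.json dict with the container image swapped.

    Assumes the image lives at metadata.process_proxy.config.image_name.
    """
    spec = copy.deepcopy(kernelspec)  # leave the caller's dict untouched
    config = (
        spec.setdefault("metadata", {})
        .setdefault("process_proxy", {})
        .setdefault("config", {})
    )
    config["image_name"] = image_name
    return spec


# Assumed shape of python_kubernetes/kernel.json; unrelated keys omitted.
example_spec = {
    "language": "python",
    "display_name": "Python on Kubernetes",
    "metadata": {
        "process_proxy": {
            "class_name": "enterprise_gateway.services.processproxies.k8s.KubernetesProcessProxy",
            "config": {"image_name": "elyra/kernel-py:VERSION"},
        }
    },
}

# Hypothetical image name, used only for illustration.
updated = with_custom_image(example_spec, "my-registry/my-singularity-kernel:dev")
print(json.dumps(updated["metadata"]["process_proxy"]["config"], indent=2))
```

In practice you would load the dict with json.load from the kernelspec directory and write the result back with json.dump.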

For a pure Singularity integration (not using Kubernetes), you'd need to create a SingularityProcessProxy plugin, which handles discovery and termination of the container. But I'm hoping we can get things started via K8s prior to that.
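To make the "termination" half of such a plugin a little more concrete, here is a minimal sketch of the command lines it might build, assuming the kernel runs as a named Singularity instance managed with `singularity instance start` and `singularity instance stop`. The helper names and the kernel-<id> naming scheme are hypothetical. A real SingularityProcessProxy would subclass one of Enterprise Gateway's process proxy base classes, which is not reproduced here.

```python
from typing import List


def instance_name(kernel_id: str) -> str:
    """Hypothetical naming scheme: one Singularity instance per kernel."""
    return f"kernel-{kernel_id}"


def start_argv(image_path: str, kernel_id: str) -> List[str]:
    """Command line to start a named instance from a container image."""
    return ["singularity", "instance", "start", image_path, instance_name(kernel_id)]


def stop_argv(kernel_id: str) -> List[str]:
    """Command line to stop the instance that belongs to a kernel."""
    return ["singularity", "instance", "stop", instance_name(kernel_id)]


# Only builds the commands; nothing is executed here.
print(" ".join(start_argv("/images/kernel-py.sif", "abc123")))
print(" ".join(stop_argv("abc123")))
```

Because the instance name is derived from the kernel id, the proxy can find and stop the right container later without keeping a local process handle, which matters once the container runs on a remote host.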

I'm happy to help answer questions and troubleshoot this effort, but I don't have the bandwidth to spend quality time on this right now. I apologize.

cceyda commented 4 years ago

I have been working on this on and off, made some progress but... What works:

BUT I cannot interrupt or restart kernels. I get the following in the logs: [image]

I assume this happens because the kernel is running on 127.0.0.1 and is assumed to be a local kernel by the kernel manager (based on this), and since it is actually a remote kernel, it fails. Is this a valid assumption, or could it be some other underlying problem? I guess what I'm asking is: would changing the IP to something else work? (I was avoiding having to meddle with bridge networks etc.)

Future plans: For supporting actually launching remote containers, I'm leaning towards a dynamic kernelspec approach (maybe similar to this) where the user can select the remote host they want to launch on, along with the container image they want.

kevin-bates commented 4 years ago

Thanks for the update - this is really exciting!!

I'd need to see your process proxy implementation, but interrupts and (hard) shutdowns (used on restarts) go through the gateway socket that is returned with the connection information at the kernel startup.

I suspect your process proxy's discovery mechanism is not replacing the IP returned to the gateway (which is probably 0.0.0.0) with the actual IP of the host where the kernel landed. The discovery mechanism is called by the confirm_remote_startup() method.
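The IP replacement described here can be sketched as below. This is only an illustration, assuming the connection information is a dict shaped like a standard Jupyter connection file (with an "ip" key). The function name fix_connection_ip and the address 10.0.0.5 are hypothetical, and this is not the gateway's actual code.

```python
# Addresses that only make sense from inside the kernel's own host/container.
UNROUTABLE = {"0.0.0.0", "127.0.0.1", "localhost"}


def fix_connection_ip(connection_info: dict, assigned_host_ip: str) -> dict:
    """Return a copy of the connection info with a reachable IP.

    If the kernel reported a wildcard or loopback address, substitute the IP
    of the host the kernel was actually launched on.
    """
    fixed = dict(connection_info)
    if fixed.get("ip") in UNROUTABLE:
        fixed["ip"] = assigned_host_ip
    return fixed


reported = {"ip": "0.0.0.0", "transport": "tcp", "shell_port": 50001}
fixed = fix_connection_ip(reported, "10.0.0.5")  # hypothetical host IP
print(fixed["ip"])
```

An address that is already routable is left alone, so the same step is safe to run for every kernel start.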

If you don't mind pushing to a fork, I'd be happy to add another pair of eyes.

Regarding future plans, you really want parameterized kernels (which aren't currently available). We can talk about that more (including alternatives) once you get the kernel fully working.

cceyda commented 4 years ago

After a lot of digging I figured out why the interrupts weren't working. I wasn't returning anything in launch_process :woman_facepalming: (...well at least now I know better how the gateway works)

But now I'm facing the same problem as #756 on restarts. Restarts work on /tree but not on /lab. I'm using jupyterhub-singleuser to start the server. I will look into it some more hopefully within this week.

And yes, parameterized kernels would be great!

kevin-bates commented 4 years ago

Hi @snu-ceyda - yes, the leaf-most launch_process implementations return the process proxy instance against which lifecycle actions are invoked. Glad to see you progressing.
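A toy illustration of why the missing return value breaks interrupts. The classes below are stand-ins written for this example and do not use Enterprise Gateway's real base classes. The point is only that whatever launch_process returns is the handle that later receives lifecycle calls.

```python
import signal


class ToyProcessProxy:
    """Stand-in for a process proxy; records the signals it is asked to send."""

    def __init__(self):
        self.kernel_cmd = None
        self.signals = []

    def launch_process(self, kernel_cmd):
        self.kernel_cmd = list(kernel_cmd)
        return self  # the caller keeps this handle for later lifecycle actions

    def send_signal(self, signum):
        self.signals.append(signum)


class BrokenProcessProxy(ToyProcessProxy):
    def launch_process(self, kernel_cmd):
        self.kernel_cmd = list(kernel_cmd)
        # No return statement: the caller receives None instead of a handle.


def interrupt(handle):
    """What a kernel manager does, roughly: signal through the stored handle."""
    handle.send_signal(signal.SIGINT)


good_handle = ToyProcessProxy().launch_process(["python", "-m", "ipykernel_launcher"])
interrupt(good_handle)  # recorded on the proxy

broken_handle = BrokenProcessProxy().launch_process(["python", "-m", "ipykernel_launcher"])
try:
    interrupt(broken_handle)
    interrupt_failed = False
except AttributeError:
    interrupt_failed = True  # None has no send_signal, so the interrupt is lost
```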

This lab restart thing is troublesome - especially since there isn't much to go on here and local kernels work fine. I suspect there's some assumption about locality somewhere in lab but have no idea how to go about debugging it - nor am I finding any spare time these days.

I would focus on getting your Singularity support working solidly against Notebook for the time being.

cceyda commented 4 years ago

I got this working a while ago and have been using it on my setup since. I just have to merge the latest changes involving async kernel support (https://github.com/jupyter/enterprise_gateway/pull/794 etc.) and write some documentation explaining some of the implementation decisions I made, then I will push the changes to my fork. I wish I could have gotten this done before #jupytercon but didn't have the time. But in case anyone else is interested in this (?), know that it is coming (in a month maybe).

kevin-bates commented 4 years ago

This is great news and would be fantastic to get into our 3.0 release! I'm going ahead and assigning this issue to you. Thank you!