jupyter-server / enterprise_gateway

A lightweight, multi-tenant, scalable and secure gateway that enables Jupyter Notebooks to share resources across distributed clusters such as Apache Spark, Kubernetes and others.
https://jupyter-enterprise-gateway.readthedocs.io/en/latest/

JupyterHub integration with Jupyter Enterprise Gateway #448

Closed amangarg96 closed 6 years ago

amangarg96 commented 6 years ago

I read the following doc, which mentions integrating JupyterHub with Jupyter Enterprise Gateway.

Is the Enterprise Gateway team working on it? When is it expected to be released?

kevin-bates commented 6 years ago

Thanks for your inquiry. Yes, a couple of folks are working on a Hub integration, focused on Kubernetes. I would expect initial support in a week or two. If you wanted a bare-metal integration, the solution will likely apply to that as well, since this is primarily a configuration exercise.

Regarding release specifics, we don't plan on publishing our K8s support to a release until Q4 - probably closer to the end than the beginning. :smile:

amangarg96 commented 6 years ago

Will it have a single Hub that manages all the Notebook servers? Or will there be a provision for multiple Hubs, which can spawn Notebook servers on multiple nodes? (I'm asking from a scalability point of view.)

kevin-bates commented 6 years ago

The architecture would be that the Hub spawns NB2KG-enabled notebook servers (just as it spawns notebook servers today, except that today's servers are not NB2KG-enabled). The NB2KG-enabled notebook servers will be configured with a KG_URL pointing at an Enterprise Gateway instance. EG, based on its configured kernels, will utilize the underlying resource manager to distribute kernels created on behalf of the user across the cluster with which EG is associated.

I believe your question is more related to Hub than EG. If you're using Kubernetes, then Hub will spawn pod-based Notebook servers, so that will get you some portion of the distribution you desire. However, w/o EG, each notebook server would be launching local kernels and could potentially exceed the capacity of the Notebook pod. With EG, by contrast, each kernel is launched in its own pod.

I hope that helps.

lresende commented 6 years ago

For those eager to get started with JupyterHub and Enterprise Gateway, here is a draft blog for review. I am planning to publish it after I merge the nb2kg and nb2kg-hub. There are also some ansible-scripts to help set up the JupyterHub and EG environment.

amangarg96 commented 6 years ago

@kevin-bates

Isn't this something which can be done by making just configuration changes to the Notebook server and Enterprise Gateway?

We could enable the NB2KG serverextension and the NB2KG's SessionManager, RemoteKernelManager, RemoteKernelSpecManager.

Then set the Enterprise Gateway's URL: export KG_URL=http://<Enterprise-Gateway-url>

(If running on a YARN cluster) Set SPARK_HOME and EG_YARN_ENDPOINT, then configure the kernel.json and run.sh for running kernels on the cluster.

Finally, launch JupyterHub. The default spawner class would launch the NB2KG enabled Notebook servers, which through NB2KG would launch the kernels on the Spark cluster.

Let me know if my understanding is wrong at any point.
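
A rough sketch of the settings described in these steps, for illustration only. The function names are hypothetical, the nb2kg.managers module path is an assumption based on the class names listed above, and the URL and paths passed in are placeholders.

```python
def notebook_server_settings(gateway_url):
    """Return (args, env) for launching an NB2KG-enabled Notebook server.

    Assumption: the three NB2KG classes live in the nb2kg.managers module.
    """
    args = [
        "--NotebookApp.session_manager_class=nb2kg.managers.SessionManager",
        "--NotebookApp.kernel_manager_class=nb2kg.managers.RemoteKernelManager",
        "--NotebookApp.kernel_spec_manager_class=nb2kg.managers.RemoteKernelSpecManager",
    ]
    # KG_URL tells NB2KG which Enterprise Gateway instance to talk to.
    env = {"KG_URL": gateway_url}
    return args, env


def gateway_env_for_yarn(spark_home, yarn_endpoint):
    """Return the YARN-related variables named in the steps above.

    Assumption: these are set where Enterprise Gateway runs,
    not in the Notebook server's environment.
    """
    return {"SPARK_HOME": spark_home, "EG_YARN_ENDPOINT": yarn_endpoint}
```

In a jupyterhub_config.py, the returned values could presumably be assigned to c.Spawner.args and c.Spawner.environment so that every spawned Notebook server picks them up.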

kevin-bates commented 6 years ago

Yes - exactly. We're focused on kubernetes because that's where most of the interest is. This should really just be a configuration exercise. The other thing to ensure is that JUPYTERHUB_USER gets mapped to KERNEL_USERNAME at the point the Notebook server is launched. This value will then flow to EG via NB2KG.
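
The user mapping could look something like the following minimal sketch. The helper name with_kernel_username is hypothetical, and where this runs (a spawner hook, a wrapper script, etc.) depends on the deployment; it only shows the intended relationship between the two variables.

```python
def with_kernel_username(env):
    """Return a copy of env where KERNEL_USERNAME mirrors JUPYTERHUB_USER.

    An existing KERNEL_USERNAME is left untouched, and nothing is added
    when JUPYTERHUB_USER is absent.
    """
    result = dict(env)
    hub_user = result.get("JUPYTERHUB_USER")
    if hub_user and "KERNEL_USERNAME" not in result:
        result["KERNEL_USERNAME"] = hub_user
    return result
```

Calling with_kernel_username(dict(os.environ)) just before starting the Notebook server (with os imported) would be one way to apply it, so the value is in place by the time NB2KG forwards it to EG.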

cc: @SolarisYan since they've also been spending time in this very area.

amangarg96 commented 6 years ago

@kevin-bates @lresende Any updates on this?

kevin-bates commented 6 years ago

@amangarg96 - sorry for the delayed response. I don't have any further updates as I've been busy looking at docker swarm and Luciano has been traveling.

Although the blog draft posted by @lresende above is geared toward Kubernetes, the information contained is applicable to non-K8s envs as well since, as you point out, this is largely a configuration exercise. Therefore, I would suggest you try to equate the k8s concepts to your environment and see how things go.

I also believe @SolarisYan is spending time with Hub (see comments in #424) but that too is a k8s env. At any rate, we'd be happy to assist you in your efforts and your experience may provide helpful data-points for generalizing our ultimate solution - thanks!

lresende commented 6 years ago

The enhanced NB2KG image has been merged as part of #458; most of the rest is JupyterHub configuration to use this image.