jupyter-server / enterprise_gateway

A lightweight, multi-tenant, scalable and secure gateway that enables Jupyter Notebooks to share resources across distributed clusters such as Apache Spark, Kubernetes and others.
https://jupyter-enterprise-gateway.readthedocs.io/en/latest/

EC2 instance to EMR connection #943

Open ggittu opened 3 years ago

ggittu commented 3 years ago

We want to explore the possibility of using enterprise_gateway with our current setup. We have set up JupyterHub (TLJH) on an EC2 instance. From this TLJH, we would like to connect to an EMR Spark cluster (Livy enabled).

Can Enterprise Gateway be leveraged for this scenario? I can't find any blog posts that explore this common use case.

kevin-bates commented 3 years ago

I'm not familiar with EMR, but you'd probably want to implement a process proxy specific to that resource manager. The YarnClusterProcessProxy might be a good reference.

lresende commented 3 years ago

EMR is AWS's managed Spark offering. It seems to use YARN under the covers, so our YarnClusterProcessProxy should work, given any necessary tweaks to the environment configuration:

https://aws.amazon.com/blogs/big-data/submitting-user-applications-with-spark-submit/
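
As a rough illustration of that suggestion: a kernelspec selects its process proxy through the `metadata.process_proxy.class_name` entry of `kernel.json`. The sketch below only builds such a file as a Python dict. The display name, the launcher path, and the `argv` flags are placeholders modeled on the YARN cluster-mode kernelspecs shipped with Enterprise Gateway; none of it has been verified against EMR.

```python
import json

# Fully qualified class name of the YARN process proxy in Enterprise Gateway.
YARN_PROXY = "enterprise_gateway.services.processproxies.yarn.YarnClusterProcessProxy"


def build_kernelspec(display_name, argv, env=None):
    """Return a kernel.json-style dict that selects the YARN process proxy."""
    return {
        "language": "python",
        "display_name": display_name,
        "metadata": {"process_proxy": {"class_name": YARN_PROXY}},
        "env": dict(env or {}),
        "argv": list(argv),
    }


# Hypothetical values: the launcher path and display name are placeholders.
spec = build_kernelspec(
    display_name="Spark - Python (YARN cluster mode, EMR sketch)",
    argv=[
        "/path/to/kernelspec/bin/run.sh",  # assumed launcher script location
        "--RemoteProcessProxy.kernel-id", "{kernel_id}",
        "--RemoteProcessProxy.response-address", "{response_address}",
    ],
    env={"SPARK_HOME": "/usr/lib/spark"},  # assumed Spark location on the cluster
)

# Serialize in the form that would be written to kernel.json.
print(json.dumps(spec, indent=2))
```

Any EMR-specific environment tweaks would presumably go into the `env` section of the kernelspec.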

kevin-bates commented 3 years ago

Right on - thanks @lresende. Yeah, so as long as EMR uses the Hadoop YARN REST API, which various searches indicate to be the case, EMR "should just work".
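
One way to check that assumption before wiring anything up is to query the ResourceManager's `/ws/v1/cluster/info` resource from the EC2 instance. This is a minimal sketch, assuming the ResourceManager listens on the default port 8088 on the EMR master node and that the security group allows the connection; the hostname is a placeholder.

```python
import json
import urllib.request


def yarn_endpoint(master_host, port=8088, scheme="http"):
    """Build the base URL of the YARN ResourceManager REST API."""
    return f"{scheme}://{master_host}:{port}/ws/v1/cluster"


def fetch_cluster_info(endpoint, timeout=5.0):
    """Query <endpoint>/info and return the 'clusterInfo' section of the reply."""
    request = urllib.request.Request(
        f"{endpoint}/info", headers={"Accept": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["clusterInfo"]


# Placeholder hostname: substitute the EMR master node's address.
endpoint = yarn_endpoint("emr-master.example.internal")
print(endpoint)

# Run this from the EC2 instance to confirm the API is reachable:
#   info = fetch_cluster_info(endpoint)
#   print(info["state"])
```

If the call succeeds, the same URL can be handed to Enterprise Gateway through the `EG_YARN_ENDPOINT` environment variable or the `--EnterpriseGatewayApp.yarn_endpoint` option.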

kevin-bates commented 2 years ago

Hi @ggittu - did you get anywhere with this using the YarnClusterProcessProxy or building one yourself?