Closed · ramkrishnan8994 closed this issue 4 years ago
@consideRatio @manics Any solutions to this?
Sounds like this issue is related to https://github.com/jupyterhub/kubespawner/issues/299
One solution would be to insert Jupyter Enterprise Gateway (EG) between your spawned Notebook servers and your kernels. EG would then launch your Spark-based kernels in "cluster mode" via Spark on Kubernetes, with a pod dedicated to the driver and executors. It launches vanilla kernels across the k8s cluster as well, each kernel in its own pod.
See this blog post by @lresende for setting this up.
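If you go that route, the single-user servers need to delegate kernel launching to EG. A minimal sketch of the chart values, assuming a notebook version whose GatewayClient honors the JUPYTER_GATEWAY_URL environment variable, and a hypothetical in-cluster service address for EG:

```yaml
# Sketch only: zero-to-jupyterhub values. The gateway address is a
# hypothetical example; point it at wherever EG runs in your cluster.
singleuser:
  extraEnv:
    # The notebook server's GatewayClient reads this and proxies all
    # kernel management to Enterprise Gateway instead of launching locally.
    JUPYTER_GATEWAY_URL: "http://enterprise-gateway.gateway.svc.cluster.local:8888"
```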
Thanks @kevin-bates, but we were looking for a simpler solution that does not involve adding more components.
Hi, in case this is useful to someone: thanks to the suggestions from @consideRatio and @abinet in the linked issue, adding the snippet to extraConfig: in the hub definition and adding
echo "spark.driver.host $MY_POD_IP" >> "/usr/local/spark/conf/spark-defaults.conf";
to lifecycleHooks in the singleuser definition solved the problem. It may be seen as an ugly approach, but at the time of writing it works for me.
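For anyone reproducing this, the two pieces could look roughly like the sketch below in the chart's values.yaml. The postStart command is the one quoted above; the MY_POD_IP injection via the Kubernetes downward API is my assumption about how the linked suggestion wires it up (and needs a chart version that accepts full EnvVar syntax in extraEnv):

```yaml
singleuser:
  extraEnv:
    # Assumption: expose the pod's own IP to the container through the
    # downward API so the postStart hook below can reference it.
    MY_POD_IP:
      valueFrom:
        fieldRef:
          fieldPath: status.podIP
  lifecycleHooks:
    postStart:
      exec:
        command:
          - "sh"
          - "-c"
          # Advertise the pod IP (not the unresolvable pod hostname)
          # as the Spark driver host.
          - 'echo "spark.driver.host $MY_POD_IP" >> /usr/local/spark/conf/spark-defaults.conf'
```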
Thank you everyone for helping out. I'm now closing this issue, as I can identify no concrete action related to this GitHub repository!
https://github.com/jupyterhub/kubespawner/pull/229
https://github.com/alagrede/jupyter-spark/blob/master/python3/spark-example.ipynb
This fixed it for me: translate the driver hostname to an IP.
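In notebook code that can look roughly like the following sketch: resolve the pod's own address and advertise it as spark.driver.host, so executors connect back by IP rather than by the unresolvable pod hostname (the master URL is a hypothetical example):

```python
import os
import socket

from pyspark.sql import SparkSession

# Prefer an IP injected via the downward API (MY_POD_IP, as above);
# otherwise resolve this pod's hostname to its IP locally.
driver_ip = os.environ.get("MY_POD_IP") or socket.gethostbyname(socket.gethostname())

spark = (
    SparkSession.builder
    .master("spark://spark-master:7077")        # hypothetical master URL
    .config("spark.driver.host", driver_ip)     # advertise an IP executors can reach
    .config("spark.driver.bindAddress", "0.0.0.0")
    .getOrCreate()
)
```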
Hi, I'm using JupyterHub v0.7, deployed with the Helm charts. We have a Spark cluster running on the same Kubernetes cluster as JupyterHub. We use the 'all-spark-notebook' image from Docker Stacks for the single-user images.
I know that one of the requirements for running Jupyter with the Docker Stacks images is that hostNetwork has to be set to True.
Now, if I set hostNetwork to True, I can't spawn more than one Jupyter user instance, because port 8888 has already been assigned to the first instance; the new instance fails due to a port conflict on 8888.
Now, if I set hostNetwork to False, I am able to spawn multiple user instances, BUT, since we connect the Jupyter notebooks to a remote Spark master, the Spark master/cluster is not able to resolve the hostname of the user's Jupyter pod (which is the driver for the application). This is the error in the Spark master:
Caused by: java.io.IOException: Failed to connect to jupyter-doe-xxxxx:39003
Caused by: java.net.UnknownHostException: jupyter-doe-xxxxx
jupyter-doe-xxxxx is the name of the pod that is spawned for the user doe.
This is a link to a similar issue faced in DockerSpawner: https://github.com/jupyter/docker-stacks/issues/187#issuecomment-212448091
How can we solve this issue? We want all the applications to connect to a remote Spark cluster.