Closed sedatkestepe closed 6 years ago
Hello @sedatkestepe, thank you for the issue. There are a number of observations here, so I'm just going to enumerate them.
1. The general issue is due to password-less ssh not being configured. Since your kernelspec is configured for the DistributedProcessProxy and since the localhost is being accessed, Enterprise Gateway still requires password-less ssh be configured, even in loopback operations.
2. Since you're using the DistributedProcessProxy, you probably want to access other nodes of your cluster. These nodes must be specified with either env EG_REMOTE_HOSTS or --EnterpriseGatewayApp.remote_hosts when starting Enterprise Gateway. There are ways to configure the kernelspec itself to use its own remote hosts, but that's more of an advanced option. Please refer to the Enabling YARN Client Mode page in the docs. Password-less ssh must still be configured across whatever nodes are in use. Also, when configuring for remote hosts using DistributedProcessProxy, any potential hosts must contain the same kernelspecs file hierarchy in the same location as on the Gateway server.
3. The kernelspec files provided in the distribution are samples. Make sure all paths are relative to your configuration. This isn't the issue you're seeing, but I just want to point this out.
4. You might find running in YARN Cluster Mode is easier to get working since it doesn't require password-less ssh or distribution of the kernelspec hierarchy. Just be sure to configure EG_YARN_ENDPOINT or --EnterpriseGatewayApp.yarn_endpoint. The important item here is that the application name contain the kernel's id - which can be referenced via KERNEL_ID. The sample kernelspecs set name to KERNEL_ID entirely - although the id just needs to be contained within the application name.
5. A good first step may be to simply launch the out-of-the-box python kernel (python2 or python3) - that is not provided by our kernelspecs tar file. This will launch a python kernel local to Enterprise Gateway, essentially behaving like Jupyter Kernel Gateway.
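To make the YARN Cluster Mode naming requirement above concrete, here is a minimal Python sketch. It is not Enterprise Gateway code: the helper names and the "notebook-" prefix are hypothetical, and it only illustrates that the application name may be the kernel id alone or may merely contain it.

```python
import uuid


def make_application_name(kernel_id: str, prefix: str = "") -> str:
    """Build an application name that embeds the kernel id (hypothetical helper)."""
    return f"{prefix}{kernel_id}"


def name_contains_kernel_id(app_name: str, kernel_id: str) -> bool:
    """True when the kernel id appears somewhere inside the application name."""
    return kernel_id in app_name


# A random id standing in for the value of KERNEL_ID.
kernel_id = str(uuid.uuid4())

# The name can be the id on its own, as described for the sample kernelspecs...
plain = make_application_name(kernel_id)
# ...or the id with extra text around it (the prefix here is made up).
prefixed = make_application_name(kernel_id, prefix="notebook-")

print(name_contains_kernel_id(plain, kernel_id))
print(name_contains_kernel_id(prefixed, kernel_id))
print(name_contains_kernel_id("some-other-app", kernel_id))
```

Only the first two names would satisfy the requirement, since the third does not contain the id at all.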
I hope this helps.
Hello @kevin-bates ,
Thanks for your response.
Let me detail what I am trying to do. We have a cluster running the Hortonworks Hadoop distribution (HDP), which has Spark 2.2 in its stack. Recently some requirements came up for new features in Spark 2.3.0 and also for cluster computing power. I didn't want to get into a full stack update of HDP, so I placed a secondary Spark release (2.3.1) on our edge server. While searching for configurations that would let me submit Jupyter notebooks to our existing resource scheduler, YARN (which I also want to keep as the only resource scheduler), I found Enterprise Gateway and NB2KG as well.
Afaik, to run Spark applications in YARN cluster mode, our datanodes need the Spark 2.3.1 binaries pre-installed, right? This is the reason I tried client mode.
In your number 4 you gave the YARN Cluster Mode document link. The document starts with the purpose of enabling the IPython kernel, but the rest goes on with the spark_python_yarn_cluster setup. Does it mean running both Jupyter Notebook (with NB2KG) and Enterprise Gateway on the same machine? If so, I couldn't make it work even with EG configured on 7777 and the notebook on 8888. What should be installed and running on the notebook server? I imagine the Jupyter Notebook server running on the client (the data scientist's PC). Could you correct me if I have misunderstood something? Am I supposed to install both NB2KG and spark_python_yarn_cluster on the same host? (Some level of confusion)
For Yarn client mode do we need password-less ssh from EG to client or from EG to Yarn RM?
@sedatkestepe - thank you for the information.
Yes, all worker (data) nodes require the same Spark installation files. That's true even if you wanted to use YARN client mode as well.
Once you have your Spark installation working, we recommend that Enterprise Gateway be installed on the YARN master node and Notebook (w/ the NB2KG extensions) installed on your various clients (i.e., data scientists' PCs). You would then set the KG_URL env on those PCs to point at the YARN master node where EG resides. We also support co-located Notebook servers, but that's not the use case we're targeting since we'd prefer a bring-your-own-notebook model.
If you wanted to fall back to YARN client mode operation using the DistributedProcessProxy, then you'd need to configure password-less SSH across whatever nodes you wish to launch kernels on - typically these are the YARN nodes. These nodes/hosts are expressed via the EG_REMOTE_HOSTS env or --EnterpriseGatewayApp.remote_hosts command line option.
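One way to confirm password-less SSH before starting the gateway is to try a non-interactive login to each host. The Python sketch below is only an approximation of what Enterprise Gateway does internally: it shells out to the standard ssh client with BatchMode=yes (which disables password prompts) instead of using the gateway's own SSH code, it assumes EG_REMOTE_HOSTS holds a comma-separated list, and the helper names are hypothetical. Run it as the same user that runs Enterprise Gateway.

```python
import os
import subprocess
from typing import List


def parse_remote_hosts(value: str) -> List[str]:
    """Split a comma-separated host list, dropping blanks and stray spaces."""
    return [host.strip() for host in value.split(",") if host.strip()]


def ssh_check_command(host: str, timeout_s: int = 5) -> List[str]:
    """Build an ssh command that fails instead of prompting for a password."""
    return [
        "ssh",
        "-o", "BatchMode=yes",                 # never ask for a password
        "-o", f"ConnectTimeout={timeout_s}",   # give up quickly on dead hosts
        host,
        "true",                                # trivial remote command
    ]


def can_ssh_without_password(host: str) -> bool:
    """True if the non-interactive login succeeds with exit status 0."""
    try:
        result = subprocess.run(
            ssh_check_command(host),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


if __name__ == "__main__":
    # Assumption: EG_REMOTE_HOSTS is a comma-separated list of host names.
    for host in parse_remote_hosts(os.environ.get("EG_REMOTE_HOSTS", "")):
        status = "ok" if can_ssh_without_password(host) else "FAILED"
        print(f"{host}: password-less ssh {status}")
```

Since loopback operations also go through SSH, it is worth including localhost in the list when kernels are launched on the gateway host itself.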
Closing due to lack of activity but hope that's due to a sufficient answer. If the answer was not sufficient, please re-open the issue along with what else you need. Thank you.
Hi,
I am using Jupyter Enterprise Gateway in Yarn Cluster Mode, and launching Jupyter Lab server on a remote Linux box.
I am facing the same 'HTTP 403: Failed to authenticate SSHClient with password-less SSH' issue, and here is the log for the same.
[I 2018-12-01 19:45:59.421 EnterpriseGatewayApp] Kernel shutdown: 0836005e-2a6b-4c5a-a825-f49bcdef5f30
[I 2018-12-01 19:49:56.613 EnterpriseGatewayApp] KernelRestarter: restarting kernel (1/5), keep random ports
[W 2018-12-01 19:49:56.613 EnterpriseGatewayApp] Remote kernel (d90ff5df-2621-4a36-a0d3-1be2f8da1450) will not be automatically restarted since there are no clients connected at this time.
[W 2018-12-01 19:49:56.618 EnterpriseGatewayApp] Termination of application 'application_1542713397395_139615' failed with exception: 'Response finished with status: 500. Details: {"RemoteException":{"exception":"WebApplicationException","message":"com.sun.jersey.api.MessageException: A message body reader for Java class org.apache.hadoop.yarn.server.resourcemanager.webapp.dao.AppState, and Java type class org.apache.hadoop.yarn.server.resourcemanager.webapp.dao.AppState, and MIME media type application/octet-stream was not found.\nThe registered message body readers compatible with the MIME media type are:\napplication/octet-stream ->\n com.sun.jersey.core.impl.provider.entity.ByteArrayProvider\n com.sun.jersey.core.impl.provider.entity.FileProvider\n com.sun.jersey.core.impl.provider.entity.InputStreamProvider\n com.sun.jersey.core.impl.provider.entity.DataSourceProvider\n com.sun.jersey.core.impl.provider.entity.RenderedImageProvider\n/ ->\n com.sun.jersey.core.impl.provider.entity.FormProvider\n com.sun.jersey.json.impl.provider.entity.JSONJAXBElementProvider$General\n com.sun.jersey.json.impl.provider.entity.JSONArrayProvider$General\n com.sun.jersey.json.impl.provider.entity.JSONObjectProvider$General\n com.sun.jersey.core.impl.provider.entity.StringProvider\n com.sun.jersey.core.impl.provider.entity.ByteArrayProvider\n com.sun.jersey.core.impl.provider.entity.FileProvider\n com.sun.jersey.core.impl.provider.entity.InputStreamProvider\n com.sun.jersey.core.impl.provider.entity.DataSourceProvider\n com.sun.jersey.core.impl.provider.entity.XMLJAXBElementProvider$General\n com.sun.jersey.core.impl.provider.entity.ReaderProvider\n com.sun.jersey.core.impl.provider.entity.DocumentProvider\n com.sun.jersey.core.impl.provider.entity.SourceProvider$StreamSourceReader\n com.sun.jersey.core.impl.provider.entity.SourceProvider$SAXSourceReader\n com.sun.jersey.core.impl.provider.entity.SourceProvider$DOMSourceReader\n com.sun.jersey.json.impl.provider.entity.JSONRootElementProvider$General\n com.sun.jersey.json.impl.provider.entity.JSONListElementProvider$General\n com.sun.jersey.json.impl.provider.entity.JacksonProviderProxy\n com.sun.jersey.core.impl.provider.entity.XMLRootElementProvider$General\n com.sun.jersey.core.impl.provider.entity.XMLListElementProvider$General\n com.sun.jersey.core.impl.provider.entity.XMLRootObjectProvider$General\n com.sun.jersey.core.impl.provider.entity.EntityHolderReader\n","javaClassName":"javax.ws.rs.WebApplicationException"}}'. Continuing...
[E 2018-12-01 19:49:56.673 EnterpriseGatewayApp] Failed to authenticate SSHClient with password-less SSH
[W 2018-12-01 19:49:56.673 EnterpriseGatewayApp] Remote signal(15) to '-32518' on host '
' failed with exception 'HTTP 403: Failed to authenticate SSHClient with password-less SSH'.
I believe that this has interfered with the kernel lifecycle, and hence I am getting 'orphan kernels'. Orphan kernels are those for which the spark-submit was done and the job went into the running state, but the notebook server does not interact with the kernel, so it does not get shut down when the notebook server is shut down.
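As a rough way to spot such orphans, one can compare the names of running YARN applications with the kernel ids the gateway still knows about. The Python sketch below works on plain lists and assumes, as described earlier in this thread, that each kernel's application name contains its kernel id. The function name is hypothetical, the "nightly-etl" entry is made up, and the two ids are taken from the log above only as sample data. How the two lists are obtained is left out.

```python
import re
from typing import Iterable, List, Set

# Kernel ids are UUIDs, e.g. d90ff5df-2621-4a36-a0d3-1be2f8da1450.
UUID_RE = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"
)


def find_orphan_applications(
    running_app_names: Iterable[str], active_kernel_ids: Set[str]
) -> List[str]:
    """Return running app names that embed a kernel id the gateway no longer tracks.

    Applications whose names contain no UUID are ignored, since they are
    presumably not notebook kernels.
    """
    orphans = []
    for name in running_app_names:
        match = UUID_RE.search(name)
        if match and match.group(0) not in active_kernel_ids:
            orphans.append(name)
    return orphans


# Sample inputs (hypothetical): two kernel applications and one unrelated job.
running = [
    "0836005e-2a6b-4c5a-a825-f49bcdef5f30",
    "d90ff5df-2621-4a36-a0d3-1be2f8da1450",
    "nightly-etl",
]
active = {"0836005e-2a6b-4c5a-a825-f49bcdef5f30"}

orphans = find_orphan_applications(running, active)
print(orphans)
```

With these inputs only the second application is reported, because its id is not in the active set and the unrelated job carries no kernel id.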
Hello, I wanted to evaluate Jupyter Enterprise Gateway on our edge node, which has the Hadoop (ecosystem) client binaries. The trouble I am having is that when I open a new Spark - Python (YARN Client Mode) notebook, I receive a 500 from EG in the client notebook logs and a 403 in the EG logs, but the source is ambiguous in the logging even though EG runs in debug mode.
Enterprise Gateway logs are below. What could be the source of 403? Thanks in advance
PS: I have an additional Spark installation (2.3.1, which I am planning to use after an initial successful run) in PATH, but it doesn't seem to be the problem since it is not included in the appended CMD string.