Closed SevenMilk closed 4 years ago
Hi @SevenMilk - thanks for opening the issue. There are a number of inconsistencies between what you state and what you provide, so I suspect we'll need a few iterations here.
The logs show your kernelspec using `LocalProcessProxy` when the issue shows the correct `YarnClusterProcessProxy` in use:

[D 2020-10-14 16:46:46.744 EnterpriseGatewayApp] Instantiating kernel 'Spark' with process proxy: enterprise_gateway.services.processproxies.processproxy.LocalProcessProxy

As a result, I'll need to see the logs relative to `YarnClusterProcessProxy`. Since `LocalProcessProxy` is not a `RemoteProcessProxy`, the response address template parameter (`{response_address}`) was left unfilled, so the kernel launcher had no way to know where to send its connection information. I suspect the launcher crashed due to this missing parameter.
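For reference, in the sample remote kernelspecs the response-address argument is the template parameter itself, which EG substitutes with its per-launch response socket. A sketch of the relevant `argv` portion (the path is illustrative):

```json
"argv": [
  "/usr/local/share/jupyter/kernels/spark_python_yarn_cluster/bin/run.sh",
  "--RemoteProcessProxy.kernel-id", "{kernel_id}",
  "--RemoteProcessProxy.response-address", "{response_address}",
  "--RemoteProcessProxy.port-range", "{port_range}",
  "--RemoteProcessProxy.spark-context-initialization-mode", "lazy"
]
```

With a non-remote process proxy, `{response_address}` is never filled in, which is why the launcher fails.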
You don't need any of the following env entries in your kernel.json file:
"EG_YARN_ENDPOINT": "http://172.24.0.12:8088/ws/v1/cluster",
"EG_IMPERSONATION_ENABLED": "True",
"EG_YARN_LOG_LEVEL" : "DEBUG",
"EG_KERNEL_LAUNCH_TIMEOUT": "40",
"KERNEL_LAUNCH_TIMEOUT" : "60",
`KERNEL_LAUNCH_TIMEOUT` must come from the client side and is used by EG to determine when to give up waiting for a response from the remote kernel running in the YARN cluster. Because your configuration wasn't right, we can hold off increasing it for now, but I suspect we'll probably need it increased since you need to load a large file with your kernel.
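For illustration (the value is an example), that means exporting it in the environment from which the client notebook server is started, rather than in the kernel.json:

```shell
# Illustrative: raise the launch timeout on the client (notebook) side;
# EG derives its own timeout from the kernel-start request.
export KERNEL_LAUNCH_TIMEOUT=60
jupyter notebook --gateway-url=http://172.24.0.216:8888
```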
I don't see any evidence of automatic restarts happening. The initial start is timing out due to the configuration issues.
What is the reason for restarting the spark cluster kernel every time?
You should be using notebook with the `--gateway-url` option, since `nb2kg` has been in notebook since the 6.0 release. Use something like the following (where `172.24.0.216:8888` is the IP and port that EG is running on):
jupyter notebook --gateway-url=http://172.24.0.216:8888
None of the class mappings are necessary.
Although your kernel name is 'spark', you're referencing files from the out-of-the-box kernelspec examples (`kernels/spark_python_yarn_cluster/bin/run.sh`). I would recommend leaving those alone and creating kernelspec hierarchies (bin and scripts sub-folders) for each kernelspec so that they can be individually tuned.
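To sketch that layout (a demo in a scratch directory; all names are examples), each kernelspec carries its own copies of bin/ and scripts/ that can be edited independently of the samples:

```shell
# Demo in a scratch dir: give each kernelspec its own bin/ and scripts/
# sub-folders instead of sharing the sample kernelspec's files.
KERNELS_DIR=$(mktemp -d)
mkdir -p "$KERNELS_DIR/spark_python_yarn_cluster/bin" \
         "$KERNELS_DIR/spark_python_yarn_cluster/scripts"
touch "$KERNELS_DIR/spark_python_yarn_cluster/bin/run.sh" \
      "$KERNELS_DIR/spark_python_yarn_cluster/scripts/launch_ipykernel.py"
# Clone the sample into a kernelspec you can tune without touching the original.
cp -r "$KERNELS_DIR/spark_python_yarn_cluster" "$KERNELS_DIR/my_spark_cluster"
ls "$KERNELS_DIR/my_spark_cluster/bin"
```

Then edit `my_spark_cluster/kernel.json` and its `bin/run.sh` to suit that kernel, leaving the sample files untouched.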
That's enough for now. Please send a new set of logs once these issues have been addressed. If you still have issues, you will need to look at the stdout/stderr logs via the YARN tools. These will contain the output produced by the launcher - which may have some issues with the environment of the node on which it lands since you can't install anaconda there.
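For instance, once YARN has assigned an application ID, the standard YARN CLI can dump the launcher's container output (the ID below is a placeholder):

```shell
yarn logs -applicationId <application-id>
```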
Hi @kevin-bates. Thanks for your response!!
My notebook version was 5.7.0. To avoid other interference, I decided to create another env (name=cluster) and install the latest Jupyter version, which looks like a good idea.
jupyter-client 6.1.7
jupyter-contrib-core 0.3.3
jupyter-contrib-nbextensions 0.5.1
jupyter-core 4.6.3
jupyter-enterprise-gateway 2.3.0
jupyter-highlight-selected-word 0.2.0
jupyter-kernel-gateway 2.4.3
jupyter-latex-envs 1.4.6
jupyter-nbextensions-configurator 0.4.1
jupyterlab 2.2.8
jupyterlab-server 1.2.0
notebook 6.1.4
tornado 6.0.4
yarn-api-client 1.0.2
{
  "language": "python",
  "display_name": "Spark - Python (YARN Cluster Mode)",
  "metadata": {
    "process_proxy": {
      "class_name": "enterprise_gateway.services.processproxies.yarn.YarnClusterProcessProxy"
    }
  },
  "env": {
    "SPARK_HOME": "/home/ericjiang/miniconda3/envs/cluster/spark-2.4.3-bin-hadoop2.7",
    "SPARK_CONF_DIR": "/home/ericjiang/miniconda3/envs/cluster/spark-2.4.3-bin-hadoop2.7/conf",
    "HADOOP_HOME": "/home/ericjiang/miniconda3/envs/cluster/hadoop-2.7.3",
    "HADOOP_CONF_DIR": "/home/ericjiang/miniconda3/envs/cluster/hadoop-2.7.3/etc/hadoop/",
    "PROG_HOME": "/home/ericjiang/miniconda3/envs/cluster/share/jupyter/kernels/spark_python_yarn_cluster",
    "PYSPARK_PYTHON": "/usr/bin/python3",
    "PYTHONPATH": "/home/ericjiang/miniconda3/envs/cluster/lib/python3.5/site-packages:/home/ericjiang/miniconda3/envs/cluster/spark-2.4.3-bin-hadoop2.7/python:/home/ericjiang/miniconda3/envs/cluster/spark-2.4.3-bin-hadoop2.7/python/lib/py4j-0.10.7-src.zip",
    "SPARK_OPTS": "--master yarn --deploy-mode cluster --name clusterMode --conf spark.yarn.submit.waitAppCompletion=false --conf spark.yarn.dist.archives=/home/ericjiang/miniconda3/envs/cluster.zip#cluster --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=cluster/bin/python3.5 --conf spark.yarn.executorEnv.PYSPARK_PYTHON=cluster/bin/python3.5 --conf spark.yarn.appMasterEnv.PYTHONUSERBASE=cluster --conf spark.yarn.appMasterEnv.PYTHONPATH=cluster/lib/python3.5/site-packages:cluster/spark-2.4.3-bin-hadoop2.7/python:cluster/spark-2.4.3-bin-hadoop2.7/python/lib/py4j-0.10.7-src.zip --conf spark.yarn.appMasterEnv.PATH=cluster/bin:$PATH",
    "LAUNCH_OPTS": ""
  },
  "argv": [
    "/home/ericjiang/miniconda3/envs/cluster/share/jupyter/kernels/spark_python_yarn_cluster/bin/run.sh",
    "--RemoteProcessProxy.kernel-id",
    "{kernel_id}",
    "--RemoteProcessProxy.response-address",
    "172.24.0.216:8888",
    "--RemoteProcessProxy.port-range",
    "{port_range}",
    "--RemoteProcessProxy.spark-context-initialization-mode",
    "lazy"
  ]
}
jupyter notebook --gateway-url=http://172.24.0.216:8888
jupyter enterprisegateway --ip=172.24.0.216 --port_retries=0 --debug
c.EnterpriseGatewayApp.yarn_endpoint = 'http://172.24.0.12:8088/cluster'
I solved most of the problems: No. 1, No. 2, No. 4, and No. 5. About No. 6, I understand your thought, but for now I don't have any ideas for customizing it myself. If it doesn't cause a running error, I don't want to change it at this moment.

Now I have a new question. This is my EG log:
[D 2020-10-15 13:37:58.446 EnterpriseGatewayApp] RemoteMappingKernelManager.start_kernel: spark_python_yarn_cluster, kernel_username: ericjiang
[D 2020-10-15 13:37:58.475 EnterpriseGatewayApp] Instantiating kernel 'Spark - Python (YARN Cluster Mode)' with process proxy: enterprise_gateway.services.processproxies.yarn.YarnClusterProcessProxy
[D 2020-10-15 13:37:58.624 EnterpriseGatewayApp] Response socket launched on '172.24.0.216:60462' using 5.0s timeout
[D 2020-10-15 13:37:58.692 EnterpriseGatewayApp] YarnClusterProcessProxy shutdown wait time adjusted to 15.0 seconds.
[D 2020-10-15 13:37:58.693 EnterpriseGatewayApp] Starting kernel (async): ['/home/ericjiang/miniconda3/envs/cluster/share/jupyter/kernels/spark_python_yarn_cluster/bin/run.sh', '--RemoteProcessProxy.kernel-id', 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b', '--RemoteProcessProxy.response-address', '172.24.0.216:8888', '--RemoteProcessProxy.port-range', '0..0', '--RemoteProcessProxy.spark-context-initialization-mode', 'lazy']
[D 2020-10-15 13:37:58.693 EnterpriseGatewayApp] Launching kernel: 'Spark - Python (YARN Cluster Mode)' with command: ['/home/ericjiang/miniconda3/envs/cluster/share/jupyter/kernels/spark_python_yarn_cluster/bin/run.sh', '--RemoteProcessProxy.kernel-id', 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b', '--RemoteProcessProxy.response-address', '172.24.0.216:8888', '--RemoteProcessProxy.port-range', '0..0', '--RemoteProcessProxy.spark-context-initialization-mode', 'lazy']
[D 2020-10-15 13:37:58.693 EnterpriseGatewayApp] BaseProcessProxy.launch_process() env: {'KG_REQUEST_TIME': '120', 'EG_IMPERSONATION_ENABLED': 'False', 'EG_MAX_PORT_RANGE_RETRIES': '5', 'PYTHONPATH': '/home/ericjiang/miniconda3/envs/cluster/lib/python3.5/site-packages:/home/ericjiang/miniconda3/envs/cluster/spark-2.4.3-bin-hadoop2.7/python:/home/ericjiang/miniconda3/envs/cluster/spark-2.4.3-bin-hadoop2.7/python/lib/py4j-0.10.7-src.zip', 'KERNEL_GATEWAY': '1', 'SPARK_OPTS': '--master yarn --deploy-mode cluster --name clusterMode --conf spark.yarn.submit.waitAppCompletion=false --conf spark.yarn.dist.archives=/home/ericjiang/miniconda3/envs/cluster.zip#cluster --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=cluster/bin/python3.5 --conf spark.yarn.executorEnv.PYSPARK_PYTHON=cluster/bin/python3.5 --conf spark.yarn.appMasterEnv.PYTHONUSERBASE=cluster --conf spark.yarn.appMasterEnv.PYTHONPATH=cluster/lib/python3.5/site-packages:cluster/spark-2.4.3-bin-hadoop2.7/python:cluster/spark-2.4.3-bin-hadoop2.7/python/lib/py4j-0.10.7-src.zip --conf spark.yarn.appMasterEnv.PATH=cluster/bin:/home/ericjiang/bin:/home/ericjiang/.local/bin:/home/ericjiang/miniconda3/envs/cluster/bin:/home/ericjiang/miniconda3/condabin:/opt/hadoop-2.7.3/bin:/bin:/home/ericjiang/Kafka/kafka_2.12-2.5.0/bin:/home/ericjiang/Scala/scala-2.12.10/bin:/home/ericjiang/Hive/apache-hive-2.3.7-bin/bin:/home/ericjiang/Spark/spark-2.4.3-bin-hadoop2.7/bin:/usr/lib/jvm/java-1.8.0-openjdk-amd64/bin:/usr/share/maven/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/opt/hadoop-2.7.3/bin:/opt/hadoop-2.7.3/sbin:/opt/spark-2.4.3-bin-hadoop2.7/bin', 'HADOOP_HOME': '/home/ericjiang/miniconda3/envs/cluster/hadoop-2.7.3', 'KERNEL_LANGUAGE': 'python', 'KERNEL_ID': 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b', 'HADOOP_CONF_DIR': '/home/ericjiang/miniconda3/envs/cluster/hadoop-2.7.3/etc/hadoop/', 'LAUNCH_OPTS': '', 'KERNEL_USERNAME': 'ericjiang', 'EG_MIN_PORT_RANGE_SIZE': '1000', 
'KERNEL_WORKING_DIR': '/home/ericjiang', 'SPARK_HOME': '/home/ericjiang/miniconda3/envs/cluster/spark-2.4.3-bin-hadoop2.7', 'SPARK_CONF_DIR': '/home/ericjiang/miniconda3/envs/cluster/spark-2.4.3-bin-hadoop2.7/conf', 'PYSPARK_PYTHON': '/usr/local/bin/ipython3', 'PROG_HOME': '/home/ericjiang/miniconda3/envs/cluster/share/jupyter/kernels/spark_python_yarn_cluster', 'PATH': '/home/ericjiang/bin:/home/ericjiang/.local/bin:/home/ericjiang/miniconda3/envs/cluster/bin:/home/ericjiang/miniconda3/condabin:/opt/hadoop-2.7.3/bin:/bin:/home/ericjiang/Kafka/kafka_2.12-2.5.0/bin:/home/ericjiang/Scala/scala-2.12.10/bin:/home/ericjiang/Hive/apache-hive-2.3.7-bin/bin:/home/ericjiang/Spark/spark-2.4.3-bin-hadoop2.7/bin:/usr/lib/jvm/java-1.8.0-openjdk-amd64/bin:/usr/share/maven/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/opt/hadoop-2.7.3/bin:/opt/hadoop-2.7.3/sbin:/opt/spark-2.4.3-bin-hadoop2.7/bin', 'KERNEL_LAUNCH_TIMEOUT': '40'}
[D 2020-10-15 13:37:58.697 EnterpriseGatewayApp] Yarn cluster kernel launched using YARN RM address: http://172.24.0.12:8088, pid: 26955, Kernel ID: aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b, cmd: '['/home/ericjiang/miniconda3/envs/cluster/share/jupyter/kernels/spark_python_yarn_cluster/bin/run.sh', '--RemoteProcessProxy.kernel-id', 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b', '--RemoteProcessProxy.response-address', '172.24.0.216:8888', '--RemoteProcessProxy.port-range', '0..0', '--RemoteProcessProxy.spark-context-initialization-mode', 'lazy']'
Starting IPython kernel for Spark in Yarn Cluster mode on behalf of user ericjiang
+ eval exec /home/ericjiang/miniconda3/envs/cluster/spark-2.4.3-bin-hadoop2.7/bin/spark-submit '--master yarn --deploy-mode cluster --name clusterMode --conf spark.yarn.submit.waitAppCompletion=false --conf spark.yarn.dist.archives=/home/ericjiang/miniconda3/envs/cluster.zip#cluster --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=cluster/bin/python3.5 --conf spark.yarn.executorEnv.PYSPARK_PYTHON=cluster/bin/python3.5 --conf spark.yarn.appMasterEnv.PYTHONUSERBASE=cluster --conf spark.yarn.appMasterEnv.PYTHONPATH=cluster/lib/python3.5/site-packages:cluster/spark-2.4.3-bin-hadoop2.7/python:cluster/spark-2.4.3-bin-hadoop2.7/python/lib/py4j-0.10.7-src.zip --conf spark.yarn.appMasterEnv.PATH=cluster/bin:/home/ericjiang/bin:/home/ericjiang/.local/bin:/home/ericjiang/miniconda3/envs/cluster/bin:/home/ericjiang/miniconda3/condabin:/opt/hadoop-2.7.3/bin:/bin:/home/ericjiang/Kafka/kafka_2.12-2.5.0/bin:/home/ericjiang/Scala/scala-2.12.10/bin:/home/ericjiang/Hive/apache-hive-2.3.7-bin/bin:/home/ericjiang/Spark/spark-2.4.3-bin-hadoop2.7/bin:/usr/lib/jvm/java-1.8.0-openjdk-amd64/bin:/usr/share/maven/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/opt/hadoop-2.7.3/bin:/opt/hadoop-2.7.3/sbin:/opt/spark-2.4.3-bin-hadoop2.7/bin' '' /home/ericjiang/miniconda3/envs/cluster/share/jupyter/kernels/spark_python_yarn_cluster/scripts/launch_ipykernel.py '' --RemoteProcessProxy.kernel-id aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b --RemoteProcessProxy.response-address 172.24.0.216:8888 --RemoteProcessProxy.port-range 0..0 --RemoteProcessProxy.spark-context-initialization-mode lazy
++ exec /home/ericjiang/miniconda3/envs/cluster/spark-2.4.3-bin-hadoop2.7/bin/spark-submit --master yarn --deploy-mode cluster --name clusterMode --conf spark.yarn.submit.waitAppCompletion=false --conf spark.yarn.dist.archives=/home/ericjiang/miniconda3/envs/cluster.zip#cluster --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=cluster/bin/python3.5 --conf spark.yarn.executorEnv.PYSPARK_PYTHON=cluster/bin/python3.5 --conf spark.yarn.appMasterEnv.PYTHONUSERBASE=cluster --conf spark.yarn.appMasterEnv.PYTHONPATH=cluster/lib/python3.5/site-packages:cluster/spark-2.4.3-bin-hadoop2.7/python:cluster/spark-2.4.3-bin-hadoop2.7/python/lib/py4j-0.10.7-src.zip --conf spark.yarn.appMasterEnv.PATH=cluster/bin:/home/ericjiang/bin:/home/ericjiang/.local/bin:/home/ericjiang/miniconda3/envs/cluster/bin:/home/ericjiang/miniconda3/condabin:/opt/hadoop-2.7.3/bin:/bin:/home/ericjiang/Kafka/kafka_2.12-2.5.0/bin:/home/ericjiang/Scala/scala-2.12.10/bin:/home/ericjiang/Hive/apache-hive-2.3.7-bin/bin:/home/ericjiang/Spark/spark-2.4.3-bin-hadoop2.7/bin:/usr/lib/jvm/java-1.8.0-openjdk-amd64/bin:/usr/share/maven/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/opt/hadoop-2.7.3/bin:/opt/hadoop-2.7.3/sbin:/opt/spark-2.4.3-bin-hadoop2.7/bin /home/ericjiang/miniconda3/envs/cluster/share/jupyter/kernels/spark_python_yarn_cluster/scripts/launch_ipykernel.py --RemoteProcessProxy.kernel-id aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b --RemoteProcessProxy.response-address 172.24.0.216:8888 --RemoteProcessProxy.port-range 0..0 --RemoteProcessProxy.spark-context-initialization-mode lazy
[D 2020-10-15 13:37:58.702 EnterpriseGatewayApp] Serving kernel resource from: /home/ericjiang/miniconda3/envs/cluster/share/jupyter/kernels/spark_python_yarn_cluster
[I 201015 13:37:58 web:2250] 200 GET /kernelspecs/spark_python_yarn_cluster/logo-64x64.png (172.24.0.216) 5.05ms
[D 2020-10-15 13:37:59.220 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b' - retrying...
[D 2020-10-15 13:37:59.736 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b' - retrying...
[D 2020-10-15 13:38:00.255 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b' - retrying...
20/10/15 13:38:00 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[D 2020-10-15 13:38:00.771 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b' - retrying...
[D 2020-10-15 13:38:01.285 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b' - retrying...
20/10/15 13:38:01 INFO RMProxy: Connecting to ResourceManager at hadoop2/172.24.0.12:8032
20/10/15 13:38:01 INFO Client: Requesting a new application from cluster with 50 NodeManagers
20/10/15 13:38:01 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (24576 MB per container)
20/10/15 13:38:01 INFO Client: Will allocate AM container, with 1408 MB memory including 384 MB overhead
20/10/15 13:38:01 INFO Client: Setting up container launch context for our AM
20/10/15 13:38:01 INFO Client: Setting up the launch environment for our AM container
20/10/15 13:38:01 INFO Client: Preparing resources for our AM container
[D 2020-10-15 13:38:01.801 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b' - retrying...
20/10/15 13:38:01 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
[D 2020-10-15 13:38:02.316 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b' - retrying...
[D 2020-10-15 13:38:02.835 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b' - retrying...
[D 2020-10-15 13:38:03.353 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b' - retrying...
[D 2020-10-15 13:38:03.869 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b' - retrying...
[D 2020-10-15 13:38:04.385 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b' - retrying...
[D 2020-10-15 13:38:04.900 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b' - retrying...
20/10/15 13:38:05 INFO Client: Uploading resource file:/home/webuser/tmp/spark-b95873a4-4151-4277-947b-f3c6f9b8cbb9/__spark_libs__8498809013287075042.zip -> hdfs://mycluster/user/ericjiang/.sparkStaging/application_1600149154303_103906/__spark_libs__8498809013287075042.zip
[D 2020-10-15 13:38:05.415 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b' - retrying...
[D 2020-10-15 13:38:05.932 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b' - retrying...
[D 2020-10-15 13:38:06.451 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b' - retrying...
[D 2020-10-15 13:38:06.967 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b' - retrying...
[D 2020-10-15 13:38:07.484 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b' - retrying...
[D 2020-10-15 13:38:08.000 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b' - retrying...
20/10/15 13:38:08 INFO Client: Uploading resource file:/home/ericjiang/miniconda3/envs/cluster.zip#cluster -> hdfs://mycluster/user/ericjiang/.sparkStaging/application_1600149154303_103906/cluster.zip
[D 2020-10-15 13:38:08.521 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b' - retrying...
[D 2020-10-15 13:38:09.041 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b' - retrying...
[D 2020-10-15 13:38:09.560 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b' - retrying...
[D 2020-10-15 13:38:10.078 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b' - retrying...
[D 2020-10-15 13:38:10.596 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b' - retrying...
[D 2020-10-15 13:38:11.117 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b' - retrying...
[D 2020-10-15 13:38:11.634 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b' - retrying...
[D 2020-10-15 13:38:12.154 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b' - retrying...
[D 2020-10-15 13:38:12.671 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b' - retrying...
[D 2020-10-15 13:38:13.191 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b' - retrying...
[D 2020-10-15 13:38:13.709 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b' - retrying...
[D 2020-10-15 13:38:14.249 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b' - retrying...
[D 2020-10-15 13:38:14.763 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b' - retrying...
[D 2020-10-15 13:38:15.283 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b' - retrying...
[D 2020-10-15 13:38:15.800 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b' - retrying...
20/10/15 13:38:16 INFO Client: Uploading resource file:/home/ericjiang/miniconda3/envs/cluster/share/jupyter/kernels/spark_python_yarn_cluster/scripts/launch_ipykernel.py -> hdfs://mycluster/user/ericjiang/.sparkStaging/application_1600149154303_103906/launch_ipykernel.py
[D 2020-10-15 13:38:16.316 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b' - retrying...
20/10/15 13:38:16 INFO Client: Uploading resource file:/home/ericjiang/miniconda3/envs/cluster/spark-2.4.3-bin-hadoop2.7/python/lib/pyspark.zip -> hdfs://mycluster/user/ericjiang/.sparkStaging/application_1600149154303_103906/pyspark.zip
20/10/15 13:38:16 INFO Client: Uploading resource file:/home/ericjiang/miniconda3/envs/cluster/spark-2.4.3-bin-hadoop2.7/python/lib/py4j-0.10.7-src.zip -> hdfs://mycluster/user/ericjiang/.sparkStaging/application_1600149154303_103906/py4j-0.10.7-src.zip
20/10/15 13:38:16 INFO Client: Uploading resource file:/home/webuser/tmp/spark-b95873a4-4151-4277-947b-f3c6f9b8cbb9/__spark_conf__4950530639879277944.zip -> hdfs://mycluster/user/ericjiang/.sparkStaging/application_1600149154303_103906/__spark_conf__.zip
[D 2020-10-15 13:38:16.833 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b' - retrying...
20/10/15 13:38:16 INFO SecurityManager: Changing view acls to: ericjiang
20/10/15 13:38:16 INFO SecurityManager: Changing modify acls to: ericjiang
20/10/15 13:38:16 INFO SecurityManager: Changing view acls groups to:
20/10/15 13:38:16 INFO SecurityManager: Changing modify acls groups to:
20/10/15 13:38:16 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(ericjiang); groups with view permissions: Set(); users with modify permissions: Set(ericjiang); groups with modify permissions: Set()
[D 2020-10-15 13:38:17.350 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b' - retrying...
[D 2020-10-15 13:38:17.865 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b' - retrying...
[D 2020-10-15 13:38:18.386 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b' - retrying...
20/10/15 13:38:18 INFO Client: Submitting application application_1600149154303_103906 to ResourceManager
20/10/15 13:38:18 INFO YarnClientImpl: Submitted application application_1600149154303_103906
20/10/15 13:38:18 INFO Client: Application report for application_1600149154303_103906 (state: ACCEPTED)
20/10/15 13:38:18 INFO Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: root.default
start time: 1602740298698
final status: UNDEFINED
tracking URL: http://hadoop2:9046/proxy/application_1600149154303_103906/
user: ericjiang
20/10/15 13:38:18 INFO ShutdownHookManager: Shutdown hook called
20/10/15 13:38:18 INFO ShutdownHookManager: Deleting directory /home/webuser/tmp/spark-0060dcff-9b88-4857-8518-9e56fc6c5a10
20/10/15 13:38:18 INFO ShutdownHookManager: Deleting directory /home/webuser/tmp/spark-b95873a4-4151-4277-947b-f3c6f9b8cbb9
[D 2020-10-15 13:38:18.903 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b' - retrying...
[D 2020-10-15 13:38:19.420 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b' - retrying...
[D 2020-10-15 13:38:19.938 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b' - retrying...
[D 2020-10-15 13:38:20.453 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b' - retrying...
[D 2020-10-15 13:38:20.971 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b' - retrying...
[D 2020-10-15 13:38:21.489 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b' - retrying...
[D 2020-10-15 13:38:22.005 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b' - retrying...
[D 2020-10-15 13:38:22.520 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b' - retrying...
[D 2020-10-15 13:38:23.039 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b' - retrying...
[D 2020-10-15 13:38:23.556 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b' - retrying...
[D 2020-10-15 13:38:24.073 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b' - retrying...
[D 2020-10-15 13:38:24.592 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b' - retrying...
[D 2020-10-15 13:38:25.112 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b' - retrying...
[D 2020-10-15 13:38:25.631 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b' - retrying...
[D 2020-10-15 13:38:26.147 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b' - retrying...
[D 2020-10-15 13:38:26.663 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b' - retrying...
[D 2020-10-15 13:38:27.180 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b' - retrying...
[D 2020-10-15 13:38:27.695 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b' - retrying...
[D 2020-10-15 13:38:28.213 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b' - retrying...
[D 2020-10-15 13:38:28.735 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b' - retrying...
[D 2020-10-15 13:38:29.253 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b' - retrying...
[D 2020-10-15 13:38:29.771 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b' - retrying...
[D 2020-10-15 13:38:30.290 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b' - retrying...
[D 2020-10-15 13:38:30.808 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b' - retrying...
[D 2020-10-15 13:38:31.324 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b' - retrying...
[D 2020-10-15 13:38:31.840 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b' - retrying...
[D 2020-10-15 13:38:32.355 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b' - retrying...
[D 2020-10-15 13:38:32.871 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b' - retrying...
[D 2020-10-15 13:38:33.389 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b' - retrying...
[D 2020-10-15 13:38:33.903 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b' - retrying...
[D 2020-10-15 13:38:34.420 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b' - retrying...
[D 2020-10-15 13:38:34.936 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b' - retrying...
[D 2020-10-15 13:38:35.458 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b' - retrying...
[D 2020-10-15 13:38:35.975 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b' - retrying...
[D 2020-10-15 13:38:36.492 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b' - retrying...
[D 2020-10-15 13:38:37.009 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b' - retrying...
[D 2020-10-15 13:38:37.524 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b' - retrying...
[D 2020-10-15 13:38:38.039 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b' - retrying...
[D 2020-10-15 13:38:38.555 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b' - retrying...
[D 2020-10-15 13:38:39.074 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b' - retrying...
[D 2020-10-15 13:38:39.088 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b' - retrying...
[D 2020-10-15 13:38:39.088 EnterpriseGatewayApp] BaseProcessProxy.terminate(): None
[D 2020-10-15 13:38:39.098 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b' - retrying...
[D 2020-10-15 13:38:39.098 EnterpriseGatewayApp] YarnClusterProcessProxy.kill, application ID: None, kernel ID: aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b, state: None, result: None
[D 2020-10-15 13:38:39.099 EnterpriseGatewayApp] response socket still open, close it
[E 2020-10-15 13:38:39.099 EnterpriseGatewayApp] KernelID: 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b' launch timeout due to: Application ID is None. Failed to submit a new application to YARN within 40.0 seconds. Check Enterprise Gateway log for more information.
[E 201015 13:38:39 web:2250] 500 POST /api/kernels (172.24.0.216) 40654.66ms
You can see this log:

YarnClusterProcessProxy.kill, application ID: None, kernel ID: aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b, state: None, result: None

It seems the launch was not successful. Is there any place I haven't set up correctly, or is it related to anaconda?
Otherwise, I found something interesting: when the kernel times out and shuts down, YARN still shows my program in the RUNNING Applications list. Is this correct?
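One way to cross-check what YARN thinks of the application is the ResourceManager REST API (the `/ws/v1/cluster/apps/{appid}` endpoint is the standard Hadoop RM API; the helper names and the abbreviated sample response below are just a sketch):

```python
# Sketch: query the YARN ResourceManager REST API for an application's state.
# Endpoint path is the standard Hadoop RM API; host/port are illustrative.
import json
from urllib.request import urlopen

def app_state(report: dict) -> str:
    """Pull the state out of an RM /ws/v1/cluster/apps/{appid} response."""
    return report["app"]["state"]

def fetch_app_state(rm_url: str, app_id: str) -> str:
    with urlopen(f"{rm_url}/ws/v1/cluster/apps/{app_id}") as resp:
        return app_state(json.load(resp))

# Abbreviated example of what the RM returns for a live application:
sample = {"app": {"id": "application_1600149154303_103906", "state": "RUNNING"}}
print(app_state(sample))  # RUNNING
# Live call would look like:
# fetch_app_state("http://172.24.0.12:8088", "application_1600149154303_103906")
```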
Attached are the RUNNING application's stderr, stdout, and pip list:
20/10/15 13:38:48 INFO YarnSchedulerBackend$YarnSchedulerEndpoint: ApplicationMaster registered as NettyRpcEndpointRef(spark://YarnAM@hadoop21:10941)
20/10/15 13:38:48 INFO YarnAllocator: Will request 2 executor container(s), each with 1 core(s) and 1408 MB memory (including 384 MB of overhead)
20/10/15 13:38:48 INFO YarnAllocator: Submitted 2 unlocalized container requests.
20/10/15 13:38:48 INFO ApplicationMaster: Started progress reporter thread with (heartbeat : 3000, initial allocation : 200) intervals
20/10/15 13:38:48 INFO AMRMClientImpl: Received new token for : hadoop34:14797
20/10/15 13:38:48 INFO YarnAllocator: Launching container container_1600149154303_103906_01_000002 on host hadoop34 for executor with ID 1
20/10/15 13:38:48 INFO YarnAllocator: Launching container container_1600149154303_103906_01_000003 on host hadoop34 for executor with ID 2
20/10/15 13:38:48 INFO YarnAllocator: Received 2 containers from YARN, launching executors on 2 of them.
20/10/15 13:38:48 INFO ContainerManagementProtocolProxy: yarn.client.max-cached-nodemanagers-proxies : 0
20/10/15 13:38:48 INFO ContainerManagementProtocolProxy: yarn.client.max-cached-nodemanagers-proxies : 0
20/10/15 13:38:48 INFO ContainerManagementProtocolProxy: Opening proxy : hadoop34:14797
20/10/15 13:38:48 INFO ContainerManagementProtocolProxy: Opening proxy : hadoop34:14797
20/10/15 13:39:16 INFO YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (10.0.0.34:56716) with ID 1
20/10/15 13:39:16 INFO YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (10.0.0.34:56714) with ID 2
20/10/15 13:39:16 INFO YarnClusterSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.8
20/10/15 13:39:16 INFO YarnClusterScheduler: YarnClusterScheduler.postStartHook done
20/10/15 13:39:16 INFO BlockManagerMasterEndpoint: Registering block manager hadoop34:24362 with 434.4 MB RAM, BlockManagerId(1, hadoop34, 24362, None)
20/10/15 13:39:16 INFO BlockManagerMasterEndpoint: Registering block manager hadoop34:34861 with 434.4 MB RAM, BlockManagerId(2, hadoop34, 34861, None)
[D 2020-10-15 13:38:42,738.738 launch_ipykernel] Using connection file '/tmp/kernel-aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b_uaqjgxxd.json'.
[I 2020-10-15 13:38:42,739.739 launch_ipykernel] Signal socket bound to host: 0.0.0.0, port: 58354
[D 2020-10-15 13:38:42,739.739 launch_ipykernel] JSON Payload 'b'{"transport": "tcp", "key": "82174c3f-bb35-4322-8aa1-26fdd04cae7a", "ip": "0.0.0.0", "signature_scheme": "hmac-sha256", "comm_port": 58354, "kernel_name": "", "pgid": "11292", "iopub_port": 31872, "hb_port": 32353, "control_port": 64213, "shell_port": 49445, "pid": "11487", "stdin_port": 37770}'
[D 2020-10-15 13:38:42,743.743 launch_ipykernel] Encrypted Payload 'b'1ikkW7ADbU9d9w3qdNpPXIDNiTkmjHy7bE7O69z0sBcQWrpRiY90LCFHoUAqbS3k6ilVe3cfjL1hgK7r0Z2WRD4jlWVHh1nlSdYV6rfsD+R0dB1Ca9OabZN+bpcc8CigiW7d4flo2nCXpbSmjUzkyf2NJ6FUqPo5d1SpaxipVNJ58X22+qwffJZZAdNQrqoQfOER5+984fiKhckuviemk9LVxw+KrSF2k8nYtHlGBycCflMm5NZHqybbiaSUcsJrljvyxgq/cUcsKpJrTuIe1EtYem6rDr859kmj3Mnx9wEOGRlaZZpT0/woQSH/LUuB5jnQjQMWb5Vjk8JeOdHYF/eamDNKcluX1MK4d3rH/WnVKrnafxf8zzWbreoFPgQ5pgLTELdaRECOK3KsB6F9AQ=='
/mnt/hdfs3/nm/usercache/ericjiang/appcache/application_1600149154303_103906/container_1600149154303_103906_01_000001/cluster/lib/python3.5/site-packages/IPython/paths.py:68: UserWarning: IPython parent '/home' is not a writable location, using a temp directory.
" using a temp directory.".format(parent))
NOTE: When using the ipython kernel entry point, Ctrl-C will not work.
To exit, you will have to explicitly quit this process, by either sending
"quit" from a client, or using Ctrl-\ in UNIX-like environments.
To read more about this, see ipython/ipython: Issue #2049
To connect another client to this kernel, use:
--existing /tmp/kernel-aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b_uaqjgxxd.json
Package Version
--------------------------------- ------------
argon2-cffi 20.1.0
attrs 20.2.0
backcall 0.2.0
bcrypt 3.1.7
bleach 3.2.1
cachetools 4.1.1
certifi 2018.8.24
cffi 1.14.3
chardet 3.0.4
click 7.1.2
cloudpickle 1.6.0
confluent-kafka 1.5.0
cryptography 3.1.1
cycler 0.10.0
dask 2.6.0
decorator 4.4.2
defusedxml 0.6.0
distributed 2.6.0
docker 4.3.1
docopt 0.6.2
entrypoints 0.3
featuretools 0.13.4
findspark 1.4.2
future 0.18.2
google-auth 1.22.1
hdfs 2.5.8
HeapDict 1.0.1
idna 2.10
importlib-metadata 2.0.0
ipykernel 5.3.4
ipython 7.9.0
ipython-genutils 0.2.0
jedi 0.17.2
Jinja2 2.11.2
joblib 0.14.1
json5 0.9.5
jsonschema 3.2.0
jupyter-client 6.1.7
jupyter-contrib-core 0.3.3
jupyter-contrib-nbextensions 0.5.1
jupyter-core 4.6.3
jupyter-enterprise-gateway 2.3.0
jupyter-highlight-selected-word 0.2.0
jupyter-kernel-gateway 2.4.3
jupyter-latex-envs 1.4.6
jupyter-nbextensions-configurator 0.4.1
jupyterlab 2.2.8
jupyterlab-server 1.2.0
kiwisolver 1.1.0
kmodes 0.10.2
kubernetes 12.0.0
lesscpy 0.14.0
lightgbm 3.0.0
lxml 4.5.2
MarkupSafe 1.1.1
matplotlib 3.0.3
mistune 0.8.4
msgpack 0.6.2
nbconvert 5.6.1
nbformat 5.0.8
notebook 6.1.4
numpy 1.18.5
oauthlib 3.1.0
packaging 20.4
pandas 0.25.3
pandocfilters 1.4.2
paramiko 2.7.2
parso 0.7.1
pexpect 4.8.0
pickleshare 0.7.5
pip 10.0.1
ply 3.11
prometheus-client 0.8.0
prompt-toolkit 2.0.10
psutil 5.7.2
ptyprocess 0.6.0
py4j 0.10.9
pyasn1 0.4.8
pyasn1-modules 0.2.8
pycparser 2.20
pycryptodomex 3.9.8
PyEmail 0.0.1
Pygments 2.7.1
PyNaCl 1.4.0
pyparsing 2.4.7
pypinyin 0.39.0
pyrsistent 0.17.3
pyspark 3.0.0
python-dateutil 2.8.1
pytz 2020.1
PyYAML 5.3.1
pyzmq 19.0.2
requests 2.24.0
requests-oauthlib 1.3.0
rsa 4.6
scikit-learn 0.22.2.post1
scipy 1.4.1
seaborn 0.9.1
Send2Trash 1.5.0
setuptools 50.3.1
six 1.15.0
sortedcontainers 2.2.2
tblib 1.7.0
terminado 0.8.3
testpath 0.4.4
toolz 0.11.1
tornado 6.0.4
tqdm 4.49.0
traitlets 4.3.3
ua-parser 0.10.0
urllib3 1.25.9
user-agents 2.2.0
wcwidth 0.2.5
webencodings 0.5.1
websocket-client 0.57.0
wheel 0.35.1
yapf 0.30.0
yarn-api-client 1.0.2
zict 2.0.0
zipp 1.2.0
To avoid other interference, I decided to create another env (name=cluster) and install the latest version of jupyter, which seems like a good idea.
Yeah, you want NB 6 and EG 2.2 so you can take advantage of the async kernel management - which is a big win for EG.
Regarding No. 6, I probably should have stated this a different way. I suspect you created your "spark" kernelspec directory by copying the spark_python_yarn_cluster directory, and there are path references that should technically be updated (i.e., change spark_python_yarn_cluster to spark); otherwise changes to spark/bin/run.sh won't get picked up.
Regarding your latest issue, your YARN resource manager has yet to assign the kernel a host, and KERNEL_LAUNCH_TIMEOUT is being exceeded after 40 seconds. You should try to determine whether your YARN cluster is saturated. Also, I would recommend extending KERNEL_LAUNCH_TIMEOUT to 120 on your client and restarting your client so that the value carries over. By the way, this is what I was getting at with item number 3.
Everything else looks good. EG will not attempt to communicate with the kernel until the kernel has been assigned a host and EG knows that host - which it gets via the application-state polling that eventually times out.
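As a sketch of the client-side change (assuming your EG address is 172.24.0.216:8888 as earlier in this thread, and relying on EG's behavior of flowing client env vars prefixed with KERNEL_ into the kernel start request):

```shell
# Set on the client before launching; variables prefixed with KERNEL_
# are passed to Enterprise Gateway in the kernel start request.
export KERNEL_LAUNCH_TIMEOUT=120

# Restart the client so the new value carries over.
jupyter notebook --gateway-url=http://172.24.0.216:8888
```

This gives EG up to 120 seconds to see the YARN application reach a running state before giving up.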
Hi @kevin-bates. Thanks for your help!
Great! I am able to use YARN-cluster-mode.
I finally changed the "--name" parameter in "SPARK_OPTS" and the "--RemoteProcessProxy.response-address" argument so that the kernel can run normally.
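For reference, the relevant pieces of such a kernelspec look roughly like the following (a sketch based on the stock spark_python_yarn_cluster example; the install path and the exact SPARK_OPTS contents are assumptions and will differ per deployment):

```json
{
  "env": {
    "SPARK_OPTS": "--master yarn --deploy-mode cluster --name ${KERNEL_ID:-ERROR__NO__KERNEL_ID} ${KERNEL_EXTRA_SPARK_OPTS}"
  },
  "argv": [
    "/usr/local/share/jupyter/kernels/spark/bin/run.sh",
    "--RemoteProcessProxy.kernel-id",
    "{kernel_id}",
    "--RemoteProcessProxy.response-address",
    "{response_address}"
  ]
}
```

The {response_address} template parameter is what lets the launched kernel report its connection information back to EG, and the --name value ties the YARN application to the kernel ID so EG can discover it.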
Finally, I wrote up the installation process as a GitHub document. I hope this documentation can be useful to more people installing jupyter-enterprise-gateway.
Only the Mandarin version is available now; the English version will be added later :)
Excellent - glad to hear you're moving forward. I will go ahead and close this issue.
Regarding the documentation, it would be great to see if you could incorporate any important changes into our existing docs.
Also, I see you're referencing Python 3.5 in your document. Would it be possible to bump that to Python 3.6 (at a minimum)? 3.5 was end-of-life last month and we will likely be dropping 3.5 support in our 3.0 release. Just a heads up. Thanks.
Hi, I want to use spark --deploy-mode=cluster from jupyter notebook, so I studied EG and tried to set it up. My working environment is Anaconda with Python 3.5.
For some reasons, I cannot install Anaconda on every node, but I found another solution: I use
--conf spark.yarn.dist.archives=/home/ericjiang/miniconda3/envs/spark_env.zip#spark
to ship the zipped env to YARN so the worker nodes have the same env.
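The steps above can be sketched as follows (paths are from my setup; the PYSPARK_PYTHON line is an assumption based on YARN unpacking the archive under its "#spark" alias in each container's working directory):

```shell
# Zip the conda env so YARN can distribute it to the worker nodes.
cd /home/ericjiang/miniconda3/envs
zip -r spark_env.zip spark_env

# Then, in the spark-submit options:
#   --conf spark.yarn.dist.archives=/home/ericjiang/miniconda3/envs/spark_env.zip#spark
#   --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=./spark/bin/python
```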
After setting this up, I run EG and notebook and select the spark cluster kernel. Then jupyter notebook retries starting the spark cluster kernel repeatedly. This should not happen, and I have no idea what the root cause is.
I tried following https://github.com/jupyter/enterprise_gateway/issues/600?fireglass_rsn=true to fix the problem. Unfortunately, it's still not working :(
Let me summarize my questions below.
Attached are my logs, kernel.json file, and conda list.
RUN Command
Logs
kernel.json
conda list