jupyter-server / enterprise_gateway

A lightweight, multi-tenant, scalable and secure gateway that enables Jupyter Notebooks to share resources across distributed clusters such as Apache Spark, Kubernetes and others.
https://jupyter-enterprise-gateway.readthedocs.io/en/latest/

Failed to start jupyter enterprise gateway Yarn cluster mode #895

Closed SevenMilk closed 4 years ago

SevenMilk commented 4 years ago

Hi, I want to use Spark with --deploy-mode cluster from Jupyter Notebook, so I have been studying Enterprise Gateway (EG) and trying to set it up. My working environment is Anaconda with Python 3.5.

For certain reasons I cannot install Anaconda on every node, but I found another solution: I use

--conf spark.yarn.dist.archives=/home/ericjiang/miniconda3/envs/spark_env.zip#spark

to ship the zipped environment to YARN, so that the worker nodes all have the same environment.
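To illustrate the approach, here is a minimal sketch of how the relevant --conf flags fit together; the helper function and its names are hypothetical, and the paths are just the ones from my setup. The #alias suffix on the archive is the directory name YARN unpacks it under, which is why PYSPARK_PYTHON uses a path relative to that alias:

```python
def spark_archive_confs(archive_path, alias, python_rel_path):
    """Assemble the spark-submit --conf flags that ship a zipped conda env
    to YARN and point the driver and executors at the Python interpreter
    inside the unpacked archive (hypothetical helper, for illustration)."""
    return [
        "--conf", "spark.yarn.dist.archives={}#{}".format(archive_path, alias),
        "--conf", "spark.yarn.appMasterEnv.PYSPARK_PYTHON={}/{}".format(alias, python_rel_path),
        "--conf", "spark.yarn.executorEnv.PYSPARK_PYTHON={}/{}".format(alias, python_rel_path),
    ]

confs = spark_archive_confs(
    "/home/ericjiang/miniconda3/envs/spark_env.zip", "spark", "bin/python3.5")
print(" ".join(confs))
```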

After setting this up, I run EG and the notebook, and select the Spark cluster kernel. The notebook then retries starting the Spark cluster kernel repeatedly. This should not happen, and I have no idea what the root cause is.

I tried to follow this issue to fix the problem: https://github.com/jupyter/enterprise_gateway/issues/600. Unfortunately, it still does not work :(

Let me summarize my questions:

  1. Why does the Spark cluster kernel restart every time?
  2. Issue https://github.com/jupyter/enterprise_gateway/issues/600 suggests that Python 3 cannot be used. Do I really have to change the environment from Python 3.5 to Python 2.7?
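Regarding question 1, one way to see what is happening on the YARN side is the ResourceManager REST API (EG itself polls the endpoint set in EG_YARN_ENDPOINT). Below is a small sketch of parsing the /ws/v1/cluster/apps response to find an application's state; the sample payload is made up for illustration and only mirrors the fields of the real API:

```python
import json

def app_state(apps_json, app_name):
    """Return the state of the first YARN application whose name matches,
    given the parsed JSON body of GET <rm>/ws/v1/cluster/apps."""
    apps = (apps_json.get("apps") or {}).get("app") or []
    for app in apps:
        if app.get("name") == app_name:
            return app.get("state")
    return None

# Hypothetical response body, for illustration only.
sample = json.loads("""
{"apps": {"app": [
  {"id": "application_1600149154303_100861",
   "name": "Spark-cluster", "state": "ACCEPTED"}
]}}
""")
print(app_state(sample, "Spark-cluster"))  # ACCEPTED
```

If the application reaches RUNNING on YARN while the notebook still cycles through restarts, the problem is on the gateway side rather than the cluster side.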

I have attached my logs, kernel.json, and conda list below.

RUN Command

jupyter notebook   --NotebookApp.session_manager_class=nb2kg.managers.SessionManager   --NotebookApp.kernel_manager_class=nb2kg.managers.RemoteKernelManager   --NotebookApp.kernel_spec_manager_class=nb2kg.managers.RemoteKernelSpecManager
jupyter enterprisegateway --ip=172.24.0.216 --port_retries=0 --debug --RemoteMappingKernelManager.kernel_info_timeout=120 --EnterpriseGatewayApp.yarn_endpoint=http://172.24.0.12:8088

Logs

[D 2020-10-14 16:46:46.510 EnterpriseGatewayApp] Found kernel apache_toree_scala in /home/ericjiang/.local/share/jupyter/kernels
[D 2020-10-14 16:46:46.510 EnterpriseGatewayApp] Found kernel spark_python_yarn_cluster in /home/ericjiang/miniconda3/envs/spark/share/jupyter/kernels
[D 2020-10-14 16:46:46.510 EnterpriseGatewayApp] Found kernel python3 in /home/ericjiang/miniconda3/envs/spark/share/jupyter/kernels
[I 201014 16:46:46 web:2162] 200 GET /api/kernelspecs (172.24.0.216) 3.03ms
[D 2020-10-14 16:46:46.733 EnterpriseGatewayApp] RemoteMappingKernelManager.start_kernel: spark_python_yarn_cluster, kernel_username: ericjiang
[D 2020-10-14 16:46:46.744 EnterpriseGatewayApp] Instantiating kernel 'Spark' with process proxy: enterprise_gateway.services.processproxies.processproxy.LocalProcessProxy
[D 2020-10-14 16:46:46.748 EnterpriseGatewayApp] Starting kernel: ['/home/ericjiang/miniconda3/envs/spark/share/jupyter/kernels/spark_python_yarn_cluster/bin/run.sh', '/home/ericjiang/.local/share/jupyter/runtime/kernel-4211d8bf-83fe-4583-bb75-8961ca0ceba9.json', '--RemoteProcessProxy.response-address', '{response_address}', '--RemoteProcessProxy.port-range', '0..0', '--RemoteProcessProxy.spark-context-initialization-mode', 'lazy']
[D 2020-10-14 16:46:46.749 EnterpriseGatewayApp] Launching kernel: Spark with command: ['/home/ericjiang/miniconda3/envs/spark/share/jupyter/kernels/spark_python_yarn_cluster/bin/run.sh', '/home/ericjiang/.local/share/jupyter/runtime/kernel-4211d8bf-83fe-4583-bb75-8961ca0ceba9.json', '--RemoteProcessProxy.response-address', '{response_address}', '--RemoteProcessProxy.port-range', '0..0', '--RemoteProcessProxy.spark-context-initialization-mode', 'lazy']
[D 2020-10-14 16:46:46.750 EnterpriseGatewayApp] BaseProcessProxy.launch_process() env: {'SPARK_OPTS': '--master yarn --deploy-mode cluster --name Spark-cluster --conf spark.yarn.submit.waitAppCompletion=false --conf spark.yarn.dist.archives=/home/ericjiang/miniconda3/envs/spark_env.zip#spark --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=spark/bin/python3.5 --conf spark.yarn.executorEnv.PYSPARK_PYTHON=spark/bin/python3.5 --conf spark.yarn.appMasterEnv.PYTHONUSERBASE=spark --conf spark.yarn.appMasterEnv.PYTHONPATH=spark/lib/python3.5/site-packages:spark/spark-2.4.3-bin-hadoop2.7/python:spark/spark-2.4.3-bin-hadoop2.7/python/lib/py4j-0.10.7-src.zip --conf spark.yarn.appMasterEnv.PATH=spark/bin:$PATH', 'HADOOP_HOME': '/home/ericjiang/miniconda3/envs/spark/hadoop-2.7.3', 'EG_YARN_LOG_LEVEL': 'DEBUG', 'PYTHONPATH': '/home/ericjiang/miniconda3/envs/spark/lib/python3.5/site-packages:/home/ericjiang/miniconda3/envs/spark/spark-2.4.3-bin-hadoop2.7/python:/home/ericjiang/miniconda3/envs/spark/spark-2.4.3-bin-hadoop2.7/python/lib/py4j-0.10.7-src.zip', 'HADOOP_CONF_DIR': '/home/ericjiang/miniconda3/envs/spark/hadoop-2.7.3/etc/hadoop/', 'EG_IMPERSONATION_ENABLED': 'False', 'SPARK_CONF_DIR': '/home/ericjiang/miniconda3/envs/spark/spark-2.4.3-bin-hadoop2.7/conf', 'PATH': '/home/ericjiang/bin:/home/ericjiang/.local/bin:/home/ericjiang/miniconda3/envs/spark/bin:/home/ericjiang/miniconda3/condabin:/opt/hadoop-2.7.3/bin:/bin:/home/ericjiang/Kafka/kafka_2.12-2.5.0/bin:/home/ericjiang/Scala/scala-2.12.10/bin:/home/ericjiang/Hive/apache-hive-2.3.7-bin/bin:/home/ericjiang/Spark/spark-2.4.3-bin-hadoop2.7/bin:/usr/lib/jvm/java-1.8.0-openjdk-amd64/bin:/usr/share/maven/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/opt/hadoop-2.7.3/bin:/opt/hadoop-2.7.3/sbin:/opt/spark-2.4.3-bin-hadoop2.7/bin', 'KERNEL_ID': '4211d8bf-83fe-4583-bb75-8961ca0ceba9', 'PROG_HOME': 
'/home/miniconda3/envs/spark/share/jupyter/kernels/spark_python_yarn_cluster', 'SPARK_HOME': '/home/ericjiang/miniconda3/envs/spark/spark-2.4.3-bin-hadoop2.7', 'KERNEL_LAUNCH_TIMEOUT': '60', 'KERNEL_WORKING_DIR': '/home/ericjiang', 'PYSPARK_PYTHON': '/usr/bin/python', 'KERNEL_GATEWAY': '1', 'EG_KERNEL_LAUNCH_TIMEOUT': '40', 'EG_YARN_ENDPOINT': 'http://172.24.0.12:8088/ws/v1/cluster', 'KERNEL_USERNAME': 'ericjiang', 'LAUNCH_OPTS': ''}
[I 2020-10-14 16:46:46.753 EnterpriseGatewayApp] Local kernel launched on '172.24.0.216', pid: 15531, pgid: 15531, KernelID: 4211d8bf-83fe-4583-bb75-8961ca0ceba9, cmd: '['/home/ericjiang/miniconda3/envs/spark/share/jupyter/kernels/spark_python_yarn_cluster/bin/run.sh', '/home/ericjiang/.local/share/jupyter/runtime/kernel-4211d8bf-83fe-4583-bb75-8961ca0ceba9.json', '--RemoteProcessProxy.response-address', '{response_address}', '--RemoteProcessProxy.port-range', '0..0', '--RemoteProcessProxy.spark-context-initialization-mode', 'lazy']'

Starting IPython kernel for Spark in Yarn Cluster mode on behalf of user ericjiang

[D 2020-10-14 16:46:46.755 EnterpriseGatewayApp] Connecting to: tcp://127.0.0.1:57311
[D 2020-10-14 16:46:46.757 EnterpriseGatewayApp] Connecting to: tcp://127.0.0.1:34642
+ eval exec /home/ericjiang/miniconda3/envs/spark/spark-2.4.3-bin-hadoop2.7/bin/spark-submit '--master yarn --deploy-mode cluster --name Spark-cluster --conf spark.yarn.submit.waitAppCompletion=false --conf spark.yarn.dist.archives=/home/ericjiang/miniconda3/envs/spark_env.zip#spark --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=spark/bin/python3.5 --conf spark.yarn.executorEnv.PYSPARK_PYTHON=spark/bin/python3.5 --conf spark.yarn.appMasterEnv.PYTHONUSERBASE=spark --conf spark.yarn.appMasterEnv.PYTHONPATH=spark/lib/python3.5/site-packages:spark/spark-2.4.3-bin-hadoop2.7/python:spark/spark-2.4.3-bin-hadoop2.7/python/lib/py4j-0.10.7-src.zip --conf spark.yarn.appMasterEnv.PATH=spark/bin:$PATH' '' /home/ericjiang/miniconda3/envs/spark/share/jupyter/kernels/spark_python_yarn_cluster/scripts/launch_ipykernel.py '' /home/ericjiang/.local/share/jupyter/runtime/kernel-4211d8bf-83fe-4583-bb75-8961ca0ceba9.json --RemoteProcessProxy.response-address '{response_address}' --RemoteProcessProxy.port-range 0..0 --RemoteProcessProxy.spark-context-initialization-mode lazy
++ exec /home/ericjiang/miniconda3/envs/spark/spark-2.4.3-bin-hadoop2.7/bin/spark-submit --master yarn --deploy-mode cluster --name Spark-cluster --conf spark.yarn.submit.waitAppCompletion=false --conf spark.yarn.dist.archives=/home/ericjiang/miniconda3/envs/spark_env.zip#spark --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=spark/bin/python3.5 --conf spark.yarn.executorEnv.PYSPARK_PYTHON=spark/bin/python3.5 --conf spark.yarn.appMasterEnv.PYTHONUSERBASE=spark --conf spark.yarn.appMasterEnv.PYTHONPATH=spark/lib/python3.5/site-packages:spark/spark-2.4.3-bin-hadoop2.7/python:spark/spark-2.4.3-bin-hadoop2.7/python/lib/py4j-0.10.7-src.zip --conf spark.yarn.appMasterEnv.PATH=spark/bin:/home/ericjiang/bin:/home/ericjiang/.local/bin:/home/ericjiang/miniconda3/envs/spark/bin:/home/ericjiang/miniconda3/condabin:/opt/hadoop-2.7.3/bin:/bin:/home/ericjiang/Kafka/kafka_2.12-2.5.0/bin:/home/ericjiang/Scala/scala-2.12.10/bin:/home/ericjiang/Hive/apache-hive-2.3.7-bin/bin:/home/ericjiang/Spark/spark-2.4.3-bin-hadoop2.7/bin:/usr/lib/jvm/java-1.8.0-openjdk-amd64/bin:/usr/share/maven/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/opt/hadoop-2.7.3/bin:/opt/hadoop-2.7.3/sbin:/opt/spark-2.4.3-bin-hadoop2.7/bin /home/ericjiang/miniconda3/envs/spark/share/jupyter/kernels/spark_python_yarn_cluster/scripts/launch_ipykernel.py /home/ericjiang/.local/share/jupyter/runtime/kernel-4211d8bf-83fe-4583-bb75-8961ca0ceba9.json --RemoteProcessProxy.response-address '{response_address}' --RemoteProcessProxy.port-range 0..0 --RemoteProcessProxy.spark-context-initialization-mode lazy
[I 2020-10-14 16:46:46.759 EnterpriseGatewayApp] Kernel started: 4211d8bf-83fe-4583-bb75-8961ca0ceba9
[D 2020-10-14 16:46:46.759 EnterpriseGatewayApp] Kernel args: {'env': {'KERNEL_LAUNCH_TIMEOUT': '40', 'KERNEL_WORKING_DIR': '/home/ericjiang', 'KERNEL_USERNAME': 'ericjiang', 'PATH': '/home/ericjiang/bin:/home/ericjiang/.local/bin:/home/ericjiang/miniconda3/envs/spark/bin:/home/ericjiang/miniconda3/condabin:/opt/hadoop-2.7.3/bin:/bin:/home/ericjiang/Kafka/kafka_2.12-2.5.0/bin:/home/ericjiang/Scala/scala-2.12.10/bin:/home/ericjiang/Hive/apache-hive-2.3.7-bin/bin:/home/ericjiang/Spark/spark-2.4.3-bin-hadoop2.7/bin:/usr/lib/jvm/java-1.8.0-openjdk-amd64/bin:/usr/share/maven/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/opt/hadoop-2.7.3/bin:/opt/hadoop-2.7.3/sbin:/opt/spark-2.4.3-bin-hadoop2.7/bin'}, 'kernel_name': 'spark_python_yarn_cluster'}
[I 201014 16:46:46 web:2162] 201 POST /api/kernels (172.24.0.216) 29.19ms
[W 201014 16:46:46 web:2162] 404 GET /kernelspecs/spark_python_yarn_cluster/logo-64x64.png (172.24.0.216) 0.92ms
[I 201014 16:46:46 web:2162] 200 GET /api/kernels/4211d8bf-83fe-4583-bb75-8961ca0ceba9 (172.24.0.216) 0.96ms
[D 2020-10-14 16:46:46.917 EnterpriseGatewayApp] Initializing websocket connection /api/kernels/4211d8bf-83fe-4583-bb75-8961ca0ceba9/channels
[W 2020-10-14 16:46:46.922 EnterpriseGatewayApp] No session ID specified
[D 2020-10-14 16:46:46.922 EnterpriseGatewayApp] Requesting kernel info from 4211d8bf-83fe-4583-bb75-8961ca0ceba9
[D 2020-10-14 16:46:46.923 EnterpriseGatewayApp] Connecting to: tcp://127.0.0.1:33253
20/10/14 16:46:48 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
20/10/14 16:46:49 INFO RMProxy: Connecting to ResourceManager at hadoop2/172.24.0.12:8032
20/10/14 16:46:50 INFO Client: Requesting a new application from cluster with 50 NodeManagers
20/10/14 16:46:50 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (24576 MB per container)
20/10/14 16:46:50 INFO Client: Will allocate AM container, with 1408 MB memory including 384 MB overhead
20/10/14 16:46:50 INFO Client: Setting up container launch context for our AM
20/10/14 16:46:50 INFO Client: Setting up the launch environment for our AM container
20/10/14 16:46:50 INFO Client: Preparing resources for our AM container
20/10/14 16:46:50 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
20/10/14 16:46:53 INFO Client: Uploading resource file:/home/webuser/tmp/spark-e7150e81-04b5-4ac9-a4d1-0744bbc7bd97/__spark_libs__4690409181217442986.zip -> hdfs://mycluster/user/ericjiang/.sparkStaging/application_1600149154303_100861/__spark_libs__4690409181217442986.zip
20/10/14 16:46:57 INFO Client: Uploading resource file:/home/ericjiang/miniconda3/envs/spark_env.zip#spark -> hdfs://mycluster/user/ericjiang/.sparkStaging/application_1600149154303_100861/spark_env.zip
[I 201014 16:47:00 web:2162] 200 GET /api/kernels/4211d8bf-83fe-4583-bb75-8961ca0ceba9 (172.24.0.216) 0.78ms
20/10/14 16:47:15 INFO Client: Uploading resource file:/home/ericjiang/miniconda3/envs/spark/share/jupyter/kernels/spark_python_yarn_cluster/scripts/launch_ipykernel.py -> hdfs://mycluster/user/ericjiang/.sparkStaging/application_1600149154303_100861/launch_ipykernel.py
20/10/14 16:47:15 INFO Client: Uploading resource file:/home/ericjiang/miniconda3/envs/spark/spark-2.4.3-bin-hadoop2.7/python/lib/pyspark.zip -> hdfs://mycluster/user/ericjiang/.sparkStaging/application_1600149154303_100861/pyspark.zip
20/10/14 16:47:15 INFO Client: Uploading resource file:/home/ericjiang/miniconda3/envs/spark/spark-2.4.3-bin-hadoop2.7/python/lib/py4j-0.10.7-src.zip -> hdfs://mycluster/user/ericjiang/.sparkStaging/application_1600149154303_100861/py4j-0.10.7-src.zip
20/10/14 16:47:15 INFO Client: Uploading resource file:/home/webuser/tmp/spark-e7150e81-04b5-4ac9-a4d1-0744bbc7bd97/__spark_conf__1431165475248083138.zip -> hdfs://mycluster/user/ericjiang/.sparkStaging/application_1600149154303_100861/__spark_conf__.zip
20/10/14 16:47:16 INFO SecurityManager: Changing view acls to: ericjiang
20/10/14 16:47:16 INFO SecurityManager: Changing modify acls to: ericjiang
20/10/14 16:47:16 INFO SecurityManager: Changing view acls groups to:
20/10/14 16:47:16 INFO SecurityManager: Changing modify acls groups to:
20/10/14 16:47:16 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(ericjiang); groups with view permissions: Set(); users  with modify permissions: Set(ericjiang); groups with modify permissions: Set()
20/10/14 16:47:17 INFO Client: Submitting application application_1600149154303_100861 to ResourceManager
20/10/14 16:47:17 INFO YarnClientImpl: Submitted application application_1600149154303_100861
20/10/14 16:47:17 INFO Client: Application report for application_1600149154303_100861 (state: ACCEPTED)
20/10/14 16:47:17 INFO Client:
         client token: N/A
         diagnostics: N/A
         ApplicationMaster host: N/A
         ApplicationMaster RPC port: -1
         queue: root.default
         start time: 1602665237951
         final status: UNDEFINED
         tracking URL: http://hadoop2:9046/proxy/application_1600149154303_100861/
         user: ericjiang
20/10/14 16:47:17 INFO ShutdownHookManager: Shutdown hook called
20/10/14 16:47:17 INFO ShutdownHookManager: Deleting directory /home/webuser/tmp/spark-e7150e81-04b5-4ac9-a4d1-0744bbc7bd97
20/10/14 16:47:18 INFO ShutdownHookManager: Deleting directory /home/webuser/tmp/spark-26875efd-3930-4403-bab1-2373bfbaefde
[I 2020-10-14 16:47:19.759 EnterpriseGatewayApp] KernelRestarter: restarting kernel (1/5), keep random ports
[D 2020-10-14 16:47:19.759 EnterpriseGatewayApp] RemoteKernelManager.signal_kernel(9)
[D 2020-10-14 16:47:19.760 EnterpriseGatewayApp] Instantiating kernel 'Spark' with process proxy: enterprise_gateway.services.processproxies.processproxy.LocalProcessProxy
[D 2020-10-14 16:47:19.760 EnterpriseGatewayApp] Starting kernel: ['/home/ericjiang/miniconda3/envs/spark/share/jupyter/kernels/spark_python_yarn_cluster/bin/run.sh', '/home/ericjiang/.local/share/jupyter/runtime/kernel-4211d8bf-83fe-4583-bb75-8961ca0ceba9.json', '--RemoteProcessProxy.response-address', '{response_address}', '--RemoteProcessProxy.port-range', '0..0', '--RemoteProcessProxy.spark-context-initialization-mode', 'lazy']
[D 2020-10-14 16:47:19.761 EnterpriseGatewayApp] Launching kernel: Spark with command: ['/home/ericjiang/miniconda3/envs/spark/share/jupyter/kernels/spark_python_yarn_cluster/bin/run.sh', '/home/ericjiang/.local/share/jupyter/runtime/kernel-4211d8bf-83fe-4583-bb75-8961ca0ceba9.json', '--RemoteProcessProxy.response-address', '{response_address}', '--RemoteProcessProxy.port-range', '0..0', '--RemoteProcessProxy.spark-context-initialization-mode', 'lazy']
[D 2020-10-14 16:47:19.761 EnterpriseGatewayApp] BaseProcessProxy.launch_process() env: {'SPARK_OPTS': '--master yarn --deploy-mode cluster --name Spark-cluster --conf spark.yarn.submit.waitAppCompletion=false --conf spark.yarn.dist.archives=/home/ericjiang/miniconda3/envs/spark_env.zip#spark --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=spark/bin/python3.5 --conf spark.yarn.executorEnv.PYSPARK_PYTHON=spark/bin/python3.5 --conf spark.yarn.appMasterEnv.PYTHONUSERBASE=spark --conf spark.yarn.appMasterEnv.PYTHONPATH=spark/lib/python3.5/site-packages:spark/spark-2.4.3-bin-hadoop2.7/python:spark/spark-2.4.3-bin-hadoop2.7/python/lib/py4j-0.10.7-src.zip --conf spark.yarn.appMasterEnv.PATH=spark/bin:$PATH', 'HADOOP_HOME': '/home/ericjiang/miniconda3/envs/spark/hadoop-2.7.3', 'EG_YARN_LOG_LEVEL': 'DEBUG', 'PYTHONPATH': '/home/ericjiang/miniconda3/envs/spark/lib/python3.5/site-packages:/home/ericjiang/miniconda3/envs/spark/spark-2.4.3-bin-hadoop2.7/python:/home/ericjiang/miniconda3/envs/spark/spark-2.4.3-bin-hadoop2.7/python/lib/py4j-0.10.7-src.zip', 'HADOOP_CONF_DIR': '/home/ericjiang/miniconda3/envs/spark/hadoop-2.7.3/etc/hadoop/', 'EG_IMPERSONATION_ENABLED': 'False', 'SPARK_CONF_DIR': '/home/ericjiang/miniconda3/envs/spark/spark-2.4.3-bin-hadoop2.7/conf', 'PATH': '/home/ericjiang/bin:/home/ericjiang/.local/bin:/home/ericjiang/miniconda3/envs/spark/bin:/home/ericjiang/miniconda3/condabin:/opt/hadoop-2.7.3/bin:/bin:/home/ericjiang/Kafka/kafka_2.12-2.5.0/bin:/home/ericjiang/Scala/scala-2.12.10/bin:/home/ericjiang/Hive/apache-hive-2.3.7-bin/bin:/home/ericjiang/Spark/spark-2.4.3-bin-hadoop2.7/bin:/usr/lib/jvm/java-1.8.0-openjdk-amd64/bin:/usr/share/maven/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/opt/hadoop-2.7.3/bin:/opt/hadoop-2.7.3/sbin:/opt/spark-2.4.3-bin-hadoop2.7/bin', 'KERNEL_ID': '4211d8bf-83fe-4583-bb75-8961ca0ceba9', 'PROG_HOME': 
'/home/miniconda3/envs/spark/share/jupyter/kernels/spark_python_yarn_cluster', 'SPARK_HOME': '/home/ericjiang/miniconda3/envs/spark/spark-2.4.3-bin-hadoop2.7', 'KERNEL_LAUNCH_TIMEOUT': '60', 'KERNEL_WORKING_DIR': '/home/ericjiang', 'PYSPARK_PYTHON': '/usr/bin/python', 'KERNEL_GATEWAY': '1', 'EG_KERNEL_LAUNCH_TIMEOUT': '40', 'EG_YARN_ENDPOINT': 'http://172.24.0.12:8088/ws/v1/cluster', 'KERNEL_USERNAME': 'ericjiang', 'LAUNCH_OPTS': ''}
[I 2020-10-14 16:47:19.764 EnterpriseGatewayApp] Local kernel launched on '172.24.0.216', pid: 15853, pgid: 15853, KernelID: 4211d8bf-83fe-4583-bb75-8961ca0ceba9, cmd: '['/home/ericjiang/miniconda3/envs/spark/share/jupyter/kernels/spark_python_yarn_cluster/bin/run.sh', '/home/ericjiang/.local/share/jupyter/runtime/kernel-4211d8bf-83fe-4583-bb75-8961ca0ceba9.json', '--RemoteProcessProxy.response-address', '{response_address}', '--RemoteProcessProxy.port-range', '0..0', '--RemoteProcessProxy.spark-context-initialization-mode', 'lazy']'
[D 2020-10-14 16:47:19.765 EnterpriseGatewayApp] Connecting to: tcp://127.0.0.1:57311
[D 2020-10-14 16:47:19.766 EnterpriseGatewayApp] Refreshing kernel session for id: 4211d8bf-83fe-4583-bb75-8961ca0ceba9

Starting IPython kernel for Spark in Yarn Cluster mode on behalf of user ericjiang

+ eval exec /home/ericjiang/miniconda3/envs/spark/spark-2.4.3-bin-hadoop2.7/bin/spark-submit '--master yarn --deploy-mode cluster --name Spark-cluster --conf spark.yarn.submit.waitAppCompletion=false --conf spark.yarn.dist.archives=/home/ericjiang/miniconda3/envs/spark_env.zip#spark --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=spark/bin/python3.5 --conf spark.yarn.executorEnv.PYSPARK_PYTHON=spark/bin/python3.5 --conf spark.yarn.appMasterEnv.PYTHONUSERBASE=spark --conf spark.yarn.appMasterEnv.PYTHONPATH=spark/lib/python3.5/site-packages:spark/spark-2.4.3-bin-hadoop2.7/python:spark/spark-2.4.3-bin-hadoop2.7/python/lib/py4j-0.10.7-src.zip --conf spark.yarn.appMasterEnv.PATH=spark/bin:$PATH' '' /home/ericjiang/miniconda3/envs/spark/share/jupyter/kernels/spark_python_yarn_cluster/scripts/launch_ipykernel.py '' /home/ericjiang/.local/share/jupyter/runtime/kernel-4211d8bf-83fe-4583-bb75-8961ca0ceba9.json --RemoteProcessProxy.response-address '{response_address}' --RemoteProcessProxy.port-range 0..0 --RemoteProcessProxy.spark-context-initialization-mode lazy
++ exec /home/ericjiang/miniconda3/envs/spark/spark-2.4.3-bin-hadoop2.7/bin/spark-submit --master yarn --deploy-mode cluster --name Spark-cluster --conf spark.yarn.submit.waitAppCompletion=false --conf spark.yarn.dist.archives=/home/ericjiang/miniconda3/envs/spark_env.zip#spark --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=spark/bin/python3.5 --conf spark.yarn.executorEnv.PYSPARK_PYTHON=spark/bin/python3.5 --conf spark.yarn.appMasterEnv.PYTHONUSERBASE=spark --conf spark.yarn.appMasterEnv.PYTHONPATH=spark/lib/python3.5/site-packages:spark/spark-2.4.3-bin-hadoop2.7/python:spark/spark-2.4.3-bin-hadoop2.7/python/lib/py4j-0.10.7-src.zip --conf spark.yarn.appMasterEnv.PATH=spark/bin:/home/ericjiang/bin:/home/ericjiang/.local/bin:/home/ericjiang/miniconda3/envs/spark/bin:/home/ericjiang/miniconda3/condabin:/opt/hadoop-2.7.3/bin:/bin:/home/ericjiang/Kafka/kafka_2.12-2.5.0/bin:/home/ericjiang/Scala/scala-2.12.10/bin:/home/ericjiang/Hive/apache-hive-2.3.7-bin/bin:/home/ericjiang/Spark/spark-2.4.3-bin-hadoop2.7/bin:/usr/lib/jvm/java-1.8.0-openjdk-amd64/bin:/usr/share/maven/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/opt/hadoop-2.7.3/bin:/opt/hadoop-2.7.3/sbin:/opt/spark-2.4.3-bin-hadoop2.7/bin /home/ericjiang/miniconda3/envs/spark/share/jupyter/kernels/spark_python_yarn_cluster/scripts/launch_ipykernel.py /home/ericjiang/.local/share/jupyter/runtime/kernel-4211d8bf-83fe-4583-bb75-8961ca0ceba9.json --RemoteProcessProxy.response-address '{response_address}' --RemoteProcessProxy.port-range 0..0 --RemoteProcessProxy.spark-context-initialization-mode lazy
[I 201014 16:47:20 web:2162] 200 GET /api/kernels/4211d8bf-83fe-4583-bb75-8961ca0ceba9 (172.24.0.216) 1.44ms
[D 2020-10-14 16:47:20.831 EnterpriseGatewayApp] Clearing buffer for 4211d8bf-83fe-4583-bb75-8961ca0ceba9
20/10/14 16:47:21 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
20/10/14 16:47:22 INFO RMProxy: Connecting to ResourceManager at hadoop2/172.24.0.12:8032
20/10/14 16:47:22 INFO Client: Requesting a new application from cluster with 50 NodeManagers
20/10/14 16:47:23 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (24576 MB per container)
20/10/14 16:47:23 INFO Client: Will allocate AM container, with 1408 MB memory including 384 MB overhead
20/10/14 16:47:23 INFO Client: Setting up container launch context for our AM
20/10/14 16:47:23 INFO Client: Setting up the launch environment for our AM container
20/10/14 16:47:23 INFO Client: Preparing resources for our AM container
20/10/14 16:47:23 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
[D 2020-10-14 16:47:25.841 EnterpriseGatewayApp] Kernel is taking too long to finish, killing
[D 2020-10-14 16:47:25.842 EnterpriseGatewayApp] RemoteKernelManager.signal_kernel(9)
[I 2020-10-14 16:47:26.059 EnterpriseGatewayApp] Kernel shutdown: 4211d8bf-83fe-4583-bb75-8961ca0ceba9
[I 201014 16:47:26 web:2162] 204 DELETE /api/kernels/4211d8bf-83fe-4583-bb75-8961ca0ceba9 (172.24.0.216) 5230.77ms
[D 2020-10-14 16:47:46.926 EnterpriseGatewayApp] Websocket closed 4211d8bf-83fe-4583-bb75-8961ca0ceba9:d8a1fa94-981ba2a5eb54c0728eb91561

kernel.json

{
  "language": "python",
  "display_name": "Spark",
  "metadata": {
    "process_proxy": {
      "class_name": "enterprise_gateway.services.processproxies.yarn.YarnClusterProcessProxy"
    }
  },
  "env": {
    "SPARK_HOME": "/home/ericjiang/miniconda3/envs/spark/spark-2.4.3-bin-hadoop2.7",
    "HADOOP_HOME": "/home/ericjiang/miniconda3/envs/spark/hadoop-2.7.3",
    "PROG_HOME": "/home/miniconda3/envs/spark/share/jupyter/kernels/spark_python_yarn_cluster",
    "SPARK_CONF_DIR": "/home/ericjiang/miniconda3/envs/spark/spark-2.4.3-bin-hadoop2.7/conf",
    "HADOOP_CONF_DIR": "/home/ericjiang/miniconda3/envs/spark/hadoop-2.7.3/etc/hadoop/",
    "EG_YARN_ENDPOINT": "http://172.24.0.12:8088/ws/v1/cluster",
    "EG_IMPERSONATION_ENABLED": "True", 
    "EG_YARN_LOG_LEVEL" : "DEBUG",
    "EG_KERNEL_LAUNCH_TIMEOUT": "40",
    "KERNEL_LAUNCH_TIMEOUT" : "60",
    "PYSPARK_PYTHON": "/usr/bin/python",
    "PYTHONPATH": "/home/ericjiang/miniconda3/envs/spark/lib/python3.5/site-packages:/home/ericjiang/miniconda3/envs/spark/spark-2.4.3-bin-hadoop2.7/python:/home/ericjiang/miniconda3/envs/spark/spark-2.4.3-bin-hadoop2.7/python/lib/py4j-0.10.7-src.zip",
    "SPARK_OPTS": "--master yarn --deploy-mode cluster --name Spark-cluster --conf spark.yarn.submit.waitAppCompletion=false --conf spark.yarn.dist.archives=/home/ericjiang/miniconda3/envs/spark_env.zip#spark --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=spark/bin/python3.5 --conf spark.yarn.executorEnv.PYSPARK_PYTHON=spark/bin/python3.5 --conf spark.yarn.appMasterEnv.PYTHONUSERBASE=spark --conf spark.yarn.appMasterEnv.PYTHONPATH=spark/lib/python3.5/site-packages:spark/spark-2.4.3-bin-hadoop2.7/python:spark/spark-2.4.3-bin-hadoop2.7/python/lib/py4j-0.10.7-src.zip --conf spark.yarn.appMasterEnv.PATH=spark/bin:$PATH",
    "LAUNCH_OPTS": ""
  },
  "argv": [
    "/home/ericjiang/miniconda3/envs/spark/share/jupyter/kernels/spark_python_yarn_cluster/bin/run.sh",
    "{connection_file}",
    "--RemoteProcessProxy.response-address",
    "{response_address}",
    "--RemoteProcessProxy.port-range",
    "{port_range}",
    "--RemoteProcessProxy.spark-context-initialization-mode",
    "lazy"
  ]
}
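One thing I notice in the logs above: even though this kernel.json declares YarnClusterProcessProxy in its metadata, EG logs "Instantiating kernel 'Spark' with process proxy: ...LocalProcessProxy". A quick sanity check is to parse the kernel.json that EG actually discovered (the path from its "Found kernel" log line) and print the declared proxy class; as far as I understand, EG falls back to LocalProcessProxy when no process_proxy metadata is found. A minimal sketch, with the helper name being my own:

```python
import json
from pathlib import Path

LOCAL_PROXY = "enterprise_gateway.services.processproxies.processproxy.LocalProcessProxy"

def process_proxy_class(kernel_json_path):
    """Return the process-proxy class declared in a kernelspec's kernel.json,
    or the LocalProcessProxy default when none is declared."""
    spec = json.loads(Path(kernel_json_path).read_text())
    return (spec.get("metadata", {})
                .get("process_proxy", {})
                .get("class_name", LOCAL_PROXY))

# Example: point this at the kernel.json EG reports in its "Found kernel" line.
# print(process_proxy_class("/home/ericjiang/miniconda3/envs/spark/share/"
#                           "jupyter/kernels/spark_python_yarn_cluster/kernel.json"))
```

If this prints LocalProcessProxy for the spec EG is loading, then EG is picking up a kernel.json without the YarnClusterProcessProxy metadata, which would explain the local launch and the restart loop.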

conda list

# packages in environment at /home/ericjiang/miniconda3/envs/spark:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                        main
alembic                   1.4.3                      py_0
asn1crypto                1.4.0                      py_0
async_generator           1.10                       py_0
attrs                     20.2.0                   pypi_0    pypi
blas                      1.0                         mkl
bleach                    3.2.1                      py_0
brotlipy                  0.7.0            py35h470a237_0    conda-forge
ca-certificates           2020.7.22                     0
cachetools                4.1.1                    pypi_0    pypi
certifi                   2018.8.24                py35_1
cffi                      1.11.5           py35he75722e_1
chardet                   3.0.4                    pypi_0    pypi
click                     7.1.2                      py_0
cloudpickle               1.6.0                      py_0
configurable-http-proxy   4.0.1                   node6_0
cryptography              2.3.1            py35hc365091_0
cycler                    0.10.0           py35hc4d5149_0
cython                    0.28.5           py35hf484d3e_0
cytoolz                   0.9.0.1          py35h14c3975_1
dask-core                 2.6.0                      py_0
dbus                      1.13.16              hb2f20db_0
decorator                 4.4.2                      py_0
defusedxml                0.6.0                      py_0
distributed               2.7.0                      py_0    anaconda
docker                    4.3.1                    pypi_0    pypi
docopt                    0.6.2                    pypi_0    pypi
entrypoints               0.2.3                    py35_2
expat                     2.2.10               he6710b0_2
findspark                 1.3.0                      py_1    conda-forge
fontconfig                2.13.0               h9420a91_0
freetype                  2.10.3               h5ab3b9f_0
future                    0.18.2                   pypi_0    pypi
glib                      2.63.1               h5a9c865_0
google-auth               1.22.1                   pypi_0    pypi
gst-plugins-base          1.14.0               hbbd80ab_1
gstreamer                 1.14.0               hb453b48_1
heapdict                  1.0.1                      py_0
icu                       58.2                 he6710b0_3
idna                      2.10                       py_0
intel-openmp              2019.4                      243
ipykernel                 4.10.0                   py35_0
ipyparallel               6.3.0                    pypi_0    pypi
ipython                   5.8.0                    py35_0
ipython_genutils          0.2.0            py35hc9e07d0_0
ipywidgets                7.4.1                    py35_0
jinja2                    2.11.2                     py_0
joblib                    0.14.1                     py_0
jpeg                      9b                   h024ee3a_2
json5                     0.9.5                      py_0
jsonschema                3.0.1                    pypi_0    pypi
jupyter                   1.0.0                    py35_7
jupyter-contrib-core      0.3.3                    pypi_0    pypi
jupyter-contrib-nbextensions 0.5.1                    pypi_0    pypi
jupyter-core              4.6.3                    pypi_0    pypi
jupyter-highlight-selected-word 0.2.0                    pypi_0    pypi
jupyter-kernel-gateway    2.2.0                    pypi_0    pypi
jupyter-latex-envs        1.4.6                    pypi_0    pypi
jupyter-nbextensions-configurator 0.4.1                    pypi_0    pypi
jupyter_client            5.3.3                      py_0    conda-forge
jupyter_console           5.2.0            py35h4044a63_1
jupyter_core              4.5.0                      py_0
jupyter_enterprise_gateway 1.1.1                      py_0    conda-forge
jupyter_kernel_gateway    2.1.0                    py35_1
jupyter_server            0.0.2                    py35_0    conda-forge
jupyterhub                0.9.4                    py35_0    conda-forge
jupyterlab                1.2.6              pyhf63ae98_0
jupyterlab_server         1.0.0                      py_0
kiwisolver                1.0.1            py35hf484d3e_0
kmodes                    0.10.2             pyh9f0ad1d_0    conda-forge
kubernetes                11.0.0                   pypi_0    pypi
libcurl                   7.61.1               heec0ca6_0
libedit                   3.1.20191231         h14c3975_1
libffi                    3.2.1             hf484d3e_1007
libgcc                    7.2.0                h69d50b8_2
libgcc-ng                 9.1.0                hdf63c60_0
libgfortran-ng            7.3.0                hdf63c60_0
libpng                    1.6.37               hbc83047_0
libsodium                 1.0.16               h1bed415_0
libssh2                   1.8.0                h9cfc8f7_4
libstdcxx-ng              9.1.0                hdf63c60_0
libuuid                   1.0.3                h1bed415_2
libxcb                    1.14                 h7b6447c_0
libxml2                   2.9.10               he19cac6_1
lightgbm                  2.2.0            py35hfc679d8_0    conda-forge
lxml                      4.5.2                    pypi_0    pypi
mako                      1.1.3                      py_0
markupsafe                1.0              py35h14c3975_1
matplotlib                3.0.0            py35h5429711_0
mistune                   0.8.3            py35h14c3975_1
mkl                       2018.0.3                      1
mkl_fft                   1.0.6            py35h7dd41cf_0
mkl_random                1.0.1            py35h4414c95_1
msgpack-python            0.5.6            py35h6bb024c_1
nb2kg                     0.7.0                    pypi_0    pypi
nbconvert                 5.5.0                      py_0
nbformat                  5.0.7                      py_0
ncurses                   6.2                  he6710b0_1
nodejs                    6.11.2               h3db8ef7_0
notebook                  5.7.0                    pypi_0    pypi
numpy                     1.15.2           py35h1d66e8a_0
numpy-base                1.15.2           py35h81de0dd_0
oauthlib                  3.1.0                    pypi_0    pypi
openssl                   1.0.2u               h7b6447c_0
packaging                 20.4                       py_0
pamela                    1.0.0                      py_0
pandas                    0.25.3                   pypi_0    pypi
pandoc                    2.10.1                        0
pandocfilters             1.4.2                    py35_1
paramiko                  2.1.2                    pypi_0    pypi
pcre                      8.44                 he6710b0_0
pexpect                   4.6.0                    py35_0
pickleshare               0.7.4            py35hd57304d_0
pip                       10.0.1                   py35_0
prometheus_client         0.8.0                      py_0
prompt_toolkit            1.0.15           py35hc09de7a_0
psutil                    5.7.2                    pypi_0    pypi
ptyprocess                0.6.0                    py35_0
py4j                      0.10.7                   py35_0
pyasn1                    0.4.8                      py_0
pyasn1-modules            0.2.8                    pypi_0    pypi
pycparser                 2.20                       py_2
pycrypto                  2.6.1            py35h14c3975_8
pycryptodomex             3.9.8                    pypi_0    pypi
pycurl                    7.43.0.2         py35hb7f436b_0
pygments                  2.7.1                      py_0
pykerberos                1.2.1                    pypi_0    pypi
pyopenssl                 18.0.0                   py35_0
pyparsing                 2.4.7                      py_0
pyqt                      5.9.2            py35h05f1152_2
pyrsistent                0.17.3                   pypi_0    pypi
pysocks                   1.6.8                    py35_0
pyspark                   2.4.5                      py_0
python                    3.5.6                hc3d631a_0
python-dateutil           2.8.1                      py_0
python-editor             1.0.4                      py_0
python-oauth2             1.1.1                      py_0
pytz                      2020.1                     py_0
pyyaml                    5.3.1                    pypi_0    pypi
pyzmq                     17.1.2           py35h14c3975_0
qt                        5.9.6                h8703b6f_2
qtconsole                 4.7.7                      py_0
qtpy                      1.9.0                      py_0
readline                  7.0                  h7b6447c_5
requests                  2.24.0                     py_0
requests-kerberos         0.12.0                   pypi_0    pypi
requests-oauthlib         1.3.0                    pypi_0    pypi
rsa                       4.6                      pypi_0    pypi
scikit-learn              0.20.0           py35h4989274_1    anaconda
scipy                     1.1.0            py35hfa4b5c9_1
send2trash                1.5.0                    py35_0
setuptools                40.2.0                   py35_0
simplegeneric             0.8.1                    py35_2
sip                       4.19.8           py35hf484d3e_0
six                       1.15.0                     py_0
sortedcontainers          2.2.2                      py_0
sqlalchemy                1.2.11           py35h7b6447c_0
sqlite                    3.33.0               h62c20be_0
tbb                       2020.3               hfd86e86_0
tbb4py                    2018.0.5         py35h6bb024c_0
tblib                     1.7.0                      py_0
terminado                 0.8.1                    py35_1
testpath                  0.4.4                      py_0
tk                        8.6.10               hbc83047_0
toolz                     0.11.1                     py_0
tornado                   5.1.1            py35h7b6447c_0
traitlets                 4.3.3                    pypi_0    pypi
ua-parser                 0.10.0                   pypi_0    pypi
urllib3                   1.25.10                    py_0
wcwidth                   0.2.5                      py_0
webencodings              0.5.1                    py35_1
websocket-client          0.57.0                   pypi_0    pypi
wheel                     0.35.1                     py_0
widgetsnbextension        3.4.1                    py35_0
xz                        5.2.5                h7b6447c_0
yaml                      0.2.5                h7b6447c_0
yarn-api-client           0.2.3                    pypi_0    pypi
zeromq                    4.2.5                hf484d3e_1
zict                      2.0.0                      py_0
zlib                      1.2.11               h7b6447c_3
kevin-bates commented 4 years ago

Hi @SevenMilk - thanks for opening the issue. There are a number of inconsistencies between what you state and what you provide, so I suspect we'll need a few iterations here.

  1. The logs show your kernelspec using LocalProcessProxy, while the issue text shows the correct YarnClusterProcessProxy in use:

    [D 2020-10-14 16:46:46.744 EnterpriseGatewayApp] Instantiating kernel 'Spark' with process proxy: enterprise_gateway.services.processproxies.processproxy.LocalProcessProxy

    As a result, I'll need to see the logs relative to YarnClusterProcessProxy. Since LocalProcessProxy is not a RemoteProcessProxy, the response address template parameter ({response_address}) was left unfilled so the kernel launcher had no way to know where to send its connection information. I suspect the launcher crashed due to this missing parameter.

  2. You don't need any of the following env entries in your kernel.json file:

    "EG_YARN_ENDPOINT": "http://172.24.0.12:8088/ws/v1/cluster",
    "EG_IMPERSONATION_ENABLED": "True", 
    "EG_YARN_LOG_LEVEL" : "DEBUG",
    "EG_KERNEL_LAUNCH_TIMEOUT": "40",
    "KERNEL_LAUNCH_TIMEOUT" : "60",
  3. KERNEL_LAUNCH_TIMEOUT must come from the client side and is used by EG to determine when to give up waiting for a response from the remote kernel running in the YARN cluster. Because your configuration wasn't right, we can hold off increasing it for now, but I suspect we'll need to increase it since you need to load a large file with your kernel.

  4. Regarding:

    What is the reason for restarting the spark cluster kernel every time?

    I don't see any evidence of automatic restarts happening. The initial start is timing out due to the configuration issues.

  5. You should be using notebook with the --gateway-url option, since the nb2kg functionality has been built into notebook since the 6.0 release. Use something like the following (where 172.24.0.216:8888 is the IP and port on which EG is running):

    jupyter notebook  --gateway-url=http://172.24.0.216:8888 

    None of the class mappings are necessary.

  6. Although your kernel name is 'spark', you're referencing files from the out-of-the-box kernelspec examples (kernels/spark_python_yarn_cluster/bin/run.sh). I would recommend leaving those alone and creating kernelspec hierarchies (bin and scripts sub-folders) for each kernelspec so that they can be individually tuned.
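One way to do that (a sketch — the source and destination paths are assumptions; adjust them to your install) is to copy the sample kernelspec under a new name and then tune its kernel.json and scripts independently:

```shell
# copy the out-of-the-box sample into a kernelspec you own
cp -r /home/ericjiang/miniconda3/envs/cluster/share/jupyter/kernels/spark_python_yarn_cluster \
      /home/ericjiang/miniconda3/envs/cluster/share/jupyter/kernels/my_spark_yarn_cluster
# then edit my_spark_yarn_cluster/kernel.json (display_name, env, argv paths)
# and confirm it is visible:
jupyter kernelspec list
```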

That's enough for now. Please send a new set of logs once these issues have been addressed. If you still have issues, you will need to look at the stdout/stderr logs via the YARN tools. These will contain the output produced by the launcher - which may have some issues with the environment of the node on which it lands since you can't install anaconda there.
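For the YARN-side stdout/stderr mentioned above, one common approach (assuming the `yarn` CLI is on PATH and log aggregation is enabled) is:

```shell
# fetch the aggregated container logs for the kernel's YARN application;
# substitute the actual application ID reported by the ResourceManager
yarn logs -applicationId <application_id>
```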

SevenMilk commented 4 years ago

Hi @kevin-bates . Thanks for your response !!

My notebook version was 5.7.0. To avoid other interference, I decided to create another env (name=cluster) and install the latest Jupyter versions, which looks like a good idea.

jupyter-client 6.1.7
jupyter-contrib-core 0.3.3
jupyter-contrib-nbextensions 0.5.1
jupyter-core 4.6.3
jupyter-enterprise-gateway 2.3.0
jupyter-highlight-selected-word 0.2.0
jupyter-kernel-gateway 2.4.3
jupyter-latex-envs 1.4.6
jupyter-nbextensions-configurator 0.4.1
jupyterlab 2.2.8
jupyterlab-server 1.2.0
notebook 6.1.4
tornado 6.0.4
yarn-api-client 1.0.2

Updated kernel.json:

{
  "language": "python",
  "display_name": "Spark - Python (YARN Cluster Mode)",
  "metadata": {
    "process_proxy": {
      "class_name": "enterprise_gateway.services.processproxies.yarn.YarnClusterProcessProxy"
    }
  },
  "env": {
    "SPARK_HOME": "/home/ericjiang/miniconda3/envs/cluster/spark-2.4.3-bin-hadoop2.7",
    "SPARK_CONF_DIR": "/home/ericjiang/miniconda3/envs/cluster/spark-2.4.3-bin-hadoop2.7/conf",
    "HADOOP_HOME": "/home/ericjiang/miniconda3/envs/cluster/hadoop-2.7.3",
    "HADOOP_CONF_DIR": "/home/ericjiang/miniconda3/envs/cluster/hadoop-2.7.3/etc/hadoop/",    
    "PROG_HOME": "/home/ericjiang/miniconda3/envs/cluster/share/jupyter/kernels/spark_python_yarn_cluster",
    "PYSPARK_PYTHON": "/usr/bin/python3",
    "PYTHONPATH": "/home/ericjiang/miniconda3/envs/cluster/lib/python3.5/site-packages:/home/ericjiang/miniconda3/envs/cluster/spark-2.4.3-bin-hadoop2.7/python:/home/ericjiang/miniconda3/envs/cluster/spark-2.4.3-bin-hadoop2.7/python/lib/py4j-0.10.7-src.zip",
    "SPARK_OPTS": "--master yarn --deploy-mode cluster --name clusterMode --conf spark.yarn.submit.waitAppCompletion=false --conf spark.yarn.dist.archives=/home/ericjiang/miniconda3/envs/cluster.zip#cluster --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=cluster/bin/python3.5 --conf spark.yarn.executorEnv.PYSPARK_PYTHON=cluster/bin/python3.5 --conf spark.yarn.appMasterEnv.PYTHONUSERBASE=cluster --conf spark.yarn.appMasterEnv.PYTHONPATH=cluster/lib/python3.5/site-packages:cluster/spark-2.4.3-bin-hadoop2.7/python:cluster/spark-2.4.3-bin-hadoop2.7/python/lib/py4j-0.10.7-src.zip --conf spark.yarn.appMasterEnv.PATH=cluster/bin:$PATH",
    "LAUNCH_OPTS": ""
  },
  "argv": [
    "/home/ericjiang/miniconda3/envs/cluster/share/jupyter/kernels/spark_python_yarn_cluster/bin/run.sh",
    "--RemoteProcessProxy.kernel-id",
    "{kernel_id}",
    "--RemoteProcessProxy.response-address",
    "172.24.0.216:8888",
    "--RemoteProcessProxy.port-range",
    "{port_range}",
    "--RemoteProcessProxy.spark-context-initialization-mode",
    "lazy"
  ]
}
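For comparison, the stock spark_python_yarn_cluster kernelspec leaves the response address as a template parameter (`{response_address}`, mentioned above) that EG fills in at launch time, rather than a hardcoded host:port. A sketch of that argv stanza (the run.sh path is assumed from this install):

```json
"argv": [
  "/home/ericjiang/miniconda3/envs/cluster/share/jupyter/kernels/spark_python_yarn_cluster/bin/run.sh",
  "--RemoteProcessProxy.kernel-id",
  "{kernel_id}",
  "--RemoteProcessProxy.response-address",
  "{response_address}",
  "--RemoteProcessProxy.port-range",
  "{port_range}",
  "--RemoteProcessProxy.spark-context-initialization-mode",
  "lazy"
]
```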

Updated run commands:

jupyter notebook  --gateway-url=http://172.24.0.216:8888
jupyter enterprisegateway --ip=172.24.0.216 --port_retries=0 --debug
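As a quick sanity check that the notebook host can reach EG (assuming no auth token is configured), the gateway's kernelspec endpoint can be queried directly:

```shell
# should return a JSON listing that includes spark_python_yarn_cluster
curl http://172.24.0.216:8888/api/kernelspecs
```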

Added to jupyter_enterprise_gateway_config.py:

c.EnterpriseGatewayApp.yarn_endpoint = 'http://172.24.0.12:8088/cluster'
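Note that the earlier kernel.json used http://172.24.0.12:8088/ws/v1/cluster, which is the ResourceManager's REST root, while /cluster is the web UI path. One way to verify which root the RM REST API answers on:

```shell
# verify the ResourceManager REST API is reachable at the configured address
curl http://172.24.0.12:8088/ws/v1/cluster/info
```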

I solved most of the problems: No.1, No.2, No.4, and No.5. About No.6, I understand your point, but for now I don't know how to build that myself. If it doesn't cause a runtime error, I'd rather not change it at this moment.

Now I have a new question. This is my EG log:

[D 2020-10-15 13:37:58.446 EnterpriseGatewayApp] RemoteMappingKernelManager.start_kernel: spark_python_yarn_cluster, kernel_username: ericjiang
[D 2020-10-15 13:37:58.475 EnterpriseGatewayApp] Instantiating kernel 'Spark - Python (YARN Cluster Mode)' with process proxy: enterprise_gateway.services.processproxies.yarn.YarnClusterProcessProxy
[D 2020-10-15 13:37:58.624 EnterpriseGatewayApp] Response socket launched on '172.24.0.216:60462' using 5.0s timeout
[D 2020-10-15 13:37:58.692 EnterpriseGatewayApp] YarnClusterProcessProxy shutdown wait time adjusted to 15.0 seconds.
[D 2020-10-15 13:37:58.693 EnterpriseGatewayApp] Starting kernel (async): ['/home/ericjiang/miniconda3/envs/cluster/share/jupyter/kernels/spark_python_yarn_cluster/bin/run.sh', '--RemoteProcessProxy.kernel-id', 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b', '--RemoteProcessProxy.response-address', '172.24.0.216:8888', '--RemoteProcessProxy.port-range', '0..0', '--RemoteProcessProxy.spark-context-initialization-mode', 'lazy']
[D 2020-10-15 13:37:58.693 EnterpriseGatewayApp] Launching kernel: 'Spark - Python (YARN Cluster Mode)' with command: ['/home/ericjiang/miniconda3/envs/cluster/share/jupyter/kernels/spark_python_yarn_cluster/bin/run.sh', '--RemoteProcessProxy.kernel-id', 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b', '--RemoteProcessProxy.response-address', '172.24.0.216:8888', '--RemoteProcessProxy.port-range', '0..0', '--RemoteProcessProxy.spark-context-initialization-mode', 'lazy']
[D 2020-10-15 13:37:58.693 EnterpriseGatewayApp] BaseProcessProxy.launch_process() env: {'KG_REQUEST_TIME': '120', 'EG_IMPERSONATION_ENABLED': 'False', 'EG_MAX_PORT_RANGE_RETRIES': '5', 'PYTHONPATH': '/home/ericjiang/miniconda3/envs/cluster/lib/python3.5/site-packages:/home/ericjiang/miniconda3/envs/cluster/spark-2.4.3-bin-hadoop2.7/python:/home/ericjiang/miniconda3/envs/cluster/spark-2.4.3-bin-hadoop2.7/python/lib/py4j-0.10.7-src.zip', 'KERNEL_GATEWAY': '1', 'SPARK_OPTS': '--master yarn --deploy-mode cluster --name clusterMode --conf spark.yarn.submit.waitAppCompletion=false --conf spark.yarn.dist.archives=/home/ericjiang/miniconda3/envs/cluster.zip#cluster --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=cluster/bin/python3.5 --conf spark.yarn.executorEnv.PYSPARK_PYTHON=cluster/bin/python3.5 --conf spark.yarn.appMasterEnv.PYTHONUSERBASE=cluster --conf spark.yarn.appMasterEnv.PYTHONPATH=cluster/lib/python3.5/site-packages:cluster/spark-2.4.3-bin-hadoop2.7/python:cluster/spark-2.4.3-bin-hadoop2.7/python/lib/py4j-0.10.7-src.zip --conf spark.yarn.appMasterEnv.PATH=cluster/bin:/home/ericjiang/bin:/home/ericjiang/.local/bin:/home/ericjiang/miniconda3/envs/cluster/bin:/home/ericjiang/miniconda3/condabin:/opt/hadoop-2.7.3/bin:/bin:/home/ericjiang/Kafka/kafka_2.12-2.5.0/bin:/home/ericjiang/Scala/scala-2.12.10/bin:/home/ericjiang/Hive/apache-hive-2.3.7-bin/bin:/home/ericjiang/Spark/spark-2.4.3-bin-hadoop2.7/bin:/usr/lib/jvm/java-1.8.0-openjdk-amd64/bin:/usr/share/maven/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/opt/hadoop-2.7.3/bin:/opt/hadoop-2.7.3/sbin:/opt/spark-2.4.3-bin-hadoop2.7/bin', 'HADOOP_HOME': '/home/ericjiang/miniconda3/envs/cluster/hadoop-2.7.3', 'KERNEL_LANGUAGE': 'python', 'KERNEL_ID': 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b', 'HADOOP_CONF_DIR': '/home/ericjiang/miniconda3/envs/cluster/hadoop-2.7.3/etc/hadoop/', 'LAUNCH_OPTS': '', 'KERNEL_USERNAME': 'ericjiang', 'EG_MIN_PORT_RANGE_SIZE': '1000', 
'KERNEL_WORKING_DIR': '/home/ericjiang', 'SPARK_HOME': '/home/ericjiang/miniconda3/envs/cluster/spark-2.4.3-bin-hadoop2.7', 'SPARK_CONF_DIR': '/home/ericjiang/miniconda3/envs/cluster/spark-2.4.3-bin-hadoop2.7/conf', 'PYSPARK_PYTHON': '/usr/local/bin/ipython3', 'PROG_HOME': '/home/ericjiang/miniconda3/envs/cluster/share/jupyter/kernels/spark_python_yarn_cluster', 'PATH': '/home/ericjiang/bin:/home/ericjiang/.local/bin:/home/ericjiang/miniconda3/envs/cluster/bin:/home/ericjiang/miniconda3/condabin:/opt/hadoop-2.7.3/bin:/bin:/home/ericjiang/Kafka/kafka_2.12-2.5.0/bin:/home/ericjiang/Scala/scala-2.12.10/bin:/home/ericjiang/Hive/apache-hive-2.3.7-bin/bin:/home/ericjiang/Spark/spark-2.4.3-bin-hadoop2.7/bin:/usr/lib/jvm/java-1.8.0-openjdk-amd64/bin:/usr/share/maven/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/opt/hadoop-2.7.3/bin:/opt/hadoop-2.7.3/sbin:/opt/spark-2.4.3-bin-hadoop2.7/bin', 'KERNEL_LAUNCH_TIMEOUT': '40'}
[D 2020-10-15 13:37:58.697 EnterpriseGatewayApp] Yarn cluster kernel launched using YARN RM address: http://172.24.0.12:8088, pid: 26955, Kernel ID: aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b, cmd: '['/home/ericjiang/miniconda3/envs/cluster/share/jupyter/kernels/spark_python_yarn_cluster/bin/run.sh', '--RemoteProcessProxy.kernel-id', 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b', '--RemoteProcessProxy.response-address', '172.24.0.216:8888', '--RemoteProcessProxy.port-range', '0..0', '--RemoteProcessProxy.spark-context-initialization-mode', 'lazy']'

Starting IPython kernel for Spark in Yarn Cluster mode on behalf of user ericjiang

+ eval exec /home/ericjiang/miniconda3/envs/cluster/spark-2.4.3-bin-hadoop2.7/bin/spark-submit '--master yarn --deploy-mode cluster --name clusterMode --conf spark.yarn.submit.waitAppCompletion=false --conf spark.yarn.dist.archives=/home/ericjiang/miniconda3/envs/cluster.zip#cluster --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=cluster/bin/python3.5 --conf spark.yarn.executorEnv.PYSPARK_PYTHON=cluster/bin/python3.5 --conf spark.yarn.appMasterEnv.PYTHONUSERBASE=cluster --conf spark.yarn.appMasterEnv.PYTHONPATH=cluster/lib/python3.5/site-packages:cluster/spark-2.4.3-bin-hadoop2.7/python:cluster/spark-2.4.3-bin-hadoop2.7/python/lib/py4j-0.10.7-src.zip --conf spark.yarn.appMasterEnv.PATH=cluster/bin:/home/ericjiang/bin:/home/ericjiang/.local/bin:/home/ericjiang/miniconda3/envs/cluster/bin:/home/ericjiang/miniconda3/condabin:/opt/hadoop-2.7.3/bin:/bin:/home/ericjiang/Kafka/kafka_2.12-2.5.0/bin:/home/ericjiang/Scala/scala-2.12.10/bin:/home/ericjiang/Hive/apache-hive-2.3.7-bin/bin:/home/ericjiang/Spark/spark-2.4.3-bin-hadoop2.7/bin:/usr/lib/jvm/java-1.8.0-openjdk-amd64/bin:/usr/share/maven/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/opt/hadoop-2.7.3/bin:/opt/hadoop-2.7.3/sbin:/opt/spark-2.4.3-bin-hadoop2.7/bin' '' /home/ericjiang/miniconda3/envs/cluster/share/jupyter/kernels/spark_python_yarn_cluster/scripts/launch_ipykernel.py '' --RemoteProcessProxy.kernel-id aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b --RemoteProcessProxy.response-address 172.24.0.216:8888 --RemoteProcessProxy.port-range 0..0 --RemoteProcessProxy.spark-context-initialization-mode lazy
++ exec /home/ericjiang/miniconda3/envs/cluster/spark-2.4.3-bin-hadoop2.7/bin/spark-submit --master yarn --deploy-mode cluster --name clusterMode --conf spark.yarn.submit.waitAppCompletion=false --conf spark.yarn.dist.archives=/home/ericjiang/miniconda3/envs/cluster.zip#cluster --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=cluster/bin/python3.5 --conf spark.yarn.executorEnv.PYSPARK_PYTHON=cluster/bin/python3.5 --conf spark.yarn.appMasterEnv.PYTHONUSERBASE=cluster --conf spark.yarn.appMasterEnv.PYTHONPATH=cluster/lib/python3.5/site-packages:cluster/spark-2.4.3-bin-hadoop2.7/python:cluster/spark-2.4.3-bin-hadoop2.7/python/lib/py4j-0.10.7-src.zip --conf spark.yarn.appMasterEnv.PATH=cluster/bin:/home/ericjiang/bin:/home/ericjiang/.local/bin:/home/ericjiang/miniconda3/envs/cluster/bin:/home/ericjiang/miniconda3/condabin:/opt/hadoop-2.7.3/bin:/bin:/home/ericjiang/Kafka/kafka_2.12-2.5.0/bin:/home/ericjiang/Scala/scala-2.12.10/bin:/home/ericjiang/Hive/apache-hive-2.3.7-bin/bin:/home/ericjiang/Spark/spark-2.4.3-bin-hadoop2.7/bin:/usr/lib/jvm/java-1.8.0-openjdk-amd64/bin:/usr/share/maven/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/opt/hadoop-2.7.3/bin:/opt/hadoop-2.7.3/sbin:/opt/spark-2.4.3-bin-hadoop2.7/bin /home/ericjiang/miniconda3/envs/cluster/share/jupyter/kernels/spark_python_yarn_cluster/scripts/launch_ipykernel.py --RemoteProcessProxy.kernel-id aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b --RemoteProcessProxy.response-address 172.24.0.216:8888 --RemoteProcessProxy.port-range 0..0 --RemoteProcessProxy.spark-context-initialization-mode lazy
[D 2020-10-15 13:37:58.702 EnterpriseGatewayApp] Serving kernel resource from: /home/ericjiang/miniconda3/envs/cluster/share/jupyter/kernels/spark_python_yarn_cluster
[I 201015 13:37:58 web:2250] 200 GET /kernelspecs/spark_python_yarn_cluster/logo-64x64.png (172.24.0.216) 5.05ms
[D 2020-10-15 13:37:59.220 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b' - retrying...
[D 2020-10-15 13:37:59.736 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b' - retrying...
[D 2020-10-15 13:38:00.255 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b' - retrying...
20/10/15 13:38:00 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[D 2020-10-15 13:38:00.771 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b' - retrying...
[D 2020-10-15 13:38:01.285 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b' - retrying...
20/10/15 13:38:01 INFO RMProxy: Connecting to ResourceManager at hadoop2/172.24.0.12:8032
20/10/15 13:38:01 INFO Client: Requesting a new application from cluster with 50 NodeManagers
20/10/15 13:38:01 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (24576 MB per container)
20/10/15 13:38:01 INFO Client: Will allocate AM container, with 1408 MB memory including 384 MB overhead
20/10/15 13:38:01 INFO Client: Setting up container launch context for our AM
20/10/15 13:38:01 INFO Client: Setting up the launch environment for our AM container
20/10/15 13:38:01 INFO Client: Preparing resources for our AM container
[D 2020-10-15 13:38:01.801 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b' - retrying...
20/10/15 13:38:01 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
[D 2020-10-15 13:38:02.316 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b' - retrying...
[D 2020-10-15 13:38:02.835 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b' - retrying...
[D 2020-10-15 13:38:03.353 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b' - retrying...
[D 2020-10-15 13:38:03.869 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b' - retrying...
[D 2020-10-15 13:38:04.385 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b' - retrying...
[D 2020-10-15 13:38:04.900 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b' - retrying...
20/10/15 13:38:05 INFO Client: Uploading resource file:/home/webuser/tmp/spark-b95873a4-4151-4277-947b-f3c6f9b8cbb9/__spark_libs__8498809013287075042.zip -> hdfs://mycluster/user/ericjiang/.sparkStaging/application_1600149154303_103906/__spark_libs__8498809013287075042.zip
[D 2020-10-15 13:38:05.415 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b' - retrying...
[D 2020-10-15 13:38:05.932 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b' - retrying...
[D 2020-10-15 13:38:06.451 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b' - retrying...
[D 2020-10-15 13:38:06.967 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b' - retrying...
[D 2020-10-15 13:38:07.484 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b' - retrying...
[D 2020-10-15 13:38:08.000 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b' - retrying...
20/10/15 13:38:08 INFO Client: Uploading resource file:/home/ericjiang/miniconda3/envs/cluster.zip#cluster -> hdfs://mycluster/user/ericjiang/.sparkStaging/application_1600149154303_103906/cluster.zip
[D 2020-10-15 13:38:08.521 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b' - retrying...
[D 2020-10-15 13:38:09.041 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b' - retrying...
[D 2020-10-15 13:38:09.560 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b' - retrying...
[D 2020-10-15 13:38:10.078 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b' - retrying...
[D 2020-10-15 13:38:10.596 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b' - retrying...
[D 2020-10-15 13:38:11.117 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b' - retrying...
[D 2020-10-15 13:38:11.634 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b' - retrying...
[D 2020-10-15 13:38:12.154 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b' - retrying...
[D 2020-10-15 13:38:12.671 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b' - retrying...
[D 2020-10-15 13:38:13.191 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b' - retrying...
[D 2020-10-15 13:38:13.709 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b' - retrying...
[D 2020-10-15 13:38:14.249 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b' - retrying...
[D 2020-10-15 13:38:14.763 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b' - retrying...
[D 2020-10-15 13:38:15.283 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b' - retrying...
[D 2020-10-15 13:38:15.800 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b' - retrying...
20/10/15 13:38:16 INFO Client: Uploading resource file:/home/ericjiang/miniconda3/envs/cluster/share/jupyter/kernels/spark_python_yarn_cluster/scripts/launch_ipykernel.py -> hdfs://mycluster/user/ericjiang/.sparkStaging/application_1600149154303_103906/launch_ipykernel.py
[D 2020-10-15 13:38:16.316 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b' - retrying...
20/10/15 13:38:16 INFO Client: Uploading resource file:/home/ericjiang/miniconda3/envs/cluster/spark-2.4.3-bin-hadoop2.7/python/lib/pyspark.zip -> hdfs://mycluster/user/ericjiang/.sparkStaging/application_1600149154303_103906/pyspark.zip
20/10/15 13:38:16 INFO Client: Uploading resource file:/home/ericjiang/miniconda3/envs/cluster/spark-2.4.3-bin-hadoop2.7/python/lib/py4j-0.10.7-src.zip -> hdfs://mycluster/user/ericjiang/.sparkStaging/application_1600149154303_103906/py4j-0.10.7-src.zip
20/10/15 13:38:16 INFO Client: Uploading resource file:/home/webuser/tmp/spark-b95873a4-4151-4277-947b-f3c6f9b8cbb9/__spark_conf__4950530639879277944.zip -> hdfs://mycluster/user/ericjiang/.sparkStaging/application_1600149154303_103906/__spark_conf__.zip
[D 2020-10-15 13:38:16.833 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b' - retrying...
20/10/15 13:38:16 INFO SecurityManager: Changing view acls to: ericjiang
20/10/15 13:38:16 INFO SecurityManager: Changing modify acls to: ericjiang
20/10/15 13:38:16 INFO SecurityManager: Changing view acls groups to:
20/10/15 13:38:16 INFO SecurityManager: Changing modify acls groups to:
20/10/15 13:38:16 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(ericjiang); groups with view permissions: Set(); users  with modify permissions: Set(ericjiang); groups with modify permissions: Set()
[D 2020-10-15 13:38:17.350 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b' - retrying...
[D 2020-10-15 13:38:17.865 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b' - retrying...
[D 2020-10-15 13:38:18.386 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b' - retrying...
20/10/15 13:38:18 INFO Client: Submitting application application_1600149154303_103906 to ResourceManager
20/10/15 13:38:18 INFO YarnClientImpl: Submitted application application_1600149154303_103906
20/10/15 13:38:18 INFO Client: Application report for application_1600149154303_103906 (state: ACCEPTED)
20/10/15 13:38:18 INFO Client:
         client token: N/A
         diagnostics: N/A
         ApplicationMaster host: N/A
         ApplicationMaster RPC port: -1
         queue: root.default
         start time: 1602740298698
         final status: UNDEFINED
         tracking URL: http://hadoop2:9046/proxy/application_1600149154303_103906/
         user: ericjiang
20/10/15 13:38:18 INFO ShutdownHookManager: Shutdown hook called
20/10/15 13:38:18 INFO ShutdownHookManager: Deleting directory /home/webuser/tmp/spark-0060dcff-9b88-4857-8518-9e56fc6c5a10
20/10/15 13:38:18 INFO ShutdownHookManager: Deleting directory /home/webuser/tmp/spark-b95873a4-4151-4277-947b-f3c6f9b8cbb9
[D 2020-10-15 13:38:18.903 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b' - retrying...
[D 2020-10-15 13:38:19.420 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b' - retrying...
[D 2020-10-15 13:38:19.938 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b' - retrying...
[D 2020-10-15 13:38:20.453 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b' - retrying...
[D 2020-10-15 13:38:20.971 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b' - retrying...
[... the same "ApplicationID not yet assigned" message repeated every ~0.5 s until 13:38:39 ...]
[D 2020-10-15 13:38:39.074 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b' - retrying...
[D 2020-10-15 13:38:39.088 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b' - retrying...
[D 2020-10-15 13:38:39.088 EnterpriseGatewayApp] BaseProcessProxy.terminate(): None
[D 2020-10-15 13:38:39.098 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b' - retrying...
[D 2020-10-15 13:38:39.098 EnterpriseGatewayApp] YarnClusterProcessProxy.kill, application ID: None, kernel ID: aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b, state: None, result: None
[D 2020-10-15 13:38:39.099 EnterpriseGatewayApp] response socket still open, close it
[E 2020-10-15 13:38:39.099 EnterpriseGatewayApp] KernelID: 'aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b' launch timeout due to: Application ID is None. Failed to submit a new application to YARN within 40.0 seconds.  Check Enterprise Gateway log for more information.
[E 201015 13:38:39 web:2250] 500 POST /api/kernels (172.24.0.216) 40654.66ms

You can see this log:

YarnClusterProcessProxy.kill, application ID: None, kernel ID: aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b, state: None, result: None

It seems the launch was not successful. Is there somewhere I haven't set things up correctly, or is this related to Anaconda?

Also, I found something interesting: when the kernel times out and is shut down, YARN still shows my application in the RUNNING Applications list. Is this correct?

(screenshot: YARN RUNNING Applications list)
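To clean up such an orphaned application, the Resource Manager's REST API can be used. A minimal sketch, assuming the RM endpoint from the EG command line (`http://172.24.0.12:8088`); `kill_leftover_app` and `app_state_url` are hypothetical helper names, not part of EG:

```python
import json
from urllib import request

def app_state_url(rm_base, app_id):
    """YARN REST endpoint that reports (GET) or changes (PUT) an app's state."""
    return "{}/ws/v1/cluster/apps/{}/state".format(rm_base, app_id)

def kill_leftover_app(rm_base, app_id):
    """Kill an application that EG gave up on but YARN still shows as RUNNING."""
    url = app_state_url(rm_base, app_id)
    with request.urlopen(url) as resp:
        state = json.load(resp).get("state")
    if state in ("ACCEPTED", "RUNNING"):
        req = request.Request(
            url,
            data=json.dumps({"state": "KILLED"}).encode(),
            headers={"Content-Type": "application/json"},
            method="PUT",
        )
        request.urlopen(req)
    return state
```

For example, `kill_leftover_app("http://172.24.0.12:8088", "application_1600149154303_103906")`, which is equivalent to `yarn application -kill <app-id>` on the command line.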

Attached: the RUNNING application's stderr, stdout, and pip list.

RUNNING Applications stderr

20/10/15 13:38:48 INFO YarnSchedulerBackend$YarnSchedulerEndpoint: ApplicationMaster registered as NettyRpcEndpointRef(spark://YarnAM@hadoop21:10941)
20/10/15 13:38:48 INFO YarnAllocator: Will request 2 executor container(s), each with 1 core(s) and 1408 MB memory (including 384 MB of overhead)
20/10/15 13:38:48 INFO YarnAllocator: Submitted 2 unlocalized container requests.
20/10/15 13:38:48 INFO ApplicationMaster: Started progress reporter thread with (heartbeat : 3000, initial allocation : 200) intervals
20/10/15 13:38:48 INFO AMRMClientImpl: Received new token for : hadoop34:14797
20/10/15 13:38:48 INFO YarnAllocator: Launching container container_1600149154303_103906_01_000002 on host hadoop34 for executor with ID 1
20/10/15 13:38:48 INFO YarnAllocator: Launching container container_1600149154303_103906_01_000003 on host hadoop34 for executor with ID 2
20/10/15 13:38:48 INFO YarnAllocator: Received 2 containers from YARN, launching executors on 2 of them.
20/10/15 13:38:48 INFO ContainerManagementProtocolProxy: yarn.client.max-cached-nodemanagers-proxies : 0
20/10/15 13:38:48 INFO ContainerManagementProtocolProxy: yarn.client.max-cached-nodemanagers-proxies : 0
20/10/15 13:38:48 INFO ContainerManagementProtocolProxy: Opening proxy : hadoop34:14797
20/10/15 13:38:48 INFO ContainerManagementProtocolProxy: Opening proxy : hadoop34:14797
20/10/15 13:39:16 INFO YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (10.0.0.34:56716) with ID 1
20/10/15 13:39:16 INFO YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (10.0.0.34:56714) with ID 2
20/10/15 13:39:16 INFO YarnClusterSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.8
20/10/15 13:39:16 INFO YarnClusterScheduler: YarnClusterScheduler.postStartHook done
20/10/15 13:39:16 INFO BlockManagerMasterEndpoint: Registering block manager hadoop34:24362 with 434.4 MB RAM, BlockManagerId(1, hadoop34, 24362, None)
20/10/15 13:39:16 INFO BlockManagerMasterEndpoint: Registering block manager hadoop34:34861 with 434.4 MB RAM, BlockManagerId(2, hadoop34, 34861, None)

RUNNING Applications stdout

[D 2020-10-15 13:38:42,738.738 launch_ipykernel] Using connection file '/tmp/kernel-aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b_uaqjgxxd.json'.
[I 2020-10-15 13:38:42,739.739 launch_ipykernel] Signal socket bound to host: 0.0.0.0, port: 58354
[D 2020-10-15 13:38:42,739.739 launch_ipykernel] JSON Payload 'b'{"transport": "tcp", "key": "82174c3f-bb35-4322-8aa1-26fdd04cae7a", "ip": "0.0.0.0", "signature_scheme": "hmac-sha256", "comm_port": 58354, "kernel_name": "", "pgid": "11292", "iopub_port": 31872, "hb_port": 32353, "control_port": 64213, "shell_port": 49445, "pid": "11487", "stdin_port": 37770}'
[D 2020-10-15 13:38:42,743.743 launch_ipykernel] Encrypted Payload 'b'1ikkW7ADbU9d9w3qdNpPXIDNiTkmjHy7bE7O69z0sBcQWrpRiY90LCFHoUAqbS3k6ilVe3cfjL1hgK7r0Z2WRD4jlWVHh1nlSdYV6rfsD+R0dB1Ca9OabZN+bpcc8CigiW7d4flo2nCXpbSmjUzkyf2NJ6FUqPo5d1SpaxipVNJ58X22+qwffJZZAdNQrqoQfOER5+984fiKhckuviemk9LVxw+KrSF2k8nYtHlGBycCflMm5NZHqybbiaSUcsJrljvyxgq/cUcsKpJrTuIe1EtYem6rDr859kmj3Mnx9wEOGRlaZZpT0/woQSH/LUuB5jnQjQMWb5Vjk8JeOdHYF/eamDNKcluX1MK4d3rH/WnVKrnafxf8zzWbreoFPgQ5pgLTELdaRECOK3KsB6F9AQ=='
/mnt/hdfs3/nm/usercache/ericjiang/appcache/application_1600149154303_103906/container_1600149154303_103906_01_000001/cluster/lib/python3.5/site-packages/IPython/paths.py:68: UserWarning: IPython parent '/home' is not a writable location, using a temp directory.
" using a temp directory.".format(parent))
NOTE: When using the ipython kernel entry point, Ctrl-C will not work.

To exit, you will have to explicitly quit this process, by either sending
"quit" from a client, or using Ctrl-\ in UNIX-like environments.

To read more about this, see ipython/ipython: Issue #2049

To connect another client to this kernel, use:
--existing /tmp/kernel-aa2ecd0d-9b6a-4f15-9d5f-6293f2818a1b_uaqjgxxd.json

pip list

Package                           Version
--------------------------------- ------------
argon2-cffi                       20.1.0
attrs                             20.2.0
backcall                          0.2.0
bcrypt                            3.1.7
bleach                            3.2.1
cachetools                        4.1.1
certifi                           2018.8.24
cffi                              1.14.3
chardet                           3.0.4
click                             7.1.2
cloudpickle                       1.6.0
confluent-kafka                   1.5.0
cryptography                      3.1.1
cycler                            0.10.0
dask                              2.6.0
decorator                         4.4.2
defusedxml                        0.6.0
distributed                       2.6.0
docker                            4.3.1
docopt                            0.6.2
entrypoints                       0.3
featuretools                      0.13.4
findspark                         1.4.2
future                            0.18.2
google-auth                       1.22.1
hdfs                              2.5.8
HeapDict                          1.0.1
idna                              2.10
importlib-metadata                2.0.0
ipykernel                         5.3.4
ipython                           7.9.0
ipython-genutils                  0.2.0
jedi                              0.17.2
Jinja2                            2.11.2
joblib                            0.14.1
json5                             0.9.5
jsonschema                        3.2.0
jupyter-client                    6.1.7
jupyter-contrib-core              0.3.3
jupyter-contrib-nbextensions      0.5.1
jupyter-core                      4.6.3
jupyter-enterprise-gateway        2.3.0
jupyter-highlight-selected-word   0.2.0
jupyter-kernel-gateway            2.4.3
jupyter-latex-envs                1.4.6
jupyter-nbextensions-configurator 0.4.1
jupyterlab                        2.2.8
jupyterlab-server                 1.2.0
kiwisolver                        1.1.0
kmodes                            0.10.2
kubernetes                        12.0.0
lesscpy                           0.14.0
lightgbm                          3.0.0
lxml                              4.5.2
MarkupSafe                        1.1.1
matplotlib                        3.0.3
mistune                           0.8.4
msgpack                           0.6.2
nbconvert                         5.6.1
nbformat                          5.0.8
notebook                          6.1.4
numpy                             1.18.5
oauthlib                          3.1.0
packaging                         20.4
pandas                            0.25.3
pandocfilters                     1.4.2
paramiko                          2.7.2
parso                             0.7.1
pexpect                           4.8.0
pickleshare                       0.7.5
pip                               10.0.1
ply                               3.11
prometheus-client                 0.8.0
prompt-toolkit                    2.0.10
psutil                            5.7.2
ptyprocess                        0.6.0
py4j                              0.10.9
pyasn1                            0.4.8
pyasn1-modules                    0.2.8
pycparser                         2.20
pycryptodomex                     3.9.8
PyEmail                           0.0.1
Pygments                          2.7.1
PyNaCl                            1.4.0
pyparsing                         2.4.7
pypinyin                          0.39.0
pyrsistent                        0.17.3
pyspark                           3.0.0
python-dateutil                   2.8.1
pytz                              2020.1
PyYAML                            5.3.1
pyzmq                             19.0.2
requests                          2.24.0
requests-oauthlib                 1.3.0
rsa                               4.6
scikit-learn                      0.22.2.post1
scipy                             1.4.1
seaborn                           0.9.1
Send2Trash                        1.5.0
setuptools                        50.3.1
six                               1.15.0
sortedcontainers                  2.2.2
tblib                             1.7.0
terminado                         0.8.3
testpath                          0.4.4
toolz                             0.11.1
tornado                           6.0.4
tqdm                              4.49.0
traitlets                         4.3.3
ua-parser                         0.10.0
urllib3                           1.25.9
user-agents                       2.2.0
wcwidth                           0.2.5
webencodings                      0.5.1
websocket-client                  0.57.0
wheel                             0.35.1
yapf                              0.30.0
yarn-api-client                   1.0.2
zict                              2.0.0
zipp                              1.2.0
kevin-bates commented 4 years ago

To avoid other interference, I decided to create another env (name=cluster) and install the latest Jupyter version, and it looks like a good idea.

Yeah, you want NB 6 and EG 2.2 so you can take advantage of the async kernel management - which is a big win for EG.

Regarding No. 6, I probably should have stated this a different way. I suspect you created your "spark" kernelspec directory by copying the directory of spark_python_yarn_cluster, and there are path references that should technically be updated (i.e., change spark_python_yarn_cluster to spark), otherwise changes to spark/bin/run.sh won't get picked up.
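A sketch of that rename, run in a scratch directory so it is safe to execute anywhere (real kernelspecs live elsewhere, e.g. under /usr/local/share/jupyter/kernels, and the file contents here are illustrative):

```shell
# Set up a stand-in for the copied kernelspec in a temp directory:
cd "$(mktemp -d)"
mkdir -p spark_python_yarn_cluster/bin
echo 'export PROG_HOME="${KERNELSPECS}/spark_python_yarn_cluster"' \
  > spark_python_yarn_cluster/bin/run.sh

# Copy the sample kernelspec to the new name, then fix the internal
# path references so edits to spark/bin/run.sh actually get picked up:
cp -r spark_python_yarn_cluster spark
grep -rl spark_python_yarn_cluster spark | xargs sed -i 's/spark_python_yarn_cluster/spark/g'
```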

Regarding your latest issue, your YARN resource manager has yet to assign the kernel a host, and KERNEL_LAUNCH_TIMEOUT is being exceeded after 40 seconds. You should try to determine whether your YARN cluster is saturated. I would also recommend extending KERNEL_LAUNCH_TIMEOUT to 120 on your client and restarting the client so that the value carries over. By the way, this is what I was getting at with item number 3.
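Concretely, that timeout bump looks roughly like this (120 is just an example value; KERNEL_LAUNCH_TIMEOUT is honored on the NB2KG client side, and EG also reads a gateway-side ceiling):

```shell
# On the Notebook (client) machine, before launching jupyter notebook:
export KERNEL_LAUNCH_TIMEOUT=120

# On the Enterprise Gateway host, the gateway-side ceiling can be
# raised as well:
export EG_KERNEL_LAUNCH_TIMEOUT=120

# Restart both processes afterwards so the new values carry over.
```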

Everything else looks good. EG will not attempt to communicate with the kernel until the application has been assigned a host and EG knows that host, which it learns via the application-state polling that eventually timed out here.
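That polling loop can be pictured roughly like this (a loose sketch of what the YARN process proxy does against the RM REST API, not EG's actual code; `am_host` and `wait_for_host` are illustrative names):

```python
import json
import time
from urllib import request

def am_host(app_info):
    """Extract the Application Master host once YARN has assigned one."""
    return app_info.get("amHostHttpAddress") or None

def wait_for_host(rm_base, app_id, timeout=120.0, interval=0.5):
    """Poll the RM until the app reports a host; None means launch timeout."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        url = "{}/ws/v1/cluster/apps/{}".format(rm_base, app_id)
        with request.urlopen(url) as resp:
            app = json.load(resp).get("app", {})
        host = am_host(app)
        if host:
            return host
        time.sleep(interval)  # the EG debug log shows roughly this cadence
    return None
```

With the 40-second default in the log above, the loop gives up before YARN ever reports a host, which is why extending the timeout helps on a busy cluster.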

SevenMilk commented 4 years ago

Hi @kevin-bates . Thanks for your help

Great! I am able to use YARN cluster mode now. I finally changed the "--name" parameter in "SPARK_OPTS" and the "--RemoteProcessProxy.response-address" setting so that the kernel runs normally.
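For reference, the --name piece usually looks something like this (an illustrative fragment of SPARK_OPTS, not my exact configuration; EG matches the YARN application by name, so --name must expand to the kernel id):

```shell
# Illustrative SPARK_OPTS fragment; the archive path is the one from
# the original question:
export SPARK_OPTS="--master yarn --deploy-mode cluster \
  --name ${KERNEL_ID:-ERROR__NO__KERNEL_ID} \
  --conf spark.yarn.dist.archives=/home/ericjiang/miniconda3/envs/spark_env.zip#spark"
```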

At last, I wrote up the installation process as a github document . I hope this documentation will be useful to others installing jupyter-enterprise-gateway.

Only the Mandarin version is available now; the English version will be added later :)

kevin-bates commented 4 years ago

Excellent - glad to hear you're moving forward. I will go ahead and close this issue.

Regarding the documentation, it would be great to see if you could incorporate any important changes into our existing docs.

Also, I see you're referencing Python 3.5 in your document. Would it be possible to bump that to Python 3.6 (at a minimum)? 3.5 was end-of-life last month and we will likely be dropping 3.5 support in our 3.0 release. Just a heads up. Thanks.