Closed ziedbouf closed 5 years ago
I suspect your issues are related to not flowing the variables to the YARN processes. While you have the variables defined in the env: stanza of kernel.json, that essentially only makes those variables available to the kernel launch - namely run.sh.
To flow these variables to the YARN application master (in cluster mode) or the workers (in client mode), you do this via other configuration settings:
In cluster mode, this is accomplished via --conf spark.yarn.appMasterEnv.<variable1>=<value1> --conf spark.yarn.appMasterEnv.<variable2>=<value2> ...
In client mode, this is accomplished via "SPARK_YARN_USER_ENV": "<variable1>=<value1>:<variable2>=<value2>:..."
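A concrete sketch of the two forms (MY_VAR1/MY_VAR2 and their values are placeholder names for illustration, not anything your setup requires):

```shell
# Cluster mode: spark.yarn.appMasterEnv.* settings become environment
# variables in the YARN application master's environment (which is where
# the kernel runs in cluster mode).
SPARK_OPTS="--master yarn --deploy-mode cluster \
  --conf spark.yarn.appMasterEnv.MY_VAR1=value1 \
  --conf spark.yarn.appMasterEnv.MY_VAR2=value2"

# Client mode: a colon-separated list of name=value pairs.
export SPARK_YARN_USER_ENV="MY_VAR1=value1:MY_VAR2=value2"
```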
By the way, if these represent per-user values, you can flow them from the client by adding their names to KG_ENV_WHITELIST (comma-separated). NB2KG and the Gateway will then ensure they are in the env for the kernel's launch (in this case, in run.sh). Or create names prefixed with KERNEL_, then massage those values into the expected names via run.sh, etc.
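For example (MY_USER_VAR1, MY_USER_VAR2, and KERNEL_MY_VAR are made-up names for illustration):

```shell
# Client side (NB2KG / notebook server): whitelist the per-user variable
# names so they are forwarded to the Gateway (comma-separated list).
export KG_ENV_WHITELIST="MY_USER_VAR1,MY_USER_VAR2"

# Alternative: variables prefixed with KERNEL_ flow automatically; run.sh
# can then massage such a value into whatever name your code expects.
KERNEL_MY_VAR="per-user-value"       # would normally arrive from the client
MY_VAR="${KERNEL_MY_VAR}"            # rename into the expected variable
```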
cc @lresende for more specific Spark/YARN advice
@kevin-bates I'm using --conf spark.yarn.appMasterEnv.<variable1>=<value1> --conf spark.yarn.appMasterEnv.<variable2>=<value2>, but this leads to a timeout while spinning up the Spark context.
"SPARK_OPTS": "--master yarn --deploy-mode cluster --name ${KERNEL_ID:-ERROR__NO__KERNEL_ID} --conf spark.yarn.submit.waitAppCompletion=false --conf spark.yarn.appMasterEnv.PYSPARK_GATEWAY_SECRET=this_secret_key --conf spark.yarn.appMasterEnv.PYTHONUSERBASE=/opt/anaconda3 --conf spark.yarn.appMasterEnv.PYTHONPATH=/opt/anaconda3/lib/python3.6/site-packages/:/usr/hdp/current/spark2-client/python:/usr/hdp/current/spark2-client/python/lib/py4j-0.10.6-src.zip --conf spark.yarn.appMasterEnv.PATH=/opt/anaconda3/bin/python:$PATH"
But this leads to a timeout. From yarn logs -applicationId application_1536672003321_0007:
18/09/12 11:56:14 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(yarn, elyra); groups with view permissions: Set(); users with modify permissions: Set(yarn, elyra); groups with modify permissions: Set()
18/09/12 11:56:14 INFO ApplicationMaster: Preparing Local resources
18/09/12 11:56:15 INFO ApplicationMaster: ApplicationAttemptId: appattempt_1536672003321_0007_000001
18/09/12 11:56:15 INFO ApplicationMaster: Starting the user application in a separate Thread
18/09/12 11:56:15 INFO ApplicationMaster: Waiting for spark context initialization...
18/09/12 11:57:55 ERROR ApplicationMaster: Uncaught exception:
java.util.concurrent.TimeoutException: Futures timed out after [100000 milliseconds]
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:201)
at org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:498)
at org.apache.spark.deploy.yarn.ApplicationMaster.org$apache$spark$deploy$yarn$ApplicationMaster$$runImpl(ApplicationMaster.scala:345)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$2.apply$mcV$sp(ApplicationMaster.scala:260)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$2.apply(ApplicationMaster.scala:260)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$2.apply(ApplicationMaster.scala:260)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$5.run(ApplicationMaster.scala:815)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
at org.apache.spark.deploy.yarn.ApplicationMaster.doAsUser(ApplicationMaster.scala:814)
at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:259)
at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:839)
at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
18/09/12 11:57:55 INFO ApplicationMaster: Final app status: FAILED, exitCode: 13, (reason: Uncaught exception: java.util.concurrent.TimeoutException: Futures timed out after [100000 milliseconds])
18/09/12 11:57:55 INFO ShutdownHookManager: Shutdown hook called
The same happens in client mode; I'm still debugging to better understand the dynamics between the different components. I think I messed up my setup :D and might need to redo things from scratch.
I think you're getting closer. You might try setting the following option to provide more time to create a Spark context, although we have not typically had to do this with Python kernels: --conf spark.yarn.am.waitTime=1d.
It seems to me like you should focus on a single mode for now. I would focus on 'cluster mode' since it doesn't require the distribution of the kernelspecs to the worker nodes and is probably where you want to be in the end anyway.
These stack traces look odd to me. It's almost like you're not using the Python launcher (launch_ipykernel.py) that should reside in the scripts directory of your kernelspec.
After focusing only on cluster mode, adding the waitTime property, and reattempting, please provide the full EG log, run.sh, and yarn cluster mode kernel.json files. In addition, since you're definitely creating the YARN application, take a look at the application log files (stdout and stderr are typically the most helpful). You should see output from the launch_ipykernel.py script relative to the creation of the 5 ports, etc.
I feel lost here; is it possible to shed some light on the thought process? The following are my configuration files and logs:
[elyra@spark-master ~]$ cat /usr/local/share/jupyter/kernels/spark_python_yarn_cluster/kernel.json
{
  "language": "python",
  "display_name": "Spark - Python (YARN Cluster Mode)",
  "process_proxy": {
    "class_name": "enterprise_gateway.services.processproxies.yarn.YarnClusterProcessProxy"
  },
  "env": {
    "SPARK_HOME": "/usr/hdp/current/spark2-client",
    "PYSPARK_PYTHON": "/opt/anaconda3/bin/python",
    "PYTHONPATH": "/opt/anaconda3/lib/python3.6/site-packages/:/usr/hdp/current/spark2-client/python:/usr/hdp/current/spark2-client/python/lib/py4j-0.10.6-src.zip",
    "SPARK_OPTS": "--master yarn --deploy-mode cluster --name ${KERNEL_ID:-ERROR__NO__KERNEL_ID} --conf spark.yarn.am.waitTime=1d --conf spark.yarn.submit.waitAppCompletion=false --conf spark.yarn.appMasterEnv.PYSPARK_GATEWAY_SECRET=thisjustblabalabala --conf spark.yarn.appMasterEnv.PYTHONUSERBASE=/opt/anaconda3 --conf spark.yarn.appMasterEnv.PYTHONPATH=/opt/anaconda3/lib/python3.6/site-packages/:/usr/hdp/current/spark2-client/python:/usr/hdp/current/spark2-client/python/lib/py4j-0.10.6-src.zip --conf spark.yarn.appMasterEnv.PATH=/opt/anaconda3/bin/python:$PATH",
    "LAUNCH_OPTS": ""
  },
  "argv": [
    "/usr/local/share/jupyter/kernels/spark_python_yarn_cluster/bin/run.sh",
    "{connection_file}",
    "--RemoteProcessProxy.response-address",
    "{response_address}",
    "--RemoteProcessProxy.port-range",
    "{port_range}",
    "--RemoteProcessProxy.spark-context-initialization-mode",
    "lazy"
  ]
}
[elyra@spark-master ~]$ cat /usr/local/share/jupyter/kernels/spark_python_yarn_cluster/bin/run.sh
#!/usr/bin/env bash
export PYSPARK_GATEWAY_SECRET="w<X?u6I&Ekt>49n}K5kBJ^QM@Zz)Mf"
if [ "${EG_IMPERSONATION_ENABLED}" = "True" ]; then
    IMPERSONATION_OPTS="--proxy-user ${KERNEL_USERNAME:-UNSPECIFIED}"
    USER_CLAUSE="as user ${KERNEL_USERNAME:-UNSPECIFIED}"
else
    IMPERSONATION_OPTS=""
    USER_CLAUSE="on behalf of user ${KERNEL_USERNAME:-UNSPECIFIED}"
fi
echo
echo "Starting IPython kernel for Spark in Yarn Cluster mode ${USER_CLAUSE}"
echo
if [ -z "${SPARK_HOME}" ]; then
    echo "SPARK_HOME must be set to the location of a Spark distribution!"
    exit 1
fi
PROG_HOME="$(cd "`dirname "$0"`"/..; pwd)"
set -x
eval exec \
    "${SPARK_HOME}/bin/spark-submit" \
    "${SPARK_OPTS}" \
    "${IMPERSONATION_OPTS}" \
    "${PROG_HOME}/scripts/launch_ipykernel.py" \
    "${LAUNCH_OPTS}" \
    "$@"
set +x
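(As an aside, the eval exec line above is what splits the single SPARK_OPTS string into individual spark-submit arguments. A minimal, illustrative sketch of that re-parsing behavior, unrelated to any particular Spark setting:)

```shell
# eval re-parses its arguments, so a whole option string held in one
# variable becomes separate words instead of a single quoted argument.
OPTS='--conf a=1 --conf b=2'
eval set -- "${OPTS}"   # positional parameters are now 4 separate arguments
ARG_COUNT="$#"
FIRST_ARG="$1"
```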
[elyra@spark-master ~]$ nano /usr/local/share/jupyter/kernels/spark_python_yarn_cluster/bin/run.sh
[elyra@spark-master ~]$ exit
logout
[root@spark-master zied]# nano /usr/local/share/jupyter/kernels/spark_python_yarn_cluster/bin/run.sh
[root@spark-master zied]# cat /usr/local/share/jupyter/kernels/spark_python_yarn_cluster/bin/run.sh
#!/usr/bin/env bash
if [ "${EG_IMPERSONATION_ENABLED}" = "True" ]; then
    IMPERSONATION_OPTS="--proxy-user ${KERNEL_USERNAME:-UNSPECIFIED}"
    USER_CLAUSE="as user ${KERNEL_USERNAME:-UNSPECIFIED}"
else
    IMPERSONATION_OPTS=""
    USER_CLAUSE="on behalf of user ${KERNEL_USERNAME:-UNSPECIFIED}"
fi
echo
echo "Starting IPython kernel for Spark in Yarn Cluster mode ${USER_CLAUSE}"
echo
if [ -z "${SPARK_HOME}" ]; then
    echo "SPARK_HOME must be set to the location of a Spark distribution!"
    exit 1
fi
PROG_HOME="$(cd "`dirname "$0"`"/..; pwd)"
set -x
eval exec \
    "${SPARK_HOME}/bin/spark-submit" \
    "${SPARK_OPTS}" \
    "${IMPERSONATION_OPTS}" \
    "${PROG_HOME}/scripts/launch_ipykernel.py" \
    "${LAUNCH_OPTS}" \
    "$@"
set +x
[elyra@spark-master ~]$ tail -f /opt/elyra/log/enterprise_gateway_2018-09-12.log
[D 2018-09-12 14:16:23.305 EnterpriseGatewayApp] Looking for jupyter_config in /opt/anaconda3/etc/jupyter
[D 2018-09-12 14:16:23.306 EnterpriseGatewayApp] Looking for jupyter_config in /home/elyra/.jupyter
[D 2018-09-12 14:16:23.306 EnterpriseGatewayApp] Looking for jupyter_config in /home/elyra
[D 2018-09-12 14:16:23.308 EnterpriseGatewayApp] Looking for jupyter_enterprise_gateway_config in /etc/jupyter
[D 2018-09-12 14:16:23.308 EnterpriseGatewayApp] Looking for jupyter_enterprise_gateway_config in /usr/local/etc/jupyter
[D 2018-09-12 14:16:23.308 EnterpriseGatewayApp] Looking for jupyter_enterprise_gateway_config in /opt/anaconda3/etc/jupyter
[D 2018-09-12 14:16:23.308 EnterpriseGatewayApp] Looking for jupyter_enterprise_gateway_config in /home/elyra/.jupyter
[D 2018-09-12 14:16:23.308 EnterpriseGatewayApp] Looking for jupyter_enterprise_gateway_config in /home/elyra
[D 180912 14:16:23 selector_events:65] Using selector: EpollSelector
[I 2018-09-12 14:16:23.325 EnterpriseGatewayApp] Jupyter Enterprise Gateway at http://0.0.0.0:8888
[D 2018-09-12 14:16:32.371 EnterpriseGatewayApp] Found kernel python3 in /opt/anaconda3/share/jupyter/kernels
[D 2018-09-12 14:16:32.371 EnterpriseGatewayApp] Found kernel ir in /opt/anaconda3/share/jupyter/kernels
[D 2018-09-12 14:16:32.371 EnterpriseGatewayApp] Found kernel spark_scala in /usr/local/share/jupyter/kernels
[D 2018-09-12 14:16:32.371 EnterpriseGatewayApp] Found kernel spark_python_yarn_cluster in /usr/local/share/jupyter/kernels
[D 2018-09-12 14:16:32.371 EnterpriseGatewayApp] Found kernel spark_python_yarn_client in /usr/local/share/jupyter/kernels
[I 180912 14:16:32 web:2106] 200 GET /api/kernelspecs (127.0.0.1) 218.21ms
[D 2018-09-12 14:16:44.069 EnterpriseGatewayApp] Found kernel python3 in /opt/anaconda3/share/jupyter/kernels
[D 2018-09-12 14:16:44.070 EnterpriseGatewayApp] Found kernel ir in /opt/anaconda3/share/jupyter/kernels
[D 2018-09-12 14:16:44.070 EnterpriseGatewayApp] Found kernel spark_scala in /usr/local/share/jupyter/kernels
[D 2018-09-12 14:16:44.070 EnterpriseGatewayApp] Found kernel spark_python_yarn_cluster in /usr/local/share/jupyter/kernels
[D 2018-09-12 14:16:44.070 EnterpriseGatewayApp] Found kernel spark_python_yarn_client in /usr/local/share/jupyter/kernels
[I 180912 14:16:44 web:2106] 200 GET /api/kernelspecs (127.0.0.1) 5.52ms
[D 2018-09-12 14:16:44.263 EnterpriseGatewayApp] RemoteMappingKernelManager.start_kernel: spark_python_yarn_cluster
[D 2018-09-12 14:16:44.273 EnterpriseGatewayApp] Instantiating kernel 'Spark - Python (YARN Cluster Mode)' with process proxy: enterprise_gateway.services.processproxies.yarn.YarnClusterProcessProxy
[D 2018-09-12 14:16:44.277 EnterpriseGatewayApp] Response socket launched on 10.132.0.4, port: 56820 using 5.0s timeout
[D 2018-09-12 14:16:44.278 EnterpriseGatewayApp] Starting kernel: ['/usr/local/share/jupyter/kernels/spark_python_yarn_cluster/bin/run.sh', '/home/elyra/.local/share/jupyter/runtime/kernel-915100ad-6520-416c-b01d-8d7f8dd73344.json', '--RemoteProcessProxy.response-address', '10.132.0.4:56820', '--RemoteProcessProxy.port-range', '0..0', '--RemoteProcessProxy.spark-context-initialization-mode', 'lazy']
[D 2018-09-12 14:16:44.279 EnterpriseGatewayApp] Launching kernel: Spark - Python (YARN Cluster Mode) with command: ['/usr/local/share/jupyter/kernels/spark_python_yarn_cluster/bin/run.sh', '/home/elyra/.local/share/jupyter/runtime/kernel-915100ad-6520-416c-b01d-8d7f8dd73344.json', '--RemoteProcessProxy.response-address', '10.132.0.4:56820', '--RemoteProcessProxy.port-range', '0..0', '--RemoteProcessProxy.spark-context-initialization-mode', 'lazy']
[D 2018-09-12 14:16:44.279 EnterpriseGatewayApp] BaseProcessProxy.launch_process() env: {'PATH': '/opt/anaconda3/bin:/usr/lib64/qt-3.3/bin:/usr/java/jdk1.8.0_181-amd64/bin:/usr/java/jdk1.8.0_181-amd64/jre/bin:/opt/anaconda3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/home/elyra/.local/bin:/home/elyra/bin', 'KERNEL_USERNAME': 'elyra', 'SPARK_HOME': '/usr/hdp/current/spark2-client', 'PYSPARK_PYTHON': '/opt/anaconda3/bin/python', 'PYTHONPATH': '/opt/anaconda3/lib/python3.6/site-packages/:/usr/hdp/current/spark2-client/python:/usr/hdp/current/spark2-client/python/lib/py4j-0.10.6-src.zip', 'SPARK_OPTS': '--master yarn --deploy-mode cluster --name ${KERNEL_ID:-ERROR__NO__KERNEL_ID} --conf spark.yarn.am.waitTime=1d --conf spark.yarn.submit.waitAppCompletion=false --conf spark.yarn.appMasterEnv.PYSPARK_GATEWAY_SECRET=thisjustblabalabala --conf spark.yarn.appMasterEnv.PYTHONUSERBASE=/opt/anaconda3 --conf spark.yarn.appMasterEnv.PYTHONPATH=/opt/anaconda3/lib/python3.6/site-packages/:/usr/hdp/current/spark2-client/python:/usr/hdp/current/spark2-client/python/lib/py4j-0.10.6-src.zip --conf spark.yarn.appMasterEnv.PATH=/opt/anaconda3/bin/python:$PATH', 'LAUNCH_OPTS': '', 'KERNEL_GATEWAY': '1', 'EG_MIN_PORT_RANGE_SIZE': '1000', 'EG_MAX_PORT_RANGE_RETRIES': '5', 'KERNEL_ID': '915100ad-6520-416c-b01d-8d7f8dd73344', 'EG_IMPERSONATION_ENABLED': 'False'}
[D 2018-09-12 14:16:44.287 EnterpriseGatewayApp] Yarn cluster kernel launched using YARN endpoint: http://spark-master:8088/ws/v1/cluster, pid: 19827, Kernel ID: 915100ad-6520-416c-b01d-8d7f8dd73344, cmd: '['/usr/local/share/jupyter/kernels/spark_python_yarn_cluster/bin/run.sh', '/home/elyra/.local/share/jupyter/runtime/kernel-915100ad-6520-416c-b01d-8d7f8dd73344.json', '--RemoteProcessProxy.response-address', '10.132.0.4:56820', '--RemoteProcessProxy.port-range', '0..0', '--RemoteProcessProxy.spark-context-initialization-mode', 'lazy']'
Starting IPython kernel for Spark in Yarn Cluster mode on behalf of user elyra
+ eval exec /usr/hdp/current/spark2-client/bin/spark-submit '--master yarn --deploy-mode cluster --name ${KERNEL_ID:-ERROR__NO__KERNEL_ID} --conf spark.yarn.am.waitTime=1d --conf spark.yarn.submit.waitAppCompletion=false --conf spark.yarn.appMasterEnv.PYSPARK_GATEWAY_SECRET=thisjustblabalabala --conf spark.yarn.appMasterEnv.PYTHONUSERBASE=/opt/anaconda3 --conf spark.yarn.appMasterEnv.PYTHONPATH=/opt/anaconda3/lib/python3.6/site-packages/:/usr/hdp/current/spark2-client/python:/usr/hdp/current/spark2-client/python/lib/py4j-0.10.6-src.zip --conf spark.yarn.appMasterEnv.PATH=/opt/anaconda3/bin/python:$PATH' '' /usr/local/share/jupyter/kernels/spark_python_yarn_cluster/scripts/launch_ipykernel.py '' /home/elyra/.local/share/jupyter/runtime/kernel-915100ad-6520-416c-b01d-8d7f8dd73344.json --RemoteProcessProxy.response-address 10.132.0.4:56820 --RemoteProcessProxy.port-range 0..0 --RemoteProcessProxy.spark-context-initialization-mode lazy
++ exec /usr/hdp/current/spark2-client/bin/spark-submit --master yarn --deploy-mode cluster --name 915100ad-6520-416c-b01d-8d7f8dd73344 --conf spark.yarn.am.waitTime=1d --conf spark.yarn.submit.waitAppCompletion=false --conf spark.yarn.appMasterEnv.PYSPARK_GATEWAY_SECRET=thisjustblabalabala --conf spark.yarn.appMasterEnv.PYTHONUSERBASE=/opt/anaconda3 --conf spark.yarn.appMasterEnv.PYTHONPATH=/opt/anaconda3/lib/python3.6/site-packages/:/usr/hdp/current/spark2-client/python:/usr/hdp/current/spark2-client/python/lib/py4j-0.10.6-src.zip --conf spark.yarn.appMasterEnv.PATH=/opt/anaconda3/bin/python:/opt/anaconda3/bin:/usr/lib64/qt-3.3/bin:/usr/java/jdk1.8.0_181-amd64/bin:/usr/java/jdk1.8.0_181-amd64/jre/bin:/opt/anaconda3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/home/elyra/.local/bin:/home/elyra/bin /usr/local/share/jupyter/kernels/spark_python_yarn_cluster/scripts/launch_ipykernel.py /home/elyra/.local/share/jupyter/runtime/kernel-915100ad-6520-416c-b01d-8d7f8dd73344.json --RemoteProcessProxy.response-address 10.132.0.4:56820 --RemoteProcessProxy.port-range 0..0 --RemoteProcessProxy.spark-context-initialization-mode lazy
ls: cannot access /usr/hdp/2.6.5.0/hadoop/lib: No such file or directory
[D 2018-09-12 14:16:44.792 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: '915100ad-6520-416c-b01d-8d7f8dd73344' - retrying...
[D 2018-09-12 14:16:45.296 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: '915100ad-6520-416c-b01d-8d7f8dd73344' - retrying...
[D 2018-09-12 14:16:45.800 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: '915100ad-6520-416c-b01d-8d7f8dd73344' - retrying...
18/09/12 14:16:46 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[D 2018-09-12 14:16:46.303 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: '915100ad-6520-416c-b01d-8d7f8dd73344' - retrying...
[D 2018-09-12 14:16:46.806 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: '915100ad-6520-416c-b01d-8d7f8dd73344' - retrying...
18/09/12 14:16:47 WARN DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
18/09/12 14:16:47 INFO RMProxy: Connecting to ResourceManager at spark-master.c.mozn-location.internal/10.132.0.4:8050
[D 2018-09-12 14:16:47.310 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: '915100ad-6520-416c-b01d-8d7f8dd73344' - retrying...
18/09/12 14:16:47 INFO Client: Requesting a new application from cluster with 3 NodeManagers
18/09/12 14:16:47 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (24576 MB per container)
18/09/12 14:16:47 INFO Client: Will allocate AM container, with 1408 MB memory including 384 MB overhead
18/09/12 14:16:47 INFO Client: Setting up container launch context for our AM
18/09/12 14:16:47 INFO Client: Setting up the launch environment for our AM container
18/09/12 14:16:47 INFO Client: Preparing resources for our AM container
[D 2018-09-12 14:16:47.813 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: '915100ad-6520-416c-b01d-8d7f8dd73344' - retrying...
[D 2018-09-12 14:16:48.317 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: '915100ad-6520-416c-b01d-8d7f8dd73344' - retrying...
[D 2018-09-12 14:16:48.820 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: '915100ad-6520-416c-b01d-8d7f8dd73344' - retrying...
18/09/12 14:16:49 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
[D 2018-09-12 14:16:49.324 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: '915100ad-6520-416c-b01d-8d7f8dd73344' - retrying...
[D 2018-09-12 14:16:49.827 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: '915100ad-6520-416c-b01d-8d7f8dd73344' - retrying...
[D 2018-09-12 14:16:50.331 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: '915100ad-6520-416c-b01d-8d7f8dd73344' - retrying...
[D 2018-09-12 14:16:50.834 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: '915100ad-6520-416c-b01d-8d7f8dd73344' - retrying...
18/09/12 14:16:51 INFO Client: Uploading resource file:/tmp/spark-c6c0b425-2acd-4b07-9df2-09731192d3d7/__spark_libs__8613768646819207724.zip -> hdfs://spark-master.c.mozn-location.internal:8020/user/elyra/.sparkStaging/application_1536672003321_0065/__spark_libs__8613768646819207724.zip
[D 2018-09-12 14:16:51.337 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: '915100ad-6520-416c-b01d-8d7f8dd73344' - retrying...
[D 2018-09-12 14:16:51.841 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: '915100ad-6520-416c-b01d-8d7f8dd73344' - retrying...
[D 2018-09-12 14:16:52.345 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: '915100ad-6520-416c-b01d-8d7f8dd73344' - retrying...
18/09/12 14:16:52 INFO Client: Uploading resource file:/usr/local/share/jupyter/kernels/spark_python_yarn_cluster/scripts/launch_ipykernel.py -> hdfs://spark-master.c.mozn-location.internal:8020/user/elyra/.sparkStaging/application_1536672003321_0065/launch_ipykernel.py
18/09/12 14:16:52 INFO Client: Uploading resource file:/usr/hdp/current/spark2-client/python/lib/pyspark.zip -> hdfs://spark-master.c.mozn-location.internal:8020/user/elyra/.sparkStaging/application_1536672003321_0065/pyspark.zip
18/09/12 14:16:52 INFO Client: Uploading resource file:/usr/hdp/current/spark2-client/python/lib/py4j-0.10.6-src.zip -> hdfs://spark-master.c.mozn-location.internal:8020/user/elyra/.sparkStaging/application_1536672003321_0065/py4j-0.10.6-src.zip
18/09/12 14:16:52 INFO Client: Uploading resource file:/tmp/spark-c6c0b425-2acd-4b07-9df2-09731192d3d7/__spark_conf__178429387221043893.zip -> hdfs://spark-master.c.mozn-location.internal:8020/user/elyra/.sparkStaging/application_1536672003321_0065/__spark_conf__.zip
18/09/12 14:16:52 INFO SecurityManager: Changing view acls to: elyra
18/09/12 14:16:52 INFO SecurityManager: Changing modify acls to: elyra
18/09/12 14:16:52 INFO SecurityManager: Changing view acls groups to:
18/09/12 14:16:52 INFO SecurityManager: Changing modify acls groups to:
18/09/12 14:16:52 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(elyra); groups with view permissions: Set(); users with modify permissions: Set(elyra); groups with modify permissions: Set()
18/09/12 14:16:52 INFO Client: Submitting application application_1536672003321_0065 to ResourceManager
18/09/12 14:16:52 INFO YarnClientImpl: Submitted application application_1536672003321_0065
18/09/12 14:16:52 INFO Client: Application report for application_1536672003321_0065 (state: ACCEPTED)
18/09/12 14:16:52 INFO Client:
client token: N/A
diagnostics: [Wed Sep 12 14:16:52 +0000 2018] Application is Activated, waiting for resources to be assigned for AM. Details : AM Partition = <DEFAULT_PARTITION> ; Partition Resource = <memory:73728, vCores:18> ; Queue's Absolute capacity = 100.0 % ; Queue's Absolute used capacity = 40.27778 % ; Queue's Absolute max capacity = 100.0 % ;
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1536761812710
final status: UNDEFINED
tracking URL: http://spark-master.c.mozn-location.internal:8088/proxy/application_1536672003321_0065/
user: elyra
18/09/12 14:16:52 INFO ShutdownHookManager: Shutdown hook called
18/09/12 14:16:52 INFO ShutdownHookManager: Deleting directory /tmp/spark-d62022bd-bf07-421f-bcc1-2f937ba5bfd0
18/09/12 14:16:52 INFO ShutdownHookManager: Deleting directory /tmp/spark-c6c0b425-2acd-4b07-9df2-09731192d3d7
[I 2018-09-12 14:16:52.849 EnterpriseGatewayApp] ApplicationID: 'application_1536672003321_0065' assigned for KernelID: '915100ad-6520-416c-b01d-8d7f8dd73344', state: ACCEPTED, 8.0 seconds after starting.
[D 2018-09-12 14:16:52.852 EnterpriseGatewayApp] 17: State: 'ACCEPTED', Host: '', KernelID: '915100ad-6520-416c-b01d-8d7f8dd73344', ApplicationID: 'application_1536672003321_0065'
[D 2018-09-12 14:16:53.357 EnterpriseGatewayApp] 18: State: 'ACCEPTED', Host: 'spark-worker-1.c.mozn-location.internal', KernelID: '915100ad-6520-416c-b01d-8d7f8dd73344', ApplicationID: 'application_1536672003321_0065'
[D 2018-09-12 14:16:58.362 EnterpriseGatewayApp] Waiting for KernelID '915100ad-6520-416c-b01d-8d7f8dd73344' to send connection info from host 'spark-worker-1.c.mozn-location.internal' - retrying...
[D 2018-09-12 14:16:58.867 EnterpriseGatewayApp] 19: State: 'ACCEPTED', Host: 'spark-worker-1.c.mozn-location.internal', KernelID: '915100ad-6520-416c-b01d-8d7f8dd73344', ApplicationID: 'application_1536672003321_0065'
[D 2018-09-12 14:16:58.867 EnterpriseGatewayApp] Received Payload 'xXseh4YIaBIjHK40EaJYxpeu0HoUetzbK7D9SGaZMbM7jCqE2Yk5ctbJsl9wlJQq/+JTW86mPhXQc3IDOcaGupugD141PZA5SNX4q/zOM/fjSQFzSAlc02fywPr3wW6TLp//ZSCJfJD5cWFX4I0y2xWJFwU7foalmKXREk52F+bgFWJ3cL5NxKML8GzaiEWRICPffVimPVG0b1UhgXyi+9ya64lFlJ9U+kpuOYqgEgkhmxstTlu/5f2u3w47CHomw1N4TqviMxM0RAiXZRfcyyIXpkF4JZzAS3ZucXaEuHDf++/XuZcdHl2Hz0ACoqF5T2/8pXKhk58l1tK81Pgjl0pcpWmXTtsaJrhHoz9FQpZ7qbLxKWd9Yt/cvrGTfpjNuxC1+olNqqMwsUMAKbjBrA=='
[D 2018-09-12 14:16:58.867 EnterpriseGatewayApp] Decrypted Payload '{"shell_port": 52748, "iopub_port": 43518, "stdin_port": 58806, "control_port": 42488, "hb_port": 50091, "ip": "0.0.0.0", "key": "afe8afab-0b1d-408e-9c69-d4812550fc4c", "transport": "tcp", "signature_scheme": "hmac-sha256", "kernel_name": "", "pid": "2617", "pgid": "2569", "comm_port": 34636}'
[D 2018-09-12 14:16:58.868 EnterpriseGatewayApp] Connect Info received from the launcher is as follows '{'shell_port': 52748, 'iopub_port': 43518, 'stdin_port': 58806, 'control_port': 42488, 'hb_port': 50091, 'ip': '0.0.0.0', 'key': 'afe8afab-0b1d-408e-9c69-d4812550fc4c', 'transport': 'tcp', 'signature_scheme': 'hmac-sha256', 'kernel_name': '', 'pid': '2617', 'pgid': '2569', 'comm_port': 34636}'
[D 2018-09-12 14:16:58.868 EnterpriseGatewayApp] Host assigned to the Kernel is: 'spark-worker-1.c.mozn-location.internal' '10.132.0.5'
[D 2018-09-12 14:16:58.868 EnterpriseGatewayApp] Established gateway communication to: 10.132.0.5:34636 for KernelID '915100ad-6520-416c-b01d-8d7f8dd73344'
[D 2018-09-12 14:16:58.868 EnterpriseGatewayApp] Updated pid to: 2617
[D 2018-09-12 14:16:58.868 EnterpriseGatewayApp] Updated pgid to: 2569
[D 2018-09-12 14:16:58.871 EnterpriseGatewayApp] Received connection info for KernelID '915100ad-6520-416c-b01d-8d7f8dd73344' from host 'spark-worker-1.c.mozn-location.internal': {'shell_port': 52748, 'iopub_port': 43518, 'stdin_port': 58806, 'control_port': 42488, 'hb_port': 50091, 'ip': '10.132.0.5', 'key': b'afe8afab-0b1d-408e-9c69-d4812550fc4c', 'transport': 'tcp', 'signature_scheme': 'hmac-sha256', 'kernel_name': '', 'comm_port': 34636}...
[D 2018-09-12 14:16:58.873 EnterpriseGatewayApp] Connecting to: tcp://10.132.0.5:42488
[D 2018-09-12 14:16:58.875 EnterpriseGatewayApp] Connecting to: tcp://10.132.0.5:43518
[I 2018-09-12 14:16:58.877 EnterpriseGatewayApp] Kernel started: 915100ad-6520-416c-b01d-8d7f8dd73344
[D 2018-09-12 14:16:58.877 EnterpriseGatewayApp] Kernel args: {'env': {'PATH': '/opt/anaconda3/bin:/usr/lib64/qt-3.3/bin:/usr/java/jdk1.8.0_181-amd64/bin:/usr/java/jdk1.8.0_181-amd64/jre/bin:/opt/anaconda3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/home/elyra/.local/bin:/home/elyra/bin', 'KERNEL_USERNAME': 'elyra'}, 'kernel_name': 'spark_python_yarn_cluster'}
[I 2018-09-12 14:16:58.877 EnterpriseGatewayApp] Culling kernels with idle durations > 600 seconds at 30 second intervals ...
[I 180912 14:16:58 web:2106] 201 POST /api/kernels (127.0.0.1) 14617.29ms
[I 180912 14:16:59 web:2106] 200 GET /api/kernels/915100ad-6520-416c-b01d-8d7f8dd73344 (127.0.0.1) 1.46ms
[D 2018-09-12 14:16:59.330 EnterpriseGatewayApp] Initializing websocket connection /api/kernels/915100ad-6520-416c-b01d-8d7f8dd73344/channels
[W 2018-09-12 14:16:59.332 EnterpriseGatewayApp] No session ID specified
[D 2018-09-12 14:16:59.333 EnterpriseGatewayApp] Requesting kernel info from 915100ad-6520-416c-b01d-8d7f8dd73344
[D 2018-09-12 14:16:59.333 EnterpriseGatewayApp] Connecting to: tcp://10.132.0.5:52748
[D 2018-09-12 14:16:59.342 EnterpriseGatewayApp] activity on 915100ad-6520-416c-b01d-8d7f8dd73344: status
[D 2018-09-12 14:16:59.343 EnterpriseGatewayApp] activity on 915100ad-6520-416c-b01d-8d7f8dd73344: status
[D 2018-09-12 14:16:59.343 EnterpriseGatewayApp] Received kernel info: {'status': 'ok', 'protocol_version': '5.1', 'implementation': 'ipython', 'implementation_version': '6.4.0', 'language_info': {'name': 'python', 'version': '3.6.5', 'mimetype': 'text/x-python', 'codemirror_mode': {'name': 'ipython', 'version': 3}, 'pygments_lexer': 'ipython3', 'nbconvert_exporter': 'python', 'file_extension': '.py'}, 'banner': "Python 3.6.5 |Anaconda, Inc.| (default, Apr 29 2018, 16:14:56) \nType 'copyright', 'credits' or 'license' for more information\nIPython 6.4.0 -- An enhanced Interactive Python. Type '?' for help.\n", 'help_links': [{'text': 'Python Reference', 'url': 'https://docs.python.org/3.6'}, {'text': 'IPython Reference', 'url': 'https://ipython.org/documentation.html'}, {'text': 'NumPy Reference', 'url': 'https://docs.scipy.org/doc/numpy/reference/'}, {'text': 'SciPy Reference', 'url': 'https://docs.scipy.org/doc/scipy/reference/'}, {'text': 'Matplotlib Reference', 'url': 'https://matplotlib.org/contents.html'}, {'text': 'SymPy Reference', 'url': 'http://docs.sympy.org/latest/index.html'}, {'text': 'pandas Reference', 'url': 'https://pandas.pydata.org/pandas-docs/stable/'}]}
[I 2018-09-12 14:16:59.344 EnterpriseGatewayApp] Adapting to protocol v5.1 for kernel 915100ad-6520-416c-b01d-8d7f8dd73344
[I 180912 14:16:59 web:2106] 101 GET /api/kernels/915100ad-6520-416c-b01d-8d7f8dd73344/channels (127.0.0.1) 15.46ms
[D 2018-09-12 14:16:59.345 EnterpriseGatewayApp] Opening websocket /api/kernels/915100ad-6520-416c-b01d-8d7f8dd73344/channels
[D 2018-09-12 14:16:59.345 EnterpriseGatewayApp] Getting buffer for 915100ad-6520-416c-b01d-8d7f8dd73344
[D 2018-09-12 14:16:59.345 EnterpriseGatewayApp] Connecting to: tcp://10.132.0.5:52748
[D 2018-09-12 14:16:59.346 EnterpriseGatewayApp] Connecting to: tcp://10.132.0.5:43518
[D 2018-09-12 14:16:59.346 EnterpriseGatewayApp] Connecting to: tcp://10.132.0.5:58806
[D 2018-09-12 14:16:59.437 EnterpriseGatewayApp] activity on 915100ad-6520-416c-b01d-8d7f8dd73344: status
[D 2018-09-12 14:16:59.437 EnterpriseGatewayApp] activity on 915100ad-6520-416c-b01d-8d7f8dd73344: status
[D 2018-09-12 14:16:59.442 EnterpriseGatewayApp] activity on 915100ad-6520-416c-b01d-8d7f8dd73344: status
[D 2018-09-12 14:16:59.443 EnterpriseGatewayApp] activity on 915100ad-6520-416c-b01d-8d7f8dd73344: status
[D 2018-09-12 14:16:59.568 EnterpriseGatewayApp] activity on 915100ad-6520-416c-b01d-8d7f8dd73344: status
[D 2018-09-12 14:16:59.571 EnterpriseGatewayApp] activity on 915100ad-6520-416c-b01d-8d7f8dd73344: execute_input
[D 2018-09-12 14:16:59.571 EnterpriseGatewayApp] activity on 915100ad-6520-416c-b01d-8d7f8dd73344: execute_result
[D 2018-09-12 14:16:59.574 EnterpriseGatewayApp] activity on 915100ad-6520-416c-b01d-8d7f8dd73344: status
[D 2018-09-12 14:17:28.895 EnterpriseGatewayApp] Polling every 30 seconds for kernels idle > 600 seconds...
[D 2018-09-12 14:17:28.895 EnterpriseGatewayApp] kernel_id=915100ad-6520-416c-b01d-8d7f8dd73344, kernel_name=spark_python_yarn_cluster, last_activity=2018-09-12 14:16:59.574277+00:00
[D 2018-09-12 14:17:58.895 EnterpriseGatewayApp] Polling every 30 seconds for kernels idle > 600 seconds...
[D 2018-09-12 14:17:58.895 EnterpriseGatewayApp] kernel_id=915100ad-6520-416c-b01d-8d7f8dd73344, kernel_name=spark_python_yarn_cluster, last_activity=2018-09-12 14:16:59.574277+00:00
[elyra@spark-master ~]$ yarn logs -applicationId application_1536672003321_0065
18/09/12 14:17:33 INFO client.RMProxy: Connecting to ResourceManager at spark-master.c.mozn-location.internal/10.132.0.4:8050
18/09/12 14:17:33 INFO client.AHSProxy: Connecting to Application History server at spark-master.c.mozn-location.internal/10.132.0.4:10200
Container: container_e06_1536672003321_0065_01_000001 on spark-worker-1.c.mozn-location.internal:45454
LogAggregationType: LOCAL
======================================================================================================
LogType:directory.info
LogLastModifiedTime:Wed Sep 12 14:16:54 +0000 2018
LogLength:34810
LogContents:
ls -l:
total 20
-rw-r--r-- 1 yarn hadoop 69 Sep 12 14:16 container_tokens
-rwx------ 1 yarn hadoop 654 Sep 12 14:16 default_container_executor_session.sh
-rwx------ 1 yarn hadoop 708 Sep 12 14:16 default_container_executor.sh
-rwx------ 1 yarn hadoop 6385 Sep 12 14:16 launch_container.sh
lrwxrwxrwx 1 yarn hadoop 68 Sep 12 14:16 launch_ipykernel.py -> /hadoop/yarn/local/usercache/elyra/filecache/399/launch_ipykernel.py
lrwxrwxrwx 1 yarn hadoop 68 Sep 12 14:16 py4j-0.10.6-src.zip -> /hadoop/yarn/local/usercache/elyra/filecache/396/py4j-0.10.6-src.zip
lrwxrwxrwx 1 yarn hadoop 60 Sep 12 14:16 pyspark.zip -> /hadoop/yarn/local/usercache/elyra/filecache/395/pyspark.zip
lrwxrwxrwx 1 yarn hadoop 67 Sep 12 14:16 __spark_conf__ -> /hadoop/yarn/local/usercache/elyra/filecache/398/__spark_conf__.zip
lrwxrwxrwx 1 yarn hadoop 86 Sep 12 14:16 __spark_libs__ -> /hadoop/yarn/local/usercache/elyra/filecache/397/__spark_libs__8613768646819207724.zip
drwx--x--- 2 yarn hadoop 6 Sep 12 14:16 tmp
find -L . -maxdepth 5 -ls:
402660736 4 drwx--x--- 3 yarn hadoop 4096 Sep 12 14:16 .
419437566 0 drwx--x--- 2 yarn hadoop 6 Sep 12 14:16 ./tmp
402660737 4 -rw-r--r-- 1 yarn hadoop 69 Sep 12 14:16 ./container_tokens
402660738 4 -rw-r--r-- 1 yarn hadoop 12 Sep 12 14:16 ./.container_tokens.crc
402660739 8 -rwx------ 1 yarn hadoop 6385 Sep 12 14:16 ./launch_container.sh
402660740 4 -rw-r--r-- 1 yarn hadoop 60 Sep 12 14:16 ./.launch_container.sh.crc
402660741 4 -rwx------ 1 yarn hadoop 654 Sep 12 14:16 ./default_container_executor_session.sh
402660742 4 -rw-r--r-- 1 yarn hadoop 16 Sep 12 14:16 ./.default_container_executor_session.sh.crc
402660743 4 -rwx------ 1 yarn hadoop 708 Sep 12 14:16 ./default_container_executor.sh
402660744 4 -rw-r--r-- 1 yarn hadoop 16 Sep 12 14:16 ./.default_container_executor.sh.crc
285231561 532 -r-x------ 1 yarn hadoop 541536 Sep 12 14:16 ./pyspark.zip
327202114 16 drwx------ 2 yarn hadoop 12288 Sep 12 14:16 ./__spark_libs__
327202115 176 -r-x------ 1 yarn hadoop 178947 Sep 12 14:16 ./__spark_libs__/hk2-api-2.4.0-b34.jar
327202116 20 -r-x------ 1 yarn hadoop 16993 Sep 12 14:16 ./__spark_libs__/JavaEWAH-0.3.2.jar
327202117 96 -r-x------ 1 yarn hadoop 96221 Sep 12 14:16 ./__spark_libs__/commons-pool-1.5.4.jar
327202118 200 -r-x------ 1 yarn hadoop 201928 Sep 12 14:16 ./__spark_libs__/RoaringBitmap-0.5.11.jar
327202119 180 -r-x------ 1 yarn hadoop 181271 Sep 12 14:16 ./__spark_libs__/hk2-locator-2.4.0-b34.jar
327202120 232 -r-x------ 1 yarn hadoop 236660 Sep 12 14:16 ./__spark_libs__/ST4-4.0.4.jar
327202121 80 -r-x------ 1 yarn hadoop 79845 Sep 12 14:16 ./__spark_libs__/compress-lzf-1.0.3.jar
327202122 68 -r-x------ 1 yarn hadoop 69409 Sep 12 14:16 ./__spark_libs__/activation-1.1.1.jar
327202123 164 -r-x------ 1 yarn hadoop 164422 Sep 12 14:16 ./__spark_libs__/core-1.1.2.jar
327202124 128 -r-x------ 1 yarn hadoop 130802 Sep 12 14:16 ./__spark_libs__/aircompressor-0.8.jar
327202125 120 -r-x------ 1 yarn hadoop 118973 Sep 12 14:16 ./__spark_libs__/hk2-utils-2.4.0-b34.jar
327202126 436 -r-x------ 1 yarn hadoop 445288 Sep 12 14:16 ./__spark_libs__/antlr-2.7.7.jar
327202127 68 -r-x------ 1 yarn hadoop 69500 Sep 12 14:16 ./__spark_libs__/curator-client-2.7.1.jar
327202128 164 -r-x------ 1 yarn hadoop 164368 Sep 12 14:16 ./__spark_libs__/antlr-runtime-3.4.jar
327202129 184 -r-x------ 1 yarn hadoop 186273 Sep 12 14:16 ./__spark_libs__/curator-framework-2.7.1.jar
327202130 328 -r-x------ 1 yarn hadoop 334662 Sep 12 14:16 ./__spark_libs__/antlr4-runtime-4.7.jar
.....
327201763 32 -r-x------ 1 yarn hadoop 30108 Sep 12 14:16 ./__spark_libs__/spark-sketch_2.11-2.3.0.2.6.5.0-292.jar
327201764 8500 -r-x------ 1 yarn hadoop 8701418 Sep 12 14:16 ./__spark_libs__/spark-sql_2.11-2.3.0.2.6.5.0-292.jar
327201765 2120 -r-x------ 1 yarn hadoop 2170500 Sep 12 14:16 ./__spark_libs__/spark-streaming_2.11-2.3.0.2.6.5.0-292.jar
377489393 16 -r-x------ 1 yarn hadoop 13867 Sep 12 14:16 ./launch_ipykernel.py
302030326 80 -r-x------ 1 yarn hadoop 80352 Sep 12 14:16 ./py4j-0.10.6-src.zip
352325558 0 drwx------ 3 yarn hadoop 145 Sep 12 14:16 ./__spark_conf__
352325561 4 -r-x------ 1 yarn hadoop 1240 Sep 12 14:16 ./__spark_conf__/log4j.properties
352325562 8 -r-x------ 1 yarn hadoop 4956 Sep 12 14:16 ./__spark_conf__/metrics.properties
360727180 4 drwx------ 2 yarn hadoop 4096 Sep 12 14:16 ./__spark_conf__/__hadoop_conf__
360727181 4 -r-x------ 1 yarn hadoop 2359 Sep 12 14:16 ./__spark_conf__/__hadoop_conf__/topology_script.py.backup
360727182 8 -r-x------ 1 yarn hadoop 7024 Sep 12 14:16 ./__spark_conf__/__hadoop_conf__/mapred-site.xml
360727183 8 -r-x------ 1 yarn hadoop 6355 Sep 12 14:16 ./__spark_conf__/__hadoop_conf__/hadoop-env.sh
360727184 12 -r-x------ 1 yarn hadoop 10449 Sep 12 14:16 ./__spark_conf__/__hadoop_conf__/log4j.properties
360727185 4 -r-x------ 1 yarn hadoop 2509 Sep 12 14:16 ./__spark_conf__/__hadoop_conf__/hadoop-metrics2.properties
360727186 20 -r-x------ 1 yarn hadoop 19415 Sep 12 14:16 ./__spark_conf__/__hadoop_conf__/yarn-site.xml
360727187 4 -r-x------ 1 yarn hadoop 3979 Sep 12 14:16 ./__spark_conf__/__hadoop_conf__/hadoop-env.cmd
360727188 4 -r-x------ 1 yarn hadoop 1 Sep 12 14:16 ./__spark_conf__/__hadoop_conf__/yarn.exclude
360727189 4 -r-x------ 1 yarn hadoop 1 Sep 12 14:16 ./__spark_conf__/__hadoop_conf__/dfs.exclude
360727190 8 -r-x------ 1 yarn hadoop 4273 Sep 12 14:16 ./__spark_conf__/__hadoop_conf__/core-site.xml
360727191 4 -r-x------ 1 yarn hadoop 244 Sep 12 14:16 ./__spark_conf__/__hadoop_conf__/spark-thrift-fairscheduler.xml
360727192 4 -r-x------ 1 yarn hadoop 1631 Sep 12 14:16 ./__spark_conf__/__hadoop_conf__/kms-log4j.properties
360727193 4 -r-x------ 1 yarn hadoop 2250 Sep 12 14:16 ./__spark_conf__/__hadoop_conf__/yarn-env.cmd
360727194 4 -r-x------ 1 yarn hadoop 884 Sep 12 14:16 ./__spark_conf__/__hadoop_conf__/ssl-client.xml
360727195 4 -r-x------ 1 yarn hadoop 2035 Sep 12 14:16 ./__spark_conf__/__hadoop_conf__/capacity-scheduler.xml
360727196 4 -r-x------ 1 yarn hadoop 3518 Sep 12 14:16 ./__spark_conf__/__hadoop_conf__/kms-acls.xml
360727197 4 -r-x------ 1 yarn hadoop 2358 Sep 12 14:16 ./__spark_conf__/__hadoop_conf__/topology_script.py
360727198 4 -r-x------ 1 yarn hadoop 758 Sep 12 14:16 ./__spark_conf__/__hadoop_conf__/mapred-site.xml.template
360727199 4 -r-x------ 1 yarn hadoop 1335 Sep 12 14:16 ./__spark_conf__/__hadoop_conf__/configuration.xsl
360728640 8 -r-x------ 1 yarn hadoop 5327 Sep 12 14:16 ./__spark_conf__/__hadoop_conf__/yarn-env.sh
360728641 8 -r-x------ 1 yarn hadoop 6909 Sep 12 14:16 ./__spark_conf__/__hadoop_conf__/hdfs-site.xml
360728642 4 -r-x------ 1 yarn hadoop 2319 Sep 12 14:16 ./__spark_conf__/__hadoop_conf__/gcs-connector-key.json
360728643 4 -r-x------ 1 yarn hadoop 1020 Sep 12 14:16 ./__spark_conf__/__hadoop_conf__/commons-logging.properties
360728644 4 -r-x------ 1 yarn hadoop 1019 Sep 12 14:16 ./__spark_conf__/__hadoop_conf__/container-executor.cfg
360728645 8 -r-x------ 1 yarn hadoop 4221 Sep 12 14:16 ./__spark_conf__/__hadoop_conf__/task-log4j.properties
360728646 4 -r-x------ 1 yarn hadoop 2490 Sep 12 14:16 ./__spark_conf__/__hadoop_conf__/hadoop-metrics.properties
360728647 4 -r-x------ 1 yarn hadoop 818 Sep 12 14:16 ./__spark_conf__/__hadoop_conf__/mapred-env.sh
360728648 4 -r-x------ 1 yarn hadoop 1602 Sep 12 14:16 ./__spark_conf__/__hadoop_conf__/health_check
360728649 4 -r-x------ 1 yarn hadoop 752 Sep 12 14:16 ./__spark_conf__/__hadoop_conf__/hive-site.xml
360728650 4 -r-x------ 1 yarn hadoop 2316 Sep 12 14:16 ./__spark_conf__/__hadoop_conf__/ssl-client.xml.example
360728651 4 -r-x------ 1 yarn hadoop 1527 Sep 12 14:16 ./__spark_conf__/__hadoop_conf__/kms-env.sh
360728652 4 -r-x------ 1 yarn hadoop 1308 Sep 12 14:16 ./__spark_conf__/__hadoop_conf__/hadoop-policy.xml
360728653 4 -r-x------ 1 yarn hadoop 119 Sep 12 14:16 ./__spark_conf__/__hadoop_conf__/slaves
360728654 4 -r-x------ 1 yarn hadoop 254 Sep 12 14:16 ./__spark_conf__/__hadoop_conf__/topology_mappings.data
360728655 4 -r-x------ 1 yarn hadoop 1000 Sep 12 14:16 ./__spark_conf__/__hadoop_conf__/ssl-server.xml
360728656 4 -r-x------ 1 yarn hadoop 951 Sep 12 14:16 ./__spark_conf__/__hadoop_conf__/mapred-env.cmd
360728657 4 -r-x------ 1 yarn hadoop 2697 Sep 12 14:16 ./__spark_conf__/__hadoop_conf__/ssl-server.xml.example
360728658 4 -r-x------ 1 yarn hadoop 945 Sep 12 14:16 ./__spark_conf__/__hadoop_conf__/taskcontroller.cfg
360728659 8 -r-x------ 1 yarn hadoop 5511 Sep 12 14:16 ./__spark_conf__/__hadoop_conf__/kms-site.xml
360728660 8 -r-x------ 1 yarn hadoop 4113 Sep 12 14:16 ./__spark_conf__/__hadoop_conf__/mapred-queues.xml.template
352325563 124 -r-x------ 1 yarn hadoop 123087 Sep 12 14:16 ./__spark_conf__/__spark_hadoop_conf__.xml
352325564 4 -r-x------ 1 yarn hadoop 2575 Sep 12 14:16 ./__spark_conf__/__spark_conf__.properties
broken symlinks(find -L . -maxdepth 5 -type l -ls):
End of LogType:directory.info.This log file belongs to a running container (container_e06_1536672003321_0065_01_000001) and so may not be complete.
*******************************************************************************
End of LogType:prelaunch.err.This log file belongs to a running container (container_e06_1536672003321_0065_01_000001) and so may not be complete.
******************************************************************************
Container: container_e06_1536672003321_0065_01_000001 on spark-worker-1.c.mozn-location.internal:45454
LogAggregationType: LOCAL
======================================================================================================
LogType:stdout
LogLastModifiedTime:Wed Sep 12 14:16:58 +0000 2018
LogLength:1596
LogContents:
Using connection file '/tmp/kernel-915100ad-6520-416c-b01d-8d7f8dd73344_chknmm6h.json' instead of '/home/elyra/.local/share/jupyter/runtime/kernel-915100ad-6520-416c-b01d-8d7f8dd73344.json'
Signal socket bound to host: 0.0.0.0, port: 34636
JSON Payload 'b'{"shell_port": 52748, "iopub_port": 43518, "stdin_port": 58806, "control_port": 42488, "hb_port": 50091, "ip": "0.0.0.0", "key": "afe8afab-0b1d-408e-9c69-d4812550fc4c", "transport": "tcp", "signature_scheme": "hmac-sha256", "kernel_name": "", "pid": "2617", "pgid": "2569", "comm_port": 34636}'
Encrypted Payload 'b'xXseh4YIaBIjHK40EaJYxpeu0HoUetzbK7D9SGaZMbM7jCqE2Yk5ctbJsl9wlJQq/+JTW86mPhXQc3IDOcaGupugD141PZA5SNX4q/zOM/fjSQFzSAlc02fywPr3wW6TLp//ZSCJfJD5cWFX4I0y2xWJFwU7foalmKXREk52F+bgFWJ3cL5NxKML8GzaiEWRICPffVimPVG0b1UhgXyi+9ya64lFlJ9U+kpuOYqgEgkhmxstTlu/5f2u3w47CHomw1N4TqviMxM0RAiXZRfcyyIXpkF4JZzAS3ZucXaEuHDf++/XuZcdHl2Hz0ACoqF5T2/8pXKhk58l1tK81Pgjl0pcpWmXTtsaJrhHoz9FQpZ7qbLxKWd9Yt/cvrGTfpjNuxC1+olNqqMwsUMAKbjBrA=='
/opt/anaconda3/lib/python3.6/site-packages/IPython/paths.py:68: UserWarning: IPython parent '/home' is not a writable location, using a temp directory.
" using a temp directory.".format(parent))
NOTE: When using the `ipython kernel` entry point, Ctrl-C will not work.
To exit, you will have to explicitly quit this process, by either sending
"quit" from a client, or using Ctrl-\ in UNIX-like environments.
To read more about this, see https://github.com/ipython/ipython/issues/2049
To connect another client to this kernel, use:
--existing /tmp/kernel-915100ad-6520-416c-b01d-8d7f8dd73344_chknmm6h.json
End of LogType:stdout.This log file belongs to a running container (container_e06_1536672003321_0065_01_000001) and so may not be complete.
***********************************************************************
Container: container_e06_1536672003321_0065_01_000001 on spark-worker-1.c.mozn-location.internal:45454
LogAggregationType: LOCAL
======================================================================================================
LogType:stderr
LogLastModifiedTime:Wed Sep 12 14:16:57 +0000 2018
LogLength:1649
LogContents:
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/hadoop/yarn/local/usercache/elyra/filecache/397/__spark_libs__8613768646819207724.zip/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.6.5.0-292/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
18/09/12 14:16:55 INFO SignalUtils: Registered signal handler for TERM
18/09/12 14:16:55 INFO SignalUtils: Registered signal handler for HUP
18/09/12 14:16:55 INFO SignalUtils: Registered signal handler for INT
18/09/12 14:16:55 INFO SecurityManager: Changing view acls to: yarn,elyra
18/09/12 14:16:55 INFO SecurityManager: Changing modify acls to: yarn,elyra
18/09/12 14:16:55 INFO SecurityManager: Changing view acls groups to:
18/09/12 14:16:55 INFO SecurityManager: Changing modify acls groups to:
18/09/12 14:16:55 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(yarn, elyra); groups with view permissions: Set(); users with modify permissions: Set(yarn, elyra); groups with modify permissions: Set()
18/09/12 14:16:56 INFO ApplicationMaster: Preparing Local resources
18/09/12 14:16:57 INFO ApplicationMaster: ApplicationAttemptId: appattempt_1536672003321_0065_000001
18/09/12 14:16:57 INFO ApplicationMaster: Starting the user application in a separate Thread
18/09/12 14:16:57 INFO ApplicationMaster: Waiting for spark context initialization...
End of LogType:stderr.This log file belongs to a running container (container_e06_1536672003321_0065_01_000001) and so may not be complete.
***********************************************************************
Container: container_e06_1536672003321_0065_01_000001 on spark-worker-1.c.mozn-location.internal:45454
LogAggregationType: LOCAL
======================================================================================================
LogType:prelaunch.out
LogLastModifiedTime:Wed Sep 12 14:16:54 +0000 2018
LogLength:100
LogContents:
Setting up env variables
Setting up job resources
Copying debugging information
Launching container
End of LogType:prelaunch.out.This log file belongs to a running container (container_e06_1536672003321_0065_01_000001) and so may not be complete.
******************************************************************************
Container: container_e06_1536672003321_0065_01_000001 on spark-worker-1.c.mozn-location.internal:45454
LogAggregationType: LOCAL
======================================================================================================
LogType:launch_container.sh
LogLastModifiedTime:Wed Sep 12 14:16:54 +0000 2018
LogLength:6385
LogContents:
#!/bin/bash
set -o pipefail -e
export PRELAUNCH_OUT="/hadoop/yarn/log/application_1536672003321_0065/container_e06_1536672003321_0065_01_000001/prelaunch.out"
exec >"${PRELAUNCH_OUT}"
export PRELAUNCH_ERR="/hadoop/yarn/log/application_1536672003321_0065/container_e06_1536672003321_0065_01_000001/prelaunch.err"
exec 2>"${PRELAUNCH_ERR}"
echo "Setting up env variables"
export SPARK_YARN_STAGING_DIR="hdfs://spark-master.c.mozn-location.internal:8020/user/elyra/.sparkStaging/application_1536672003321_0065"
export PATH="/opt/anaconda3/bin/python:/opt/anaconda3/bin:/usr/lib64/qt-3.3/bin:/usr/java/jdk1.8.0_181-amd64/bin:/usr/java/jdk1.8.0_181-amd64/jre/bin:/opt/anaconda3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/home/elyra/.local/bin:/home/elyra/bin"
export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/usr/hdp/2.6.5.0-292/hadoop/conf"}
export MAX_APP_ATTEMPTS="2"
export JAVA_HOME=${JAVA_HOME:-"/usr/java/jdk1.8.0_181-amd64"}
export LANG="en_US.UTF-8"
export APP_SUBMIT_TIME_ENV="1536761812710"
export NM_HOST="spark-worker-1.c.mozn-location.internal"
export PYSPARK_PYTHON="/opt/anaconda3/bin/python"
export LOGNAME="elyra"
export JVM_PID="$$"
export PWD="/hadoop/yarn/local/usercache/elyra/appcache/application_1536672003321_0065/container_e06_1536672003321_0065_01_000001"
export PYTHONHASHSEED="0"
export LOCAL_DIRS="/hadoop/yarn/local/usercache/elyra/appcache/application_1536672003321_0065"
export PYTHONPATH="/opt/anaconda3/lib/python3.6/site-packages/:/usr/hdp/current/spark2-client/python:/usr/hdp/current/spark2-client/python/lib/py4j-0.10.6-src.zip:$PWD/pyspark.zip:$PWD/py4j-0.10.6-src.zip"
export APPLICATION_WEB_PROXY_BASE="/proxy/application_1536672003321_0065"
export NM_HTTP_PORT="8042"
export LOG_DIRS="/hadoop/yarn/log/application_1536672003321_0065/container_e06_1536672003321_0065_01_000001"
export NM_AUX_SERVICE_mapreduce_shuffle="AAA0+gAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=
"
export NM_PORT="45454"
export PYSPARK_GATEWAY_SECRET="thisjustblabalabala"
export USER="elyra"
export HADOOP_YARN_HOME=${HADOOP_YARN_HOME:-"/usr/hdp/2.6.5.0-292/hadoop-yarn"}
export CLASSPATH="$PWD:$PWD/__spark_conf__:$PWD/__spark_libs__/*:/usr/hdp/2.6.5.0-292/hadoop/conf:/usr/hdp/2.6.5.0-292/hadoop/*:/usr/hdp/2.6.5.0-292/hadoop/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-hdfs-client/lib/*:/usr/hdp/current/hadoop-yarn-client/*:/usr/hdp/current/hadoop-yarn-client/lib/*:/usr/hdp/current/ext/hadoop/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:$PWD/mr-framework/hadoop/share/hadoop/tools/lib/*:/usr/hdp/2.6.5.0/hadoop/lib/hadoop-lzo-0.6.0.2.6.5.0.jar:/etc/hadoop/conf/secure:/usr/hdp/current/ext/hadoop/*:$PWD/__spark_conf__/__hadoop_conf__"
export HADOOP_TOKEN_FILE_LOCATION="/hadoop/yarn/local/usercache/elyra/appcache/application_1536672003321_0065/container_e06_1536672003321_0065_01_000001/container_tokens"
export NM_AUX_SERVICE_spark_shuffle=""
export SPARK_USER="elyra"
export LOCAL_USER_DIRS="/hadoop/yarn/local/usercache/elyra/"
export HADOOP_HOME="/usr/hdp/2.6.5.0-292/hadoop"
export PYTHONUSERBASE="/opt/anaconda3"
export HOME="/home/"
export NM_AUX_SERVICE_spark2_shuffle=""
export CONTAINER_ID="container_e06_1536672003321_0065_01_000001"
export MALLOC_ARENA_MAX="4"
echo "Setting up job resources"
ln -sf "/hadoop/yarn/local/usercache/elyra/filecache/395/pyspark.zip" "pyspark.zip"
ln -sf "/hadoop/yarn/local/usercache/elyra/filecache/397/__spark_libs__8613768646819207724.zip" "__spark_libs__"
ln -sf "/hadoop/yarn/local/usercache/elyra/filecache/399/launch_ipykernel.py" "launch_ipykernel.py"
ln -sf "/hadoop/yarn/local/usercache/elyra/filecache/396/py4j-0.10.6-src.zip" "py4j-0.10.6-src.zip"
ln -sf "/hadoop/yarn/local/usercache/elyra/filecache/398/__spark_conf__.zip" "__spark_conf__"
echo "Copying debugging information"
# Creating copy of launch script
cp "launch_container.sh" "/hadoop/yarn/log/application_1536672003321_0065/container_e06_1536672003321_0065_01_000001/launch_container.sh"
chmod 640 "/hadoop/yarn/log/application_1536672003321_0065/container_e06_1536672003321_0065_01_000001/launch_container.sh"
# Determining directory contents
echo "ls -l:" 1>"/hadoop/yarn/log/application_1536672003321_0065/container_e06_1536672003321_0065_01_000001/directory.info"
ls -l 1>>"/hadoop/yarn/log/application_1536672003321_0065/container_e06_1536672003321_0065_01_000001/directory.info"
echo "find -L . -maxdepth 5 -ls:" 1>>"/hadoop/yarn/log/application_1536672003321_0065/container_e06_1536672003321_0065_01_000001/directory.info"
find -L . -maxdepth 5 -ls 1>>"/hadoop/yarn/log/application_1536672003321_0065/container_e06_1536672003321_0065_01_000001/directory.info"
echo "broken symlinks(find -L . -maxdepth 5 -type l -ls):" 1>>"/hadoop/yarn/log/application_1536672003321_0065/container_e06_1536672003321_0065_01_000001/directory.info"
find -L . -maxdepth 5 -type l -ls 1>>"/hadoop/yarn/log/application_1536672003321_0065/container_e06_1536672003321_0065_01_000001/directory.info"
echo "Launching container"
exec /bin/bash -c "LD_LIBRARY_PATH="/usr/hdp/current/hadoop-client/lib/native:/usr/hdp/current/hadoop-client/lib/native/Linux-amd64-64:$LD_LIBRARY_PATH" $JAVA_HOME/bin/java -server -Xmx1024m -Djava.io.tmpdir=$PWD/tmp -Dhdp.version=2.6.5.0 -Dspark.yarn.app.container.log.dir=/hadoop/yarn/log/application_1536672003321_0065/container_e06_1536672003321_0065_01_000001 org.apache.spark.deploy.yarn.ApplicationMaster --class 'org.apache.spark.deploy.PythonRunner' --primary-py-file launch_ipykernel.py --arg '/home/elyra/.local/share/jupyter/runtime/kernel-915100ad-6520-416c-b01d-8d7f8dd73344.json' --arg '--RemoteProcessProxy.response-address' --arg '10.132.0.4:56820' --arg '--RemoteProcessProxy.port-range' --arg '0..0' --arg '--RemoteProcessProxy.spark-context-initialization-mode' --arg 'lazy' --properties-file $PWD/__spark_conf__/__spark_conf__.properties 1> /hadoop/yarn/log/application_1536672003321_0065/container_e06_1536672003321_0065_01_000001/stdout 2> /hadoop/yarn/log/application_1536672003321_0065/container_e06_1536672003321_0065_01_000001/stderr"
End of LogType:launch_container.sh.This log file belongs to a running container (container_e06_1536672003321_0065_01_000001) and so may not be complete.
************************************************************************************
@kevin-bates both exist in the logs:
Signal socket bound to host: 0.0.0.0, port: 34636 JSON Payload 'b'{"shell_port": 52748, "iopub_port": 43518, "stdin_port": 58806, "control_port": 42488, "hb_port": 50091, "ip": "0.0.0.0", "key": "afe8afab-0b1d-408e-9c69-d4812550fc4c", "transport": "tcp", "signature_scheme": "hmac-sha256", "kernel_name": "", "pid": "2617", "pgid": "2569", "comm_port": 34636}'
377489393 16 -r-x------ 1 yarn hadoop 13867 Sep 12 14:16 ./launch_ipykernel.py
Finally, the Spark context seems to be stuck in a waiting state:
Agreed. The last log stack I got was a mistake on my side from playing with some configuration related to `PYSPARK_DRIVER_PYTHON=ipython`.
@ziedbouf Thanks for your patience working through these issues. I suspect you are trying to use HDP, which only supports Python 2.7.x, together with Anaconda 3.6 and the Py4j from that Anaconda env. I would recommend trying a vanilla environment that uses only the HDP environment, as we describe here, which will use PySpark and Py4j from the HDP distribution, at least to rule out a Python mismatch or Py4j incompatibility.
You're extremely close. So the spark context never finishes initialization? Are you able to run a pyspark app outside of EG?
At any rate, let me explain how kernel launching works. We (EG) essentially leverage the base framework so it helps to know, in general, how kernel launches work.
When Jupyter gets a request to start a kernel, the user provides the name of the kernelspec directory that will be used. There, Jupyter expects to find a `kernel.json` file that contains, at a minimum, `display_name` and `argv` entries. The display name is what Notebook shows in the kernels list. `argv` is essentially the command that is run. You'll notice that it takes a connection file name. (Btw, the curly-braced values in the `argv` are substitutions that are filled in by Jupyter and EG.) For remote kernels, we also provide a response-address parameter as well as potential port-range specifications. The important one is the response-address.
Prior to invoking the command specified by `argv`, the Jupyter framework adds each entry in the `env` stanza to the environment that will be in place when the `argv` command is performed. EG also adds any `KERNEL_` values to the env, the main ones being `KERNEL_ID` and `KERNEL_USERNAME`.
Spark-based kernel launches typically require more massaging and parameter setup, so they use a `run.sh` script that handles that. The thing to point out in `run.sh` is that the kernel launcher script (`launch_ipykernel.py`) is what is passed to `spark-submit`. This is because we want the launcher to be the kernel process. Since the launcher embeds the target kernel (which is why launchers are typically written in the same language as the kernel), it can communicate with the kernel itself. This is how we can avoid requiring kernel updates in order for EG to use a given kernel. The launcher, when started, creates 5 local ports and constructs the equivalent of a connection file. It also creates a 6th port that it listens on for out-of-band commands from EG. This information is then returned to the response-address, where EG is listening to receive it. Your logs indicate all of that is happening fine.
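As a rough illustration (not the real launcher code; names and details here are assumptions), the port setup and connection payload the launcher builds can be sketched like this:

```python
import json
import socket

def create_connection_info(ip="127.0.0.1"):
    """Illustrative sketch of what launch_ipykernel.py does: bind the five
    kernel ports plus the communication port on OS-assigned ephemeral
    ports, then build the connection-file-equivalent payload that is
    encrypted and sent back to the EG response-address."""
    info = {"ip": ip, "transport": "tcp", "signature_scheme": "hmac-sha256"}
    sockets = []
    for name in ("shell_port", "iopub_port", "stdin_port",
                 "control_port", "hb_port", "comm_port"):
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.bind((ip, 0))                 # port 0: let the OS pick a free port
        info[name] = s.getsockname()[1]
        sockets.append(s)
    return info, sockets

info, sockets = create_connection_info()
payload = json.dumps(info)              # what gets returned on the response-address
for s in sockets:
    s.close()
```

The "Signal socket bound to host" and "JSON Payload" lines in your stdout log correspond to this stage.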
Once the kernel has started, communication occurs between EG and the kernel directly, except in the cases for kernel interrupts, restarts, or shutdown. In those cases, EG sends a message to the 6th port that is listened to by the launcher. The launcher then performs the action (interrupt or shutdown) by signalling the embedded kernel thread directly.
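A minimal sketch of that out-of-band dispatch, with the message keys treated as assumptions (the actual wire format lives in `launch_ipykernel.py`):

```python
import json
import os

def handle_comm_message(raw, kernel_pid):
    """Hypothetical dispatcher for messages EG sends to the launcher's
    communication (6th) port: a shutdown request, or a signal number to
    deliver to the embedded kernel process."""
    msg = json.loads(raw)
    if msg.get("shutdown"):
        return "shutdown"                    # launcher tears the kernel down
    if "signum" in msg:
        os.kill(kernel_pid, msg["signum"])   # e.g. SIGINT for an interrupt
        return "signalled"
    return "ignored"
```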
Another thing the launcher does, based on the `--RemoteProcessProxy.spark-context-initialization-mode` parameter, is create a Spark context (or not, if the value is 'none'). This typically takes a few seconds.
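Conceptually (the exact behavior is an assumption; this is only a sketch), the launcher's handling of that mode looks something like:

```python
import threading

def start_spark_context(mode, create_context):
    """Sketch of acting on --RemoteProcessProxy.spark-context-initialization-mode:
    'none' skips Spark entirely; otherwise creation happens off the main
    thread so the kernel can start serving requests immediately.  In the
    real launcher, create_context would call SparkSession.builder.getOrCreate()."""
    if mode == "none":
        return None
    t = threading.Thread(target=create_context)
    t.start()
    return t

created = []
t = start_spark_context("lazy", lambda: created.append(True))
if t is not None:
    t.join()
```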
So, we essentially leverage the complete framework for launching kernels. Where we diverge is that EG recognizes the `process_proxy` stanza in the kernelspec. When it starts the kernel, it uses an instance of the process-proxy class to control the lifecycle of the kernel, which essentially abstracts the process member variable used by the framework. This is how we can support various resource managers in a pluggable way.
We also add the various `RemoteProcessProxy` parameters to help facilitate remote behavior and other enterprise concerns, like port-range restrictions, etc.
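For instance, the `--RemoteProcessProxy.port-range` value (which appears as `'0..0'` in your launch_container.sh) could be parsed along these lines; this is a hypothetical sketch, not EG's actual implementation:

```python
def parse_port_range(spec):
    """Hypothetical parser for --RemoteProcessProxy.port-range values such
    as '40000..41000'; '0..0' means no restriction, i.e. use OS-assigned
    ephemeral ports."""
    lower, upper = (int(part) for part in spec.split(".."))
    if (lower, upper) != (0, 0) and upper <= lower:
        raise ValueError("invalid port range: %s" % spec)
    return lower, upper
```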
I would recommend taking a look at the System Architecture section of the docs.
Thanks @kevin-bates, it's starting to make sense to me. To answer your question about running PySpark outside EG: yes, I am running Zeppelin in parallel and it works fine with the same Python path `/opt/anaconda3/bin/python`.
@lresende I agree with this, as I made some modifications to get things running on Python 3, starting with fixing topology_script.py.
If I go with Python 2, do you think the following `kernel.json` is fine:
[elyra@spark-master ~]$ cat /usr/local/share/jupyter/kernels/spark_python_yarn_cluster/kernel.json
{
"language": "python",
"display_name": "Spark - Python (YARN Cluster Mode)",
"process_proxy": {
"class_name": "enterprise_gateway.services.processproxies.yarn.YarnClusterProcessProxy"
},
"env": {
"SPARK_HOME": "/usr/hdp/current/spark2-client",
"PYSPARK_PYTHON": "/opt/anaconda3/bin/python",
"PYTHONPATH": "/lib/python2.7/site-packages/:/usr/hdp/current/spark2-client/python:/usr/hdp/current/spark2-client/python/lib/py4j-0.10.6-src.zip",
"SPARK_OPTS": "--master yarn --deploy-mode cluster --name ${KERNEL_ID:-ERROR__NO__KERNEL_ID} --conf spark.yarn.am.waitTime=1d --conf spark.yarn.submit.waitAppCompletion=false --conf spark.yarn.appMasterEnv.PYSPARK_GATEWAY_SECRET=thisjustblabalabala --conf spark.yarn.appMasterEnv.PYTHONUSERBASE=/opt/anaconda3 --conf spark.yarn.appMasterEnv.PYTHONPATH=/lib/python2.7/site-packages/:/usr/hdp/current/spark2-client/python:/usr/hdp/current/spark2-client/python/lib/py4j-0.10.6-src.zip --conf spark.yarn.appMasterEnv.PATH= /opt/anaconda3/bin/python:$PATH",
"LAUNCH_OPTS": ""
},
"argv": [
"/usr/local/share/jupyter/kernels/spark_python_yarn_cluster/bin/run.sh",
"{connection_file}",
"--RemoteProcessProxy.response-address",
"{response_address}",
"--RemoteProcessProxy.port-range",
"{port_range}",
"--RemoteProcessProxy.spark-context-initialization-mode",
"lazy"
]
}
Also, does this mean our scripts must be written in Python 2, or can I pass a variable to specify which execution environment to use, similar to Zeppelin's `zeppelin.pyspark.python = /opt/anaconda3/bin/python`?
@ziedbouf I would still modify it to use Anaconda 2: `"PYSPARK_PYTHON": "/opt/anaconda2/bin/python"`,
In that case I need to install Anaconda 2, as I didn't install it in the first place. I will do that; just one more question: I still run Python 3 in my notebooks, so must everything be written in Python 3 instead of Python 2? If so, which environment variables do you advise configuring?
I want to start adding customizations from a working environment, and in HDP, which is based on Python 2.x, the vanilla configuration should work. After that, we can start introducing customizations, such as adding Anaconda 3, and validate that it still works. The issue might end up being an HDP limitation, in which case it would be a Python version mismatch.
Sorry, I closed the issue by mistake. So @lresende, first run using the default configuration with Python 2; kernel.json:
{
"language": "python",
"display_name": "Spark - Python (YARN Cluster Mode)",
"process_proxy": {
"class_name": "enterprise_gateway.services.processproxies.yarn.YarnClusterProcessProxy"
},
"env": {
"SPARK_HOME": "/usr/hdp/current/spark2-client",
"PYSPARK_PYTHON": "/opt/anaconda2/bin/python",
"PYTHONPATH": "/opt/anaconda2/lib/python2.7/site-packages:/usr/hdp/current/spark2-client/python:/usr/hdp/current/spark2-client/python/lib/py4j-0.10.6-src.zip",
"SPARK_OPTS": "--master yarn --deploy-mode cluster --name ${KERNEL_ID:-ERROR__NO__KERNEL_ID} --conf spark.yarn.submit.waitAppCompletion=false --conf spark.yarn.appMasterEnv.PYTHONUSERBASE=/home/yarn/.local --conf spark.yarn.appMasterEnv.PYTHONPATH=/opt/anaconda2/lib/python2.7/site-packages:/usr/hdp/current/spark2-client/python:/usr/hdp/current/spark2-client/python/lib/py4j-0.10.6-src.zip --conf spark.yarn.appMasterEnv.PATH=/opt/anaconda2/bin:$PATH",
"LAUNCH_OPTS": ""
},
"argv": [
"/usr/local/share/jupyter/kernels/spark_python_yarn_cluster/bin/run.sh",
"{connection_file}",
"--RemoteProcessProxy.response-address",
"{response_address}",
"--RemoteProcessProxy.port-range",
"{port_range}",
"--RemoteProcessProxy.spark-context-initialization-mode",
"lazy"
]
}
Note that the only difference between the default `kernel.json` and the one I use is the following:
Default:
"PYTHONPATH": "${HOME}/.local/lib/python2.7/
spark.yarn.appMasterEnv.PYTHONPATH=${HOME}/.local/lib/python2.7/site-packages
Local:
"PYTHONPATH": "/opt/anaconda2/lib/python2.7/site-packages
spark.yarn.appMasterEnv.PYTHONPATH=/opt/anaconda2/lib/python2.7/site-packages:
As I expected, this raised an error related to `PYSPARK_GATEWAY_SECRET`:
Using connection file '/tmp/kernel-73c700f8-a03d-45ad-aa04-c96fa82ef9c6_5xFFnQ.json' instead of '/home/elyra/.local/share/jupyter/runtime/kernel-73c700f8-a03d-45ad-aa04-c96fa82ef9c6.json'
Signal socket bound to host: 0.0.0.0, port: 53516
JSON Payload '{"stdin_port": 37306, "pgid": "4536", "ip": "0.0.0.0", "pid": "4583", "control_port": 39802, "hb_port": 39581, "signature_scheme": "hmac-sha256", "key": "394b7d07-2c3d-4e71-9da2-9175a659cf1c", "comm_port": 53516, "kernel_name": "", "shell_port": 58451, "transport": "tcp", "iopub_port": 59037}
Encrypted Payload '0eCmaI4Jmz1vuY6hPnLL5MDVzOWRkqVGR641cblKs6jIv2CNxx5eylkXns0wi3kwkjDnJ+gpdEGZLlnwmvYHZqXXOntHFXoRA2LFDSjBTdJF0RVAriSdrwrtf6jsGE4/Og78+fAJhcHAd8u1zYKpNsblGdq5e4yaYAwxZaqrYICn6k73sqEAqEi7TzrVjmwrKpRGoIh3UmA0RIKxS2o+wCusJ9fcXf0/zKbB7wl5oNTizydqDR1F2OlRjZsdBAjI6q1wJ2DJf3UVj+3vPx97vIaelrIildHlK7xEb3vwYIekgV4GRNSFKtyMsL5PZhnQNdIR65mWfTjy5mzEkwyWV9Vec7Cr8/e7KHL3r7uB5Yccn3KNZElD9Rrq4k+Z+1nRoBSyrZ41ty4lniCh5G2fug==
/opt/anaconda2/lib/python2.7/site-packages/IPython/paths.py:69: UserWarning: IPython parent '/home' is not a writable location, using a temp directory.
" using a temp directory.".format(parent))
Exception in thread Thread-1:
Traceback (most recent call last):
File "/opt/anaconda2/lib/python2.7/threading.py", line 801, in __bootstrap_inner
self.run()
File "/opt/anaconda2/lib/python2.7/threading.py", line 754, in run
self.__target(*self.__args, **self.__kwargs)
File "launch_ipykernel.py", line 62, in initialize_spark_session
spark = SparkSession.builder.getOrCreate()
File "/opt/anaconda2/lib/python2.7/site-packages/pyspark/sql/session.py", line 173, in getOrCreate
sc = SparkContext.getOrCreate(sparkConf)
File "/opt/anaconda2/lib/python2.7/site-packages/pyspark/context.py", line 343, in getOrCreate
SparkContext(conf=conf or SparkConf())
File "/opt/anaconda2/lib/python2.7/site-packages/pyspark/context.py", line 115, in __init__
SparkContext._ensure_initialized(self, gateway=gateway, conf=conf)
File "/opt/anaconda2/lib/python2.7/site-packages/pyspark/context.py", line 292, in _ensure_initialized
SparkContext._gateway = gateway or launch_gateway(conf)
File "/opt/anaconda2/lib/python2.7/site-packages/pyspark/java_gateway.py", line 47, in launch_gateway
gateway_secret = os.environ["PYSPARK_GATEWAY_SECRET"]
File "/opt/anaconda2/lib/python2.7/UserDict.py", line 40, in __getitem__
raise KeyError(key)
KeyError: 'PYSPARK_GATEWAY_SECRET'
As per @kevin-bates's recommendation, I attached the following appMasterEnv variable to the kernel:
--conf spark.yarn.appMasterEnv.PYSPARK_GATEWAY_SECRET=just_a_secret_key
No error shows up in the app's error log, but I got the following:
Given that we've never had to set up any of this PYSPARK_GATEWAY_ stuff, try not setting PYSPARK_GATEWAY_PORT or PYSPARK_GATEWAY_SECRET. Absence of PYSPARK_GATEWAY_PORT takes a different branch when launching the java_gateway - one that we must be taking in our environments.
Could you then attach the Enterprise Gateway log (from startup through the issue) and the YARN application logs? If tar/zip is easier, that's fine. Also, please include the kernel.json and run.sh files used. I know you've posted them, but it's easier to "touch" the files. (I'm sure you know what I mean.)
Thanks.
@kevin-bates please find the logs as requested, including the YARN log, the EG logs, and the kernel configuration.
@ziedbouf - thank you so much for the complete set of files - it's extremely helpful to see the entire picture.
Enterprise Gateway is working completely as expected, and this confirms it's purely a Spark context creation issue. What is confusing to me is why you're encountering this when we never have. I can't determine (due to lack of Spark knowledge) whether this "java gateway" is always "in play" for Python sessions - I suspect it is. If it is, and given we never deal with PYSPARK_GATEWAY_SECRET, then that would imply we do not have PYSPARK_GATEWAY_PORT in the env of the spark-submit per my previous post.
I'm wondering if it might be helpful to add print(os.environ) to the launch_ipykernel.py script just prior to creating the context. This output should go to the stdout file in the YARN logs and may help us better determine what is going on.
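A small helper along those lines could look like this (a hedged sketch - `gateway_env_report` is a hypothetical name, not part of the actual launcher):

```python
import os


def gateway_env_report(keys=("PYSPARK_GATEWAY_SECRET", "PYSPARK_GATEWAY_PORT")):
    """Report which Spark gateway variables are visible to this process."""
    return {key: (key in os.environ) for key in keys}


# Hypothetical usage, placed just before SparkSession.builder.getOrCreate()
# in launch_ipykernel.py; both lines end up in the YARN stdout log:
# print(os.environ)
# print(gateway_env_report())
```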
It's also a bit odd that we don't see any log messages in the EG log regarding the kernel's death. There should be some attempts at auto-restart, since the exception should terminate the launcher. Hmm, actually, it's terminating the thread, but I believe the kernel would still be running, sans a Spark context. So there's something funky there, but that's a complete side effect of the issue at hand - why the heck we can't start a Spark context.
Have you tried running spark-submit directly? You might need to massage some of the following...
exec /usr/hdp/current/spark2-client/bin/spark-submit --master yarn --deploy-mode cluster --name 471785be-cdf8-47d0-82b1-a134063ecc09 --conf spark.yarn.am.waitTime=1d --conf spark.yarn.submit.waitAppCompletion=false --conf spark.yarn.appMasterEnv.PYTHONUSERBASE=/opt/anaconda3 --conf spark.yarn.appMasterEnv.PYTHONPATH=/opt/anaconda3/lib/python3.6/site-packages/:/usr/hdp/current/spark2-client/python:/usr/hdp/current/spark2-client/python/lib/py4j-0.10.6-src.zip --conf spark.yarn.appMasterEnv.PATH=/opt/anaconda3/bin/python:/opt/anaconda3/bin:/usr/lib64/qt-3.3/bin:/usr/java/jdk1.8.0_181-amd64/bin:/usr/java/jdk1.8.0_181-amd64/jre/bin:/opt/anaconda3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/home/elyra/.local/bin:/home/elyra/bin /usr/local/share/jupyter/kernels/spark_python_yarn_cluster/scripts/launch_ipykernel.py /home/elyra/.local/share/jupyter/runtime/kernel-471785be-cdf8-47d0-82b1-a134063ecc09.json --RemoteProcessProxy.response-address 10.132.0.4:56333 --RemoteProcessProxy.port-range 0..0 --RemoteProcessProxy.spark-context-initialization-mode lazy
at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:205)
at org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:498)
at org.apache.spark.deploy.yarn.ApplicationMaster.org$apache$spark$deploy$yarn$ApplicationMaster$$runImpl(ApplicationMaster.scala:345)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$2.apply$mcV$sp(ApplicationMaster.scala:260)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$2.apply(ApplicationMaster.scala:260)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$2.apply(ApplicationMaster.scala:260)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$5.run(ApplicationMaster.scala:815)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
at org.apache.spark.deploy.yarn.ApplicationMaster.doAsUser(ApplicationMaster.scala:814)
at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:259)
at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:839)
at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
Caused by: org.apache.spark.SparkUserAppException: User application exited with 1
at org.apache.spark.deploy.PythonRunner$.main(PythonRunner.scala:102)
at org.apache.spark.deploy.PythonRunner.main(PythonRunner.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
With no logs to be accessed, but by following this article from Hortonworks I was able to get some logs :/
Log Type: stdout
Log Upload Time: Fri Sep 14 17:22:05 +0000 2018
Log Length: 520
Using connection file '/tmp/kernel-471785be-cdf8-47d0-82b1-a134063ecc09_2gqh1c9o.json' instead of '/home/elyra/.local/share/jupyter/runtime/kernel-471785be-cdf8-47d0-82b1-a134063ecc09.json'
Signal socket bound to host: 0.0.0.0, port: 44190
Traceback (most recent call last):
File "launch_ipykernel.py", line 320, in <module>
lower_port, upper_port)
File "launch_ipykernel.py", line 143, in return_connection_info
s.connect((response_ip, response_port))
ConnectionRefusedError: [Errno 111] Connection refused
Log Type: prelaunch.err
Log Upload Time: Fri Sep 14 17:22:01 +0000 2018
Log Length: 0
Log Type: prelaunch.out
Log Upload Time: Fri Sep 14 17:22:01 +0000 2018
Log Length: 100
Setting up env variables
Setting up job resources
Copying debugging information
Launching container
Log Type: stderr
Log Upload Time: Fri Sep 14 17:22:05 +0000 2018
Log Length: 3899
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/hadoop/yarn/local/usercache/root/filecache/67/__spark_libs__4683774330004883086.zip/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.6.5.0-292/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
18/09/14 17:22:02 INFO SignalUtils: Registered signal handler for TERM
18/09/14 17:22:02 INFO SignalUtils: Registered signal handler for HUP
18/09/14 17:22:02 INFO SignalUtils: Registered signal handler for INT
18/09/14 17:22:02 INFO SecurityManager: Changing view acls to: yarn,root
18/09/14 17:22:02 INFO SecurityManager: Changing modify acls to: yarn,root
18/09/14 17:22:02 INFO SecurityManager: Changing view acls groups to:
18/09/14 17:22:02 INFO SecurityManager: Changing modify acls groups to:
18/09/14 17:22:02 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(yarn, root); groups with view permissions: Set(); users with modify permissions: Set(yarn, root); groups with modify permissions: Set()
18/09/14 17:22:03 INFO ApplicationMaster: Preparing Local resources
18/09/14 17:22:04 INFO ApplicationMaster: ApplicationAttemptId: appattempt_1536934368460_0022_000001
18/09/14 17:22:04 INFO ApplicationMaster: Starting the user application in a separate Thread
18/09/14 17:22:04 INFO ApplicationMaster: Waiting for spark context initialization...
18/09/14 17:22:05 ERROR ApplicationMaster: User application exited with status 1
18/09/14 17:22:05 INFO ApplicationMaster: Final app status: FAILED, exitCode: 13, (reason: User application exited with status 1)
18/09/14 17:22:05 ERROR ApplicationMaster: Uncaught exception:
org.apache.spark.SparkException: Exception thrown in awaitResult:
at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:205)
at org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:498)
at org.apache.spark.deploy.yarn.ApplicationMaster.org$apache$spark$deploy$yarn$ApplicationMaster$$runImpl(ApplicationMaster.scala:345)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$2.apply$mcV$sp(ApplicationMaster.scala:260)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$2.apply(ApplicationMaster.scala:260)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$2.apply(ApplicationMaster.scala:260)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$5.run(ApplicationMaster.scala:815)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
at org.apache.spark.deploy.yarn.ApplicationMaster.doAsUser(ApplicationMaster.scala:814)
at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:259)
at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:839)
at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
Caused by: org.apache.spark.SparkUserAppException: User application exited with 1
at org.apache.spark.deploy.PythonRunner$.main(PythonRunner.scala:102)
at org.apache.spark.deploy.PythonRunner.main(PythonRunner.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$4.run(ApplicationMaster.scala:721)
18/09/14 17:22:05 INFO ShutdownHookManager: Shutdown hook called
It seems that somehow I might need to increase it more?
Increase what? The launch timeout? If so, no, 100 should be plenty. We rarely see more than 30.
What did you change? This implies you're going backwards ...
Signal socket bound to host: 0.0.0.0, port: 44190
Traceback (most recent call last):
File "launch_ipykernel.py", line 320, in <module>
lower_port, upper_port)
File "launch_ipykernel.py", line 143, in return_connection_info
s.connect((response_ip, response_port))
ConnectionRefusedError: [Errno 111] Connection refused
since you were able to return the connection info previously...
Signal socket bound to host: 0.0.0.0, port: 53516
JSON Payload '{"stdin_port": 37306, "pgid": "4536", "ip": "0.0.0.0", "pid": "4583", "control_port": 39802, "hb_port": 39581, "signature_scheme": "hmac-sha256", "key": "394b7d07-2c3d-4e71-9da2-9175a659cf1c", "comm_port": 53516, "kernel_name": "", "shell_port": 58451, "transport": "tcp", "iopub_port": 59037}
Encrypted Payload '0eCmaI4Jmz1vuY6hPnLL5MDVzOWRkqVGR641cblKs6jIv2CNxx5eylkXns0wi3kwkjDnJ+gpdEGZLlnwmvYHZqXXOntHFXoRA2LFDSjBTdJF0RVAriSdrwrtf6jsGE4/Og78+fAJhcHAd8u1zYKpNsblGdq5e4yaYAwxZaqrYICn6k73sqEAqEi7TzrVjmwrKpRGoIh3UmA0RIKxS2o+wCusJ9fcXf0/zKbB7wl5oNTizydqDR1F2OlRjZsdBAjI6q1wJ2DJf3UVj+3vPx97vIaelrIildHlK7xEb3vwYIekgV4GRNSFKtyMsL5PZhnQNdIR65mWfTjy5mzEkwyWV9Vec7Cr8/e7KHL3r7uB5Yccn3KNZElD9Rrq4k+Z+1nRoBSyrZ41ty4lniCh5G2fug==
/opt/anaconda2/lib/python2.7/site-packages/IPython/paths.py:69: UserWarning: IPython parent '/home' is not a writable location, using a temp directory.
" using a temp directory.".format(parent))
and now you're probably getting a timeout-based exception in the EG log because it never got the information regarding the 5 ports.
Hi @kevin-bates, sorry for the late reply - I misunderstood your comment last time, as I killed the Jupyter kernel and tried to run the exec command.
Anyway, it didn't work: the job got killed and nothing really showed up to explain why things got stuck. So I took @lresende's advice and re-ran everything on a fresh cluster using the Ansible playbook, and things work fine. Note that I am using Python 2 instead of Python 3, so most likely it's a compatibility issue.
Thanks for the support @kevin-bates and @lresende. Just one more question: which configuration should I go with to set up Python 3 instead of Python 2?
import platform
platform.python_version()
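On the Python 3 question: a common approach is to point both the driver and the YARN application master at the Python 3 interpreter via the kernel.json env stanza. A hedged sketch, assuming the /opt/anaconda3 install mentioned earlier in this thread (adjust paths to your cluster; PYSPARK_PYTHON is Spark's standard variable for selecting the interpreter):

```json
{
  "env": {
    "SPARK_HOME": "/usr/hdp/current/spark2-client",
    "PYSPARK_PYTHON": "/opt/anaconda3/bin/python",
    "SPARK_OPTS": "--master yarn --deploy-mode cluster --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=/opt/anaconda3/bin/python"
  }
}
```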
While browsing another mailing list, it seems that PYSPARK_GATEWAY_SECRET is a new change in Spark related to CVE-2018-1334. We might need to update the EG code to support the change in recent Spark.
Thanks @lresende. Can I help on this? Any advice on where I can start, as I am still exploring the overall stack?
Also, one more thing regarding the py4j changes: this means it might be better to generate the kernels based on the current version of $SPARK_HOME/python/lib.
@lresende @ziedbouf - where do we stand on this issue? It looks like Python 2 worked, but Python 3 doesn't. Not sure how the PYSPARK_GATEWAY_SECRET stuff comes into play at this point, since Python 2 worked.
Just doing some housekeeping and would like to know if this issue can be closed.
@kevin-bates it was an issue of using Python 3 instead of Python 2 with YARN, and due to some shifts in the infrastructure I didn't have the chance to tackle the issue more deeply.
Also, something I missed from the YARN documentation in general: there is no clear documentation about attaching kernels to the YARN execution environment, which led to a lot of confusion during the debugging/testing phase.
@ziedbouf - thanks for the update. Regarding...
Also, something I missed from the YARN documentation in general: there is no clear documentation about attaching kernels to the YARN execution environment, which led to a lot of confusion during the debugging/testing phase.
Are you speaking of the YARN portion of the Enterprise Gateway documentation or the Hadoop YARN documentation itself? If the Enterprise Gateway documentation could you please provide more details, or perhaps even a pull request containing the appropriate changes?
Thanks.
@kevin-bates I mean the YARN documentation is not as clear as I expected about the process of attaching notebooks to YARN clusters.
The Hadoop YARN docs shouldn't cover anything regarding notebooks. However, the YARN portion of the EG documentation may be missing something. I'm trying to understand what that might be so we can fix our docs if they're lacking information.
@ziedbouf could you please provide a link to the YARN documentation you're referring to? I'd like to better understand where the disconnect is. Thanks.
@kevin-bates I mean the YARN documentation (not related to Jupyter Enterprise Gateway) in general does not include any details about kernel integration.
I think most of the resources out there don't outline how Jupyter kernels work in general or how to create one. The following post seems to be a good start for grasping the idea of kernels in a Jupyter environment, and I think, @kevin-bates, it might be useful to create a series of blog posts as part of the Jupyter Enterprise Gateway initiative explaining how kernels work in general.
@ziedbouf - thank you for the clarification. There is no reason the YARN documentation should mention anything about jupyter kernels. The launching of jupyter kernels using YARN as a resource manager is completely unique to what Enterprise Gateway enables. As far as YARN is concerned, the kernel is just another application.
The enabling technology for running remote kernels launched from EG is the Kernel Launchers covered in the System Architecture section of the docs. (The reason I don't include a link to the Launchers section is because there are a number of changes pending to that section in PR #534 - and a link to the existing docs will break once merged.)
I agree that enhancing our docs (along with a blog post) to include the items necessary for creating a launcher would be helpful.
Are you looking to add support for another kernel type other than Python, R or Scala? If so, you might be a good candidate for documenting your experience. :smiley:
This issue has evolved into something completely different from initializing the spark context and I'm inclined to close this issue unless anyone objects.
I am setting up the Jupyter Enterprise Gateway, but the initialization of the Spark context leads to errors in both client and cluster mode: PYSPARK_GATEWAY_SECRET is none. This also seems to lead to the following error on the second run: PYTHON_WORKER_FACTORY_SECRET is none. I am trying to set these variables either through export or directly on the kernel side.
For client mode, do you think this is linked to how the os environment variables for PYTHON_WORKER_FACTORY_SECRET and the Java ports are fetched?
Regarding cluster mode, my understanding is that Spark's PythonRunner will auto-initialize the gateway secret key that will be used by java_gateway. Am I missing some configuration?
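To illustrate the two code paths discussed above, PySpark's gateway launch decision behaves roughly as follows (a simplified sketch, not the actual Spark source; `resolve_gateway` is a hypothetical name):

```python
def resolve_gateway(environ):
    """Sketch of PySpark's launch_gateway decision after CVE-2018-1334.

    If PYSPARK_GATEWAY_PORT is set, a driver-side JVM already exists and
    the matching PYSPARK_GATEWAY_SECRET must also be present - its absence
    is the KeyError seen in the traceback earlier in this thread. If the
    port is absent, PySpark takes the other branch: it launches its own
    JVM and generates the secret itself.
    """
    if "PYSPARK_GATEWAY_PORT" in environ:
        port = int(environ["PYSPARK_GATEWAY_PORT"])
        # Raises KeyError if the secret was not flowed to this process
        secret = environ["PYSPARK_GATEWAY_SECRET"]
        return ("connect-existing-jvm", port, secret)
    return ("launch-own-jvm", None, None)
```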