Here is the relevant error from the log output below:

java.net.URISyntaxException: Illegal character in hostname at index 21: s3://personalize-data-[ACCOUNT_ID]/raw-events

The cause of the issue in this case is that the S3_JSON_INPUT_PATH and S3_CSV_OUTPUT_PATH job parameters are not correct. The literal "[ACCOUNT_ID]" placeholder was left in the paths, and square brackets are not valid characters in an S3 bucket name, which is why the URI parser rejects the hostname. You need to substitute your actual AWS account ID (without hyphens) where "[ACCOUNT_ID]" appears in the sample paths from the documentation (a quick validation sketch follows the parameters below):
--S3_JSON_INPUT_PATH s3://personalize-data-[ACCOUNT_ID]/raw-events/
--S3_CSV_OUTPUT_PATH s3://personalize-data-[ACCOUNT_ID]/transformed
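As a quick sanity check before resubmitting the job, you can verify that the placeholder was actually replaced and that the bucket name only uses characters S3 allows. This is just an illustrative sketch — the validate_s3_path helper is not part of the workshop code:

```python
import re

# S3 bucket names may contain only lowercase letters, digits, hyphens, and dots,
# so a leftover "[ACCOUNT_ID]" placeholder fails this check immediately.
BUCKET_NAME = re.compile(r"^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$")

def validate_s3_path(path):
    """Raise ValueError if an s3:// path has an invalid bucket name."""
    if not path.startswith("s3://"):
        raise ValueError("not an s3:// path: %s" % path)
    bucket = path[len("s3://"):].split("/", 1)[0]
    if not BUCKET_NAME.match(bucket):
        raise ValueError("invalid bucket name (placeholder left in?): %s" % bucket)

validate_s3_path("s3://personalize-data-123456789012/raw-events/")  # passes silently
try:
    validate_s3_path("s3://personalize-data-[ACCOUNT_ID]/raw-events/")
except ValueError as err:
    print(err)  # invalid bucket name (placeholder left in?): personalize-data-[ACCOUNT_ID]
```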
This bucket naming convention is just the standard used for the workshop. If you are testing the exercise outside of a managed workshop, substitute your own bucket name as necessary.
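If you would rather not type the account ID by hand at all, you can look it up and start the job programmatically. The following is a minimal sketch using boto3, assuming your credentials resolve to the correct account, the job is named SegmentEventsJsonToCsv as in the log below, and your buckets follow the workshop naming convention:

```python
import boto3

# Resolve the current account ID so the [ACCOUNT_ID] placeholder is never typed by hand.
account_id = boto3.client("sts").get_caller_identity()["Account"]

glue = boto3.client("glue")
response = glue.start_job_run(
    JobName="SegmentEventsJsonToCsv",  # job name taken from the log below
    Arguments={
        "--S3_JSON_INPUT_PATH": "s3://personalize-data-%s/raw-events/" % account_id,
        "--S3_CSV_OUTPUT_PATH": "s3://personalize-data-%s/transformed" % account_id,
    },
)
print("Started job run:", response["JobRunId"])
```

For reference, the full output from the failed run follows.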
--conf spark.hadoop.yarn.resourcemanager.connect.max-wait.ms=60000 --conf spark.hadoop.fs.defaultFS=hdfs://ip-172-32-34-176.ec2.internal:8020 --conf spark.hadoop.yarn.resourcemanager.address=ip-172-32-34-176.ec2.internal:8032 --conf spark.dynamicAllocation.enabled=true --conf spark.shuffle.service.enabled=true --conf spark.dynamicAllocation.minExecutors=1 --conf spark.dynamicAllocation.maxExecutors=18 --conf spark.executor.memory=5g --conf spark.executor.cores=4 --conf spark.driver.memory=5g --JOB_ID j_53156865b026edbb62f5b002460ae275f81d253c15ec4bdce9344f6b28230dae --JOB_RUN_ID jr_2a1f391ca6c93cf2e0c59260be40cb53c0f9d9a16754097503aaf61e97f1ab08 --scriptLocation s3://aws-glue-scripts-537632985422-us-east-1/admin/SegmentEventsJsonToCsv --job-bookmark-option job-bookmark-disable --S3_CSV_OUTPUT_PATH s3://personalize-data-[ACCOUNT_ID]/transformed --job-language python --TempDir s3://aws-glue-temporary-537632985422-us-east-1/admin --S3_JSON_INPUT_PATH s3://personalize-data-[ACCOUNT_ID]/raw-events/ --JOB_NAME SegmentEventsJsonToCsv
Detected region us-east-1
Detected glue endpoint https://glue.us-east-1.amazonaws.com
YARN_RM_DNS=ip-172-32-34-176.ec2.internal
JOB_NAME = SegmentEventsJsonToCsv
Specifying us-east-1 while copying script.
Completed 2.6 KiB/2.6 KiB (44.8 KiB/s) with 1 file(s) remaining
download: s3://aws-glue-scripts-537632985422-us-east-1/admin/SegmentEventsJsonToCsv to ./script_2019-02-27-18-58-26.py
SCRIPT_URL = /tmp/g-27272b67599361810bc7b488f8994dab5d0c7cdc-460412413296521275/script_2019-02-27-18-58-26.py
/usr/lib/spark/bin/spark-submit --conf spark.hadoop.yarn.resourcemanager.connect.max-wait.ms=60000 --conf spark.hadoop.fs.defaultFS=hdfs://ip-172-32-34-176.ec2.internal:8020 --conf spark.hadoop.yarn.resourcemanager.address=ip-172-32-34-176.ec2.internal:8032 --conf spark.dynamicAllocation.enabled=true --conf spark.shuffle.service.enabled=true --conf spark.dynamicAllocation.minExecutors=1 --conf spark.dynamicAllocation.maxExecutors=18 --conf spark.executor.memory=5g --conf spark.executor.cores=4 --conf spark.driver.memory=5g --name tape --master yarn --deploy-mode cluster --jars /opt/amazon/superjar/glue-assembly.jar --files /tmp/glue-default.conf,/tmp/glue-override.conf,/opt/amazon/certs/ExternalAndAWSTrustStore.jks,/opt/amazon/certs/rds-combined-ca-bundle.pem,/opt/amazon/certs/redshift-ssl-ca-cert.pem,/opt/amazon/certs/RDSTrustStore.jks,/tmp/image-creation-time,,/tmp/g-27272b67599361810bc7b488f8994dab5d0c7cdc-460412413296521275/script_2019-02-27-18-58-26.py --py-files /tmp/PyGlue.zip /tmp/runscript.py script_2019-02-27-18-58-26.py --JOB_NAME SegmentEventsJsonToCsv --JOB_ID j_53156865b026edbb62f5b002460ae275f81d253c15ec4bdce9344f6b28230dae --JOB_RUN_ID jr_2a1f391ca6c93cf2e0c59260be40cb53c0f9d9a16754097503aaf61e97f1ab08 --job-bookmark-option job-bookmark-disable --S3_CSV_OUTPUT_PATH s3://personalize-data-[ACCOUNT_ID]/transformed --TempDir s3://aws-glue-temporary-537632985422-us-east-1/admin --S3_JSON_INPUT_PATH s3://personalize-data-[ACCOUNT_ID]/raw-events/
19/02/27 18:58:45 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
19/02/27 18:58:48 INFO RMProxy: Connecting to ResourceManager at ip-172-32-34-176.ec2.internal/172.32.34.176:8032
19/02/27 18:58:48 INFO Client: Requesting a new application from cluster with 9 NodeManagers
19/02/27 18:58:48 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (12288 MB per container)
19/02/27 18:58:48 INFO Client: Will allocate AM container, with 5632 MB memory including 512 MB overhead
19/02/27 18:58:48 INFO Client: Setting up container launch context for our AM
19/02/27 18:58:48 INFO Client: Setting up the launch environment for our AM container
19/02/27 18:58:48 INFO Client: Preparing resources for our AM container
19/02/27 18:58:50 DEBUG Client:
19/02/27 18:58:50 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
19/02/27 18:58:59 INFO Client: Uploading resource file:/tmp/spark-f7a5598e-c04d-4ffd-ab39-33211f47c8cd/__spark_libs__441647360030086276.zip -> hdfs://ip-172-32-34-176.ec2.internal:8020/user/root/.sparkStaging/application_1551293620749_0001/__spark_libs__441647360030086276.zip
19/02/27 18:59:02 INFO Client: Uploading resource file:/opt/amazon/superjar/glue-assembly.jar -> hdfs://ip-172-32-34-176.ec2.internal:8020/user/root/.sparkStaging/application_1551293620749_0001/glue-assembly.jar
19/02/27 18:59:26 INFO Client: Uploading resource file:/tmp/glue-default.conf -> hdfs://ip-172-32-34-176.ec2.internal:8020/user/root/.sparkStaging/application_1551293620749_0001/glue-default.conf
19/02/27 18:59:26 INFO Client: Uploading resource file:/tmp/glue-override.conf -> hdfs://ip-172-32-34-176.ec2.internal:8020/user/root/.sparkStaging/application_1551293620749_0001/glue-override.conf
19/02/27 18:59:26 INFO Client: Uploading resource file:/opt/amazon/certs/ExternalAndAWSTrustStore.jks -> hdfs://ip-172-32-34-176.ec2.internal:8020/user/root/.sparkStaging/application_1551293620749_0001/ExternalAndAWSTrustStore.jks
19/02/27 18:59:26 INFO Client: Uploading resource file:/opt/amazon/certs/rds-combined-ca-bundle.pem -> hdfs://ip-172-32-34-176.ec2.internal:8020/user/root/.sparkStaging/application_1551293620749_0001/rds-combined-ca-bundle.pem
19/02/27 18:59:26 INFO Client: Uploading resource file:/opt/amazon/certs/redshift-ssl-ca-cert.pem -> hdfs://ip-172-32-34-176.ec2.internal:8020/user/root/.sparkStaging/application_1551293620749_0001/redshift-ssl-ca-cert.pem
19/02/27 18:59:26 INFO Client: Uploading resource file:/opt/amazon/certs/RDSTrustStore.jks -> hdfs://ip-172-32-34-176.ec2.internal:8020/user/root/.sparkStaging/application_1551293620749_0001/RDSTrustStore.jks
19/02/27 18:59:26 INFO Client: Uploading resource file:/tmp/image-creation-time -> hdfs://ip-172-32-34-176.ec2.internal:8020/user/root/.sparkStaging/application_1551293620749_0001/image-creation-time
19/02/27 18:59:26 INFO Client: Uploading resource file:/tmp/g-27272b67599361810bc7b488f8994dab5d0c7cdc-460412413296521275/script_2019-02-27-18-58-26.py -> hdfs://ip-172-32-34-176.ec2.internal:8020/user/root/.sparkStaging/application_1551293620749_0001/script_2019-02-27-18-58-26.py
19/02/27 18:59:26 INFO Client: Uploading resource file:/tmp/runscript.py -> hdfs://ip-172-32-34-176.ec2.internal:8020/user/root/.sparkStaging/application_1551293620749_0001/runscript.py
19/02/27 18:59:26 INFO Client: Uploading resource file:/usr/lib/spark/python/lib/pyspark.zip -> hdfs://ip-172-32-34-176.ec2.internal:8020/user/root/.sparkStaging/application_1551293620749_0001/pyspark.zip
19/02/27 18:59:26 INFO Client: Uploading resource file:/usr/lib/spark/python/lib/py4j-0.10.4-src.zip -> hdfs://ip-172-32-34-176.ec2.internal:8020/user/root/.sparkStaging/application_1551293620749_0001/py4j-0.10.4-src.zip
19/02/27 18:59:26 INFO Client: Uploading resource file:/tmp/PyGlue.zip -> hdfs://ip-172-32-34-176.ec2.internal:8020/user/root/.sparkStaging/application_1551293620749_0001/PyGlue.zip
19/02/27 18:59:26 INFO Client: Uploading resource file:/tmp/spark-f7a5598e-c04d-4ffd-ab39-33211f47c8cd/__spark_conf__7256533821855478433.zip -> hdfs://ip-172-32-34-176.ec2.internal:8020/user/root/.sparkStaging/application_1551293620749_0001/__spark_conf__.zip
19/02/27 18:59:26 DEBUG Client: ===============================================================================
19/02/27 18:59:26 DEBUG Client: YARN AM launch context:
19/02/27 18:59:26 DEBUG Client: user class: org.apache.spark.deploy.PythonRunner
19/02/27 18:59:26 DEBUG Client: env:
19/02/27 18:59:26 DEBUG Client: CLASSPATH -> ./*:/usr/lib/hadoop-lzo/lib/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/glue/etl/jars/aws-glue-datacatalog-spark-client-1.8.0-SNAPSHOT.jar<CPS>{{PWD}}<CPS>{{PWD}}/__spark_conf__<CPS>{{PWD}}/__spark_libs__/*<CPS>$HADOOP_CONF_DIR<CPS>$HADOOP_COMMON_HOME/share/hadoop/common/*<CPS>$HADOOP_COMMON_HOME/share/hadoop/common/lib/*<CPS>$HADOOP_HDFS_HOME/share/hadoop/hdfs/*<CPS>$HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*<CPS>$HADOOP_YARN_HOME/share/hadoop/yarn/*<CPS>$HADOOP_YARN_HOME/share/hadoop/yarn/lib/*<CPS>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*<CPS>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*
19/02/27 18:59:26 DEBUG Client: SPARK_YARN_STAGING_DIR -> hdfs://ip-172-32-34-176.ec2.internal:8020/user/root/.sparkStaging/application_1551293620749_0001
19/02/27 18:59:26 DEBUG Client: SPARK_USER -> root
19/02/27 18:59:26 DEBUG Client: SPARK_YARN_MODE -> true
19/02/27 18:59:26 DEBUG Client: PYTHONHASHSEED -> 0
19/02/27 18:59:26 DEBUG Client: PYTHONPATH -> {{PWD}}/pyspark.zip<CPS>{{PWD}}/py4j-0.10.4-src.zip<CPS>{{PWD}}/PyGlue.zip
19/02/27 18:59:26 DEBUG Client: resources:
19/02/27 18:59:27 DEBUG Client: image-creation-time -> resource { scheme: "hdfs" host: "ip-172-32-34-176.ec2.internal" port: 8020 file: "/user/root/.sparkStaging/application_1551293620749_0001/image-creation-time" } size: 11 timestamp: 1551293966749 type: FILE visibility: PRIVATE
19/02/27 18:59:27 DEBUG Client: py4j-0.10.4-src.zip -> resource { scheme: "hdfs" host: "ip-172-32-34-176.ec2.internal" port: 8020 file: "/user/root/.sparkStaging/application_1551293620749_0001/py4j-0.10.4-src.zip" } size: 74096 timestamp: 1551293966895 type: FILE visibility: PRIVATE
19/02/27 18:59:27 DEBUG Client: glue-assembly.jar -> resource { scheme: "hdfs" host: "ip-172-32-34-176.ec2.internal" port: 8020 file: "/user/root/.sparkStaging/application_1551293620749_0001/glue-assembly.jar" } size: 423322980 timestamp: 1551293966507 type: FILE visibility: PRIVATE
19/02/27 18:59:27 DEBUG Client: pyspark.zip -> resource { scheme: "hdfs" host: "ip-172-32-34-176.ec2.internal" port: 8020 file: "/user/root/.sparkStaging/application_1551293620749_0001/pyspark.zip" } size: 482687 timestamp: 1551293966848 type: FILE visibility: PRIVATE
19/02/27 18:59:27 DEBUG Client: __spark_libs__ -> resource { scheme: "hdfs" host: "ip-172-32-34-176.ec2.internal" port: 8020 file: "/user/root/.sparkStaging/application_1551293620749_0001/__spark_libs__441647360030086276.zip" } size: 218234389 timestamp: 1551293942463 type: ARCHIVE visibility: PRIVATE
19/02/27 18:59:27 DEBUG Client: redshift-ssl-ca-cert.pem -> resource { scheme: "hdfs" host: "ip-172-32-34-176.ec2.internal" port: 8020 file: "/user/root/.sparkStaging/application_1551293620749_0001/redshift-ssl-ca-cert.pem" } size: 8621 timestamp: 1551293966667 type: FILE visibility: PRIVATE
19/02/27 18:59:27 DEBUG Client: rds-combined-ca-bundle.pem -> resource { scheme: "hdfs" host: "ip-172-32-34-176.ec2.internal" port: 8020 file: "/user/root/.sparkStaging/application_1551293620749_0001/rds-combined-ca-bundle.pem" } size: 31848 timestamp: 1551293966642 type: FILE visibility: PRIVATE
19/02/27 18:59:27 DEBUG Client: glue-default.conf -> resource { scheme: "hdfs" host: "ip-172-32-34-176.ec2.internal" port: 8020 file: "/user/root/.sparkStaging/application_1551293620749_0001/glue-default.conf" } size: 382 timestamp: 1551293966545 type: FILE visibility: PRIVATE
19/02/27 18:59:27 DEBUG Client: runscript.py -> resource { scheme: "hdfs" host: "ip-172-32-34-176.ec2.internal" port: 8020 file: "/user/root/.sparkStaging/application_1551293620749_0001/runscript.py" } size: 3549 timestamp: 1551293966794 type: FILE visibility: PRIVATE
19/02/27 18:59:27 DEBUG Client: glue-override.conf -> resource { scheme: "hdfs" host: "ip-172-32-34-176.ec2.internal" port: 8020 file: "/user/root/.sparkStaging/application_1551293620749_0001/glue-override.conf" } size: 264 timestamp: 1551293966568 type: FILE visibility: PRIVATE
19/02/27 18:59:27 DEBUG Client: ExternalAndAWSTrustStore.jks -> resource { scheme: "hdfs" host: "ip-172-32-34-176.ec2.internal" port: 8020 file: "/user/root/.sparkStaging/application_1551293620749_0001/ExternalAndAWSTrustStore.jks" } size: 118406 timestamp: 1551293966618 type: FILE visibility: PRIVATE
19/02/27 18:59:27 DEBUG Client: PyGlue.zip -> resource { scheme: "hdfs" host: "ip-172-32-34-176.ec2.internal" port: 8020 file: "/user/root/.sparkStaging/application_1551293620749_0001/PyGlue.zip" } size: 104304 timestamp: 1551293966944 type: FILE visibility: PRIVATE
19/02/27 18:59:27 DEBUG Client: __spark_conf__ -> resource { scheme: "hdfs" host: "ip-172-32-34-176.ec2.internal" port: 8020 file: "/user/root/.sparkStaging/application_1551293620749_0001/__spark_conf__.zip" } size: 8098 timestamp: 1551293966984 type: ARCHIVE visibility: PRIVATE
19/02/27 18:59:27 DEBUG Client: script_2019-02-27-18-58-26.py -> resource { scheme: "hdfs" host: "ip-172-32-34-176.ec2.internal" port: 8020 file: "/user/root/.sparkStaging/application_1551293620749_0001/script_2019-02-27-18-58-26.py" } size: 2646 timestamp: 1551293966772 type: FILE visibility: PRIVATE
19/02/27 18:59:27 DEBUG Client: RDSTrustStore.jks -> resource { scheme: "hdfs" host: "ip-172-32-34-176.ec2.internal" port: 8020 file: "/user/root/.sparkStaging/application_1551293620749_0001/RDSTrustStore.jks" } size: 19135 timestamp: 1551293966702 type: FILE visibility: PRIVATE
19/02/27 18:59:27 DEBUG Client: command:
19/02/27 18:59:27 DEBUG Client: LD_LIBRARY_PATH="/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:$LD_LIBRARY_PATH" {{JAVA_HOME}}/bin/java -server -Xmx5120m -Djava.io.tmpdir={{PWD}}/tmp '-XX:+UseConcMarkSweepGC' '-XX:CMSInitiatingOccupancyFraction=70' '-XX:MaxHeapFreeRatio=70' '-XX:+CMSClassUnloadingEnabled' '-XX:OnOutOfMemoryError=kill -9 %p' '-Djavax.net.ssl.trustStore=ExternalAndAWSTrustStore.jks' '-Djavax.net.ssl.trustStoreType=JKS' '-Djavax.net.ssl.trustStorePassword=amazon' '-DRDS_ROOT_CERT_PATH=rds-combined-ca-bundle.pem' '-DREDSHIFT_ROOT_CERT_PATH=redshift-ssl-ca-cert.pem' '-DRDS_TRUSTSTORE_URL=file:RDSTrustStore.jks' -Dspark.yarn.app.container.log.dir=<LOG_DIR> org.apache.spark.deploy.yarn.ApplicationMaster --class 'org.apache.spark.deploy.PythonRunner' --primary-py-file runscript.py --arg 'script_2019-02-27-18-58-26.py' --arg '--JOB_NAME' --arg 'SegmentEventsJsonToCsv' --arg '--JOB_ID' --arg 'j_53156865b026edbb62f5b002460ae275f81d253c15ec4bdce9344f6b28230dae' --arg '--JOB_RUN_ID' --arg 'jr_2a1f391ca6c93cf2e0c59260be40cb53c0f9d9a16754097503aaf61e97f1ab08' --arg '--job-bookmark-option' --arg 'job-bookmark-disable' --arg '--S3_CSV_OUTPUT_PATH' --arg 's3://personalize-data-[ACCOUNT_ID]/transformed' --arg '--TempDir' --arg 's3://aws-glue-temporary-537632985422-us-east-1/admin' --arg '--S3_JSON_INPUT_PATH' --arg 's3://personalize-data-[ACCOUNT_ID]/raw-events/' --properties-file {{PWD}}/__spark_conf__/__spark_conf__.properties 1> <LOG_DIR>/stdout 2> <LOG_DIR>/stderr
19/02/27 18:59:27 DEBUG Client: ===============================================================================
19/02/27 18:59:27 INFO SecurityManager: Changing view acls to: root
19/02/27 18:59:27 INFO SecurityManager: Changing modify acls to: root
19/02/27 18:59:27 INFO SecurityManager: Changing view acls groups to:
19/02/27 18:59:27 INFO SecurityManager: Changing modify acls groups to:
19/02/27 18:59:27 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
19/02/27 18:59:27 INFO Client: Submitting application application_1551293620749_0001 to ResourceManager
19/02/27 18:59:27 INFO YarnClientImpl: Submitted application application_1551293620749_0001
19/02/27 18:59:28 INFO Client: Application report for application_1551293620749_0001 (state: ACCEPTED)
applicationid is application_1551293620749_0001, yarnRMDNS is ip-172-32-34-176.ec2.internal
Application info reporting is enabled.
----------Recording application Id and Yarn RM DNS for cancellation-----------------
19/02/27 18:59:36 INFO Client: Application report for application_1551293620749_0001 (state: RUNNING)
19/02/27 18:59:36 DEBUG Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: 172.32.40.232
ApplicationMaster RPC port: 0
queue: default
start time: 1551293967315
final status: UNDEFINED
tracking URL: http://ip-172-32-34-176.ec2.internal:20888/proxy/application_1551293620749_0001/
user: root
19/02/27 18:59:37 INFO Client: Application report for application_1551293620749_0001 (state: RUNNING)
19/02/27 18:59:37 DEBUG Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: 172.32.40.232
ApplicationMaster RPC port: 0
queue: default
start time: 1551293967315
final status: UNDEFINED
tracking URL: http://ip-172-32-34-176.ec2.internal:20888/proxy/application_1551293620749_0001/
user: root
19/02/27 18:59:38 INFO Client: Application report for application_1551293620749_0001 (state: RUNNING)
19/02/27 18:59:38 DEBUG Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: 172.32.40.232
ApplicationMaster RPC port: 0
queue: default
start time: 1551293967315
final status: UNDEFINED
tracking URL: http://ip-172-32-34-176.ec2.internal:20888/proxy/application_1551293620749_0001/
user: root
19/02/27 18:59:39 INFO Client: Application report for application_1551293620749_0001 (state: RUNNING)
19/02/27 18:59:39 DEBUG Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: 172.32.40.232
ApplicationMaster RPC port: 0
queue: default
start time: 1551293967315
final status: UNDEFINED
tracking URL: http://ip-172-32-34-176.ec2.internal:20888/proxy/application_1551293620749_0001/
user: root
19/02/27 18:59:40 INFO Client: Application report for application_1551293620749_0001 (state: RUNNING)
19/02/27 18:59:40 DEBUG Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: 172.32.40.232
ApplicationMaster RPC port: 0
queue: default
start time: 1551293967315
final status: UNDEFINED
tracking URL: http://ip-172-32-34-176.ec2.internal:20888/proxy/application_1551293620749_0001/
user: root
19/02/27 18:59:41 INFO Client: Application report for application_1551293620749_0001 (state: RUNNING)
19/02/27 18:59:41 DEBUG Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: 172.32.40.232
ApplicationMaster RPC port: 0
queue: default
start time: 1551293967315
final status: UNDEFINED
tracking URL: http://ip-172-32-34-176.ec2.internal:20888/proxy/application_1551293620749_0001/
user: root
19/02/27 18:59:42 INFO Client: Application report for application_1551293620749_0001 (state: RUNNING)
19/02/27 18:59:42 DEBUG Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: 172.32.40.232
ApplicationMaster RPC port: 0
queue: default
start time: 1551293967315
final status: UNDEFINED
tracking URL: http://ip-172-32-34-176.ec2.internal:20888/proxy/application_1551293620749_0001/
user: root
19/02/27 18:59:43 INFO Client: Application report for application_1551293620749_0001 (state: RUNNING)
19/02/27 18:59:43 DEBUG Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: 172.32.40.232
ApplicationMaster RPC port: 0
queue: default
start time: 1551293967315
final status: UNDEFINED
tracking URL: http://ip-172-32-34-176.ec2.internal:20888/proxy/application_1551293620749_0001/
user: root
19/02/27 18:59:44 INFO Client: Application report for application_1551293620749_0001 (state: FINISHED)
19/02/27 18:59:44 DEBUG Client:
client token: N/A
diagnostics: User application exited with status 1
ApplicationMaster host: 172.32.40.232
ApplicationMaster RPC port: 0
queue: default
start time: 1551293967315
final status: FAILED
tracking URL: http://ip-172-32-34-176.ec2.internal:20888/proxy/application_1551293620749_0001/
user: root
Exception in thread "main" org.apache.spark.SparkException: Application application_1551293620749_0001 finished with failed status
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1122)
at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1168)
at org.apache.spark.deploy.yarn.Client.main(Client.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:775)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
19/02/27 18:59:44 INFO ShutdownHookManager: Shutdown hook called
19/02/27 18:59:44 INFO ShutdownHookManager: Deleting directory /tmp/spark-f7a5598e-c04d-4ffd-ab39-33211f47c8cd
Container: container_1551293620749_0001_01_000001 on ip-172-32-40-232.ec2.internal_8041
LogType:stderr
Log Upload Time:Wed Feb 27 18:59:45 +0000 2019
LogLength:18030
Log Contents:
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/mnt/yarn/usercache/root/filecache/10/glue-assembly.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/mnt/yarn/usercache/root/filecache/16/__spark_libs__441647360030086276.zip/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
19/02/27 18:59:31 INFO SignalUtils: Registered signal handler for TERM
19/02/27 18:59:31 INFO SignalUtils: Registered signal handler for HUP
19/02/27 18:59:31 INFO SignalUtils: Registered signal handler for INT
19/02/27 18:59:32 INFO ApplicationMaster: Preparing Local resources
19/02/27 18:59:33 INFO ApplicationMaster: ApplicationAttemptId: appattempt_1551293620749_0001_000001
19/02/27 18:59:33 INFO SecurityManager: Changing view acls to: yarn,root
19/02/27 18:59:33 INFO SecurityManager: Changing modify acls to: yarn,root
19/02/27 18:59:33 INFO SecurityManager: Changing view acls groups to:
19/02/27 18:59:33 INFO SecurityManager: Changing modify acls groups to:
19/02/27 18:59:33 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(yarn, root); groups with view permissions: Set(); users with modify permissions: Set(yarn, root); groups with modify permissions: Set()
19/02/27 18:59:33 INFO ApplicationMaster: Starting the user application in a separate Thread
19/02/27 18:59:33 INFO ApplicationMaster: Waiting for spark context initialization...
19/02/27 18:59:34 INFO SparkContext: Running Spark version 2.2.1
19/02/27 18:59:34 INFO SparkContext: Submitted application: tape
19/02/27 18:59:34 INFO SecurityManager: Changing view acls to: yarn,root
19/02/27 18:59:34 INFO SecurityManager: Changing modify acls to: yarn,root
19/02/27 18:59:34 INFO SecurityManager: Changing view acls groups to:
19/02/27 18:59:34 INFO SecurityManager: Changing modify acls groups to:
19/02/27 18:59:34 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(yarn, root); groups with view permissions: Set(); users with modify permissions: Set(yarn, root); groups with modify permissions: Set()
19/02/27 18:59:34 INFO Utils: Successfully started service 'sparkDriver' on port 44303.
19/02/27 18:59:34 INFO SparkEnv: Registering MapOutputTracker
19/02/27 18:59:34 INFO SparkEnv: Registering BlockManagerMaster
19/02/27 18:59:34 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
19/02/27 18:59:34 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
19/02/27 18:59:34 INFO DiskBlockManager: Created local directory at /mnt/yarn/usercache/root/appcache/application_1551293620749_0001/blockmgr-2b760287-9d28-4954-b067-7bea036a179a
19/02/27 18:59:34 INFO MemoryStore: MemoryStore started with capacity 2.8 GB
19/02/27 18:59:34 INFO SparkEnv: Registering OutputCommitCoordinator
19/02/27 18:59:35 INFO JettyUtils: Adding filter: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
19/02/27 18:59:35 INFO Utils: Successfully started service 'SparkUI' on port 35179.
19/02/27 18:59:35 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://172.32.40.232:35179
19/02/27 18:59:35 INFO YarnClusterScheduler: Created YarnClusterScheduler
19/02/27 18:59:35 INFO SchedulerExtensionServices: Starting Yarn extension services with app application_1551293620749_0001 and attemptId Some(appattempt_1551293620749_0001_000001)
19/02/27 18:59:35 WARN Utils: spark.executor.instances less than spark.dynamicAllocation.minExecutors is invalid, ignoring its setting, please update your configs.
19/02/27 18:59:35 INFO Utils: Using initial executors = 1, max of spark.dynamicAllocation.initialExecutors, spark.dynamicAllocation.minExecutors and spark.executor.instances
19/02/27 18:59:35 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 39447.
19/02/27 18:59:35 INFO NettyBlockTransferService: Server created on 172.32.40.232:39447
19/02/27 18:59:35 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
19/02/27 18:59:35 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 172.32.40.232, 39447, None)
19/02/27 18:59:35 INFO BlockManagerMasterEndpoint: Registering block manager 172.32.40.232:39447 with 2.8 GB RAM, BlockManagerId(driver, 172.32.40.232, 39447, None)
19/02/27 18:59:35 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 172.32.40.232, 39447, None)
19/02/27 18:59:35 INFO BlockManager: external shuffle service port = 7337
19/02/27 18:59:35 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 172.32.40.232, 39447, None)
19/02/27 18:59:35 WARN Utils: spark.executor.instances less than spark.dynamicAllocation.minExecutors is invalid, ignoring its setting, please update your configs.
19/02/27 18:59:35 INFO Utils: Using initial executors = 1, max of spark.dynamicAllocation.initialExecutors, spark.dynamicAllocation.minExecutors and spark.executor.instances
19/02/27 18:59:35 WARN YarnSchedulerBackend$YarnSchedulerEndpoint: Attempted to request executors before the AM has registered!
19/02/27 18:59:35 INFO YarnSchedulerBackend$YarnSchedulerEndpoint: ApplicationMaster registered as NettyRpcEndpointRef(spark://YarnAM@172.32.40.232:44303)
19/02/27 18:59:35 INFO ApplicationMaster:
YARN executor launch context:
env:
CLASSPATH -> ./*:/usr/lib/hadoop-lzo/lib/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/glue/etl/jars/aws-glue-datacatalog-spark-client-1.8.0-SNAPSHOT.jar<CPS>{{PWD}}<CPS>{{PWD}}/__spark_conf__<CPS>{{PWD}}/__spark_libs__/*<CPS>$HADOOP_CONF_DIR<CPS>$HADOOP_COMMON_HOME/*<CPS>$HADOOP_COMMON_HOME/lib/*<CPS>$HADOOP_HDFS_HOME/*<CPS>$HADOOP_HDFS_HOME/lib/*<CPS>$HADOOP_MAPRED_HOME/*<CPS>$HADOOP_MAPRED_HOME/lib/*<CPS>$HADOOP_YARN_HOME/*<CPS>$HADOOP_YARN_HOME/lib/*<CPS>/usr/lib/hadoop-lzo/lib/*<CPS>/usr/share/aws/emr/emrfs/conf<CPS>/usr/share/aws/emr/emrfs/lib/*<CPS>/usr/share/aws/emr/emrfs/auxlib/*<CPS>/usr/share/aws/emr/lib/*<CPS>/usr/share/aws/emr/ddb/lib/emr-ddb-hadoop.jar<CPS>/usr/share/aws/emr/goodies/lib/emr-hadoop-goodies.jar<CPS>/usr/share/aws/emr/kinesis/lib/emr-kinesis-hadoop.jar<CPS>/usr/share/aws/emr/cloudwatch-sink/lib/*<CPS>/usr/share/aws/aws-java-sdk/*<CPS>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*<CPS>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*<CPS>/usr/lib/hadoop-lzo/lib/*<CPS>/usr/share/aws/emr/emrfs/conf<CPS>/usr/share/aws/emr/emrfs/lib/*<CPS>/usr/share/aws/emr/emrfs/auxlib/*<CPS>/usr/share/aws/emr/lib/*<CPS>/usr/share/aws/emr/ddb/lib/emr-ddb-hadoop.jar<CPS>/usr/share/aws/emr/goodies/lib/emr-hadoop-goodies.jar<CPS>/usr/share/aws/emr/kinesis/lib/emr-kinesis-hadoop.jar<CPS>/usr/share/aws/emr/cloudwatch-sink/lib/*<CPS>/usr/share/aws/aws-java-sdk/*
SPARK_YARN_STAGING_DIR -> (redacted)
SPARK_USER -> (redacted)
SPARK_YARN_MODE -> true
PYTHONPATH -> {{PWD}}/pyspark.zip<CPS>{{PWD}}/py4j-0.10.4-src.zip<CPS>{{PWD}}/PyGlue.zip
command: LD_LIBRARY_PATH="/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:$LD_LIBRARY_PATH" \
{{JAVA_HOME}}/bin/java \
-server \
-Xmx5120m \
'-XX:+UseConcMarkSweepGC' \
'-XX:CMSInitiatingOccupancyFraction=70' \
'-XX:MaxHeapFreeRatio=70' \
'-XX:+CMSClassUnloadingEnabled' \
'-XX:OnOutOfMemoryError=kill -9 %p' \
'-XX:+UseCompressedOops' \
'-Djavax.net.ssl.trustStore=ExternalAndAWSTrustStore.jks' \
'-Djavax.net.ssl.trustStoreType=JKS' \
'-Djavax.net.ssl.trustStorePassword=amazon' \
'-DRDS_ROOT_CERT_PATH=rds-combined-ca-bundle.pem' \
'-DREDSHIFT_ROOT_CERT_PATH=redshift-ssl-ca-cert.pem' \
'-DRDS_TRUSTSTORE_URL=file:RDSTrustStore.jks' \
-Djava.io.tmpdir={{PWD}}/tmp \
-Dspark.yarn.app.container.log.dir=<LOG_DIR> \
org.apache.spark.executor.CoarseGrainedExecutorBackend \
--driver-url \
spark://CoarseGrainedScheduler@172.32.40.232:44303 \
--executor-id \