james-jory / segment-personalize-workshop

AWS workshop demonstrating how to integrate Segment with Amazon Personalize to build and deliver personalized customer experiences.

Glue job failed (even after correctly copying parameter keys as shown in the tutorial text) #2

Closed: dnemi closed this issue 5 years ago

dnemi commented 5 years ago

--conf spark.hadoop.yarn.resourcemanager.connect.max-wait.ms=60000 --conf spark.hadoop.fs.defaultFS=hdfs://ip-172-32-34-176.ec2.internal:8020 --conf spark.hadoop.yarn.resourcemanager.address=ip-172-32-34-176.ec2.internal:8032 --conf spark.dynamicAllocation.enabled=true --conf spark.shuffle.service.enabled=true --conf spark.dynamicAllocation.minExecutors=1 --conf spark.dynamicAllocation.maxExecutors=18 --conf spark.executor.memory=5g --conf spark.executor.cores=4 --conf spark.driver.memory=5g --JOB_ID j_53156865b026edbb62f5b002460ae275f81d253c15ec4bdce9344f6b28230dae --JOB_RUN_ID jr_2a1f391ca6c93cf2e0c59260be40cb53c0f9d9a16754097503aaf61e97f1ab08 --scriptLocation s3://aws-glue-scripts-537632985422-us-east-1/admin/SegmentEventsJsonToCsv --job-bookmark-option job-bookmark-disable --S3_CSV_OUTPUT_PATH s3://personalize-data-[ACCOUNT_ID]/transformed --job-language python --TempDir s3://aws-glue-temporary-537632985422-us-east-1/admin --S3_JSON_INPUT_PATH s3://personalize-data-[ACCOUNT_ID]/raw-events/ --JOB_NAME SegmentEventsJsonToCsv Detected region us-east-1 Detected glue endpoint https://glue.us-east-1.amazonaws.com YARN_RM_DNS=ip-172-32-34-176.ec2.internal JOB_NAME = SegmentEventsJsonToCsv Specifying us-east-1 while copying script. Completed 2.6 KiB/2.6 KiB (44.8 KiB/s) with 1 file(s) remaining download: s3://aws-glue-scripts-537632985422-us-east-1/admin/SegmentEventsJsonToCsv to ./script_2019-02-27-18-58-26.py SCRIPT_URL = /tmp/g-27272b67599361810bc7b488f8994dab5d0c7cdc-460412413296521275/script_2019-02-27-18-58-26.py /usr/lib/spark/bin/spark-submit --conf spark.hadoop.yarn.resourcemanager.connect.max-wait.ms=60000 --conf spark.hadoop.fs.defaultFS=hdfs://ip-172-32-34-176.ec2.internal:8020 --conf spark.hadoop.yarn.resourcemanager.address=ip-172-32-34-176.ec2.internal:8032 --conf spark.dynamicAllocation.enabled=true --conf spark.shuffle.service.enabled=true --conf spark.dynamicAllocation.minExecutors=1 --conf spark.dynamicAllocation.maxExecutors=18 --conf spark.executor.memory=5g --conf spark.executor.cores=4 --conf spark.driver.memory=5g --name tape --master yarn --deploy-mode cluster --jars /opt/amazon/superjar/glue-assembly.jar --files /tmp/glue-default.conf,/tmp/glue-override.conf,/opt/amazon/certs/ExternalAndAWSTrustStore.jks,/opt/amazon/certs/rds-combined-ca-bundle.pem,/opt/amazon/certs/redshift-ssl-ca-cert.pem,/opt/amazon/certs/RDSTrustStore.jks,/tmp/image-creation-time,,/tmp/g-27272b67599361810bc7b488f8994dab5d0c7cdc-460412413296521275/script_2019-02-27-18-58-26.py --py-files /tmp/PyGlue.zip /tmp/runscript.py script_2019-02-27-18-58-26.py --JOB_NAME SegmentEventsJsonToCsv --JOB_ID j_53156865b026edbb62f5b002460ae275f81d253c15ec4bdce9344f6b28230dae --JOB_RUN_ID jr_2a1f391ca6c93cf2e0c59260be40cb53c0f9d9a16754097503aaf61e97f1ab08 --job-bookmark-option job-bookmark-disable --S3_CSV_OUTPUT_PATH s3://personalize-data-[ACCOUNT_ID]/transformed --TempDir s3://aws-glue-temporary-537632985422-us-east-1/admin --S3_JSON_INPUT_PATH s3://personalize-data-[ACCOUNT_ID]/raw-events/ 19/02/27 18:58:45 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... 
using builtin-java classes where applicable 19/02/27 18:58:48 INFO RMProxy: Connecting to ResourceManager at ip-172-32-34-176.ec2.internal/172.32.34.176:8032 19/02/27 18:58:48 INFO Client: Requesting a new application from cluster with 9 NodeManagers 19/02/27 18:58:48 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (12288 MB per container) 19/02/27 18:58:48 INFO Client: Will allocate AM container, with 5632 MB memory including 512 MB overhead 19/02/27 18:58:48 INFO Client: Setting up container launch context for our AM 19/02/27 18:58:48 INFO Client: Setting up the launch environment for our AM container 19/02/27 18:58:48 INFO Client: Preparing resources for our AM container 19/02/27 18:58:50 DEBUG Client: 19/02/27 18:58:50 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME. 19/02/27 18:58:59 INFO Client: Uploading resource file:/tmp/spark-f7a5598e-c04d-4ffd-ab39-33211f47c8cd/spark_libs441647360030086276.zip -> hdfs://ip-172-32-34-176.ec2.internal:8020/user/root/.sparkStaging/application_1551293620749_0001/spark_libs441647360030086276.zip 19/02/27 18:59:02 INFO Client: Uploading resource file:/opt/amazon/superjar/glue-assembly.jar -> hdfs://ip-172-32-34-176.ec2.internal:8020/user/root/.sparkStaging/application_1551293620749_0001/glue-assembly.jar 19/02/27 18:59:26 INFO Client: Uploading resource file:/tmp/glue-default.conf -> hdfs://ip-172-32-34-176.ec2.internal:8020/user/root/.sparkStaging/application_1551293620749_0001/glue-default.conf 19/02/27 18:59:26 INFO Client: Uploading resource file:/tmp/glue-override.conf -> hdfs://ip-172-32-34-176.ec2.internal:8020/user/root/.sparkStaging/application_1551293620749_0001/glue-override.conf 19/02/27 18:59:26 INFO Client: Uploading resource file:/opt/amazon/certs/ExternalAndAWSTrustStore.jks -> hdfs://ip-172-32-34-176.ec2.internal:8020/user/root/.sparkStaging/application_1551293620749_0001/ExternalAndAWSTrustStore.jks 19/02/27 18:59:26 INFO Client: Uploading resource file:/opt/amazon/certs/rds-combined-ca-bundle.pem -> hdfs://ip-172-32-34-176.ec2.internal:8020/user/root/.sparkStaging/application_1551293620749_0001/rds-combined-ca-bundle.pem 19/02/27 18:59:26 INFO Client: Uploading resource file:/opt/amazon/certs/redshift-ssl-ca-cert.pem -> hdfs://ip-172-32-34-176.ec2.internal:8020/user/root/.sparkStaging/application_1551293620749_0001/redshift-ssl-ca-cert.pem 19/02/27 18:59:26 INFO Client: Uploading resource file:/opt/amazon/certs/RDSTrustStore.jks -> hdfs://ip-172-32-34-176.ec2.internal:8020/user/root/.sparkStaging/application_1551293620749_0001/RDSTrustStore.jks 19/02/27 18:59:26 INFO Client: Uploading resource file:/tmp/image-creation-time -> hdfs://ip-172-32-34-176.ec2.internal:8020/user/root/.sparkStaging/application_1551293620749_0001/image-creation-time 19/02/27 18:59:26 INFO Client: Uploading resource file:/tmp/g-27272b67599361810bc7b488f8994dab5d0c7cdc-460412413296521275/script_2019-02-27-18-58-26.py -> hdfs://ip-172-32-34-176.ec2.internal:8020/user/root/.sparkStaging/application_1551293620749_0001/script_2019-02-27-18-58-26.py 19/02/27 18:59:26 INFO Client: Uploading resource file:/tmp/runscript.py -> hdfs://ip-172-32-34-176.ec2.internal:8020/user/root/.sparkStaging/application_1551293620749_0001/runscript.py 19/02/27 18:59:26 INFO Client: Uploading resource file:/usr/lib/spark/python/lib/pyspark.zip -> 
hdfs://ip-172-32-34-176.ec2.internal:8020/user/root/.sparkStaging/application_1551293620749_0001/pyspark.zip 19/02/27 18:59:26 INFO Client: Uploading resource file:/usr/lib/spark/python/lib/py4j-0.10.4-src.zip -> hdfs://ip-172-32-34-176.ec2.internal:8020/user/root/.sparkStaging/application_1551293620749_0001/py4j-0.10.4-src.zip 19/02/27 18:59:26 INFO Client: Uploading resource file:/tmp/PyGlue.zip -> hdfs://ip-172-32-34-176.ec2.internal:8020/user/root/.sparkStaging/application_1551293620749_0001/PyGlue.zip 19/02/27 18:59:26 INFO Client: Uploading resource file:/tmp/spark-f7a5598e-c04d-4ffd-ab39-33211f47c8cd/spark_conf7256533821855478433.zip -> hdfs://ip-172-32-34-176.ec2.internal:8020/user/root/.sparkStaging/application_1551293620749_0001/spark_conf.zip 19/02/27 18:59:26 DEBUG Client: =============================================================================== 19/02/27 18:59:26 DEBUG Client: YARN AM launch context: 19/02/27 18:59:26 DEBUG Client: user class: org.apache.spark.deploy.PythonRunner 19/02/27 18:59:26 DEBUG Client: env: 19/02/27 18:59:26 DEBUG Client: CLASSPATH -> ./:/usr/lib/hadoop-lzo/lib/:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/:/usr/share/aws/emr/emrfs/auxlib/:/usr/share/aws/glue/etl/jars/aws-glue-datacatalog-spark-client-1.8.0-SNAPSHOT.jar{{PWD}}{{PWD}}/spark_conf{{PWD}}/spark_libs/$HADOOP_CONF_DIR$HADOOP_COMMON_HOME/share/hadoop/common/$HADOOP_COMMON_HOME/share/hadoop/common/lib/$HADOOP_HDFS_HOME/share/hadoop/hdfs/$HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/$HADOOP_YARN_HOME/share/hadoop/yarn/$HADOOP_YARN_HOME/share/hadoop/yarn/lib/$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/* 19/02/27 18:59:26 DEBUG Client: SPARK_YARN_STAGING_DIR -> hdfs://ip-172-32-34-176.ec2.internal:8020/user/root/.sparkStaging/application_1551293620749_0001 19/02/27 18:59:26 DEBUG Client: SPARK_USER -> root 19/02/27 18:59:26 DEBUG Client: SPARK_YARN_MODE -> true 19/02/27 18:59:26 DEBUG Client: PYTHONHASHSEED -> 0 19/02/27 18:59:26 DEBUG Client: PYTHONPATH -> {{PWD}}/pyspark.zip{{PWD}}/py4j-0.10.4-src.zip{{PWD}}/PyGlue.zip 19/02/27 18:59:26 DEBUG Client: resources: 19/02/27 18:59:27 DEBUG Client: image-creation-time -> resource { scheme: "hdfs" host: "ip-172-32-34-176.ec2.internal" port: 8020 file: "/user/root/.sparkStaging/application_1551293620749_0001/image-creation-time" } size: 11 timestamp: 1551293966749 type: FILE visibility: PRIVATE 19/02/27 18:59:27 DEBUG Client: py4j-0.10.4-src.zip -> resource { scheme: "hdfs" host: "ip-172-32-34-176.ec2.internal" port: 8020 file: "/user/root/.sparkStaging/application_1551293620749_0001/py4j-0.10.4-src.zip" } size: 74096 timestamp: 1551293966895 type: FILE visibility: PRIVATE 19/02/27 18:59:27 DEBUG Client: glue-assembly.jar -> resource { scheme: "hdfs" host: "ip-172-32-34-176.ec2.internal" port: 8020 file: "/user/root/.sparkStaging/application_1551293620749_0001/glue-assembly.jar" } size: 423322980 timestamp: 1551293966507 type: FILE visibility: PRIVATE 19/02/27 18:59:27 DEBUG Client: pyspark.zip -> resource { scheme: "hdfs" host: "ip-172-32-34-176.ec2.internal" port: 8020 file: "/user/root/.sparkStaging/application_1551293620749_0001/pyspark.zip" } size: 482687 timestamp: 1551293966848 type: FILE visibility: PRIVATE 19/02/27 18:59:27 DEBUG Client: spark_libs -> resource { scheme: "hdfs" host: "ip-172-32-34-176.ec2.internal" port: 8020 file: "/user/root/.sparkStaging/application_1551293620749_0001/spark_libs441647360030086276.zip" } size: 218234389 timestamp: 1551293942463 type: ARCHIVE 
visibility: PRIVATE 19/02/27 18:59:27 DEBUG Client: redshift-ssl-ca-cert.pem -> resource { scheme: "hdfs" host: "ip-172-32-34-176.ec2.internal" port: 8020 file: "/user/root/.sparkStaging/application_1551293620749_0001/redshift-ssl-ca-cert.pem" } size: 8621 timestamp: 1551293966667 type: FILE visibility: PRIVATE 19/02/27 18:59:27 DEBUG Client: rds-combined-ca-bundle.pem -> resource { scheme: "hdfs" host: "ip-172-32-34-176.ec2.internal" port: 8020 file: "/user/root/.sparkStaging/application_1551293620749_0001/rds-combined-ca-bundle.pem" } size: 31848 timestamp: 1551293966642 type: FILE visibility: PRIVATE 19/02/27 18:59:27 DEBUG Client: glue-default.conf -> resource { scheme: "hdfs" host: "ip-172-32-34-176.ec2.internal" port: 8020 file: "/user/root/.sparkStaging/application_1551293620749_0001/glue-default.conf" } size: 382 timestamp: 1551293966545 type: FILE visibility: PRIVATE 19/02/27 18:59:27 DEBUG Client: runscript.py -> resource { scheme: "hdfs" host: "ip-172-32-34-176.ec2.internal" port: 8020 file: "/user/root/.sparkStaging/application_1551293620749_0001/runscript.py" } size: 3549 timestamp: 1551293966794 type: FILE visibility: PRIVATE 19/02/27 18:59:27 DEBUG Client: glue-override.conf -> resource { scheme: "hdfs" host: "ip-172-32-34-176.ec2.internal" port: 8020 file: "/user/root/.sparkStaging/application_1551293620749_0001/glue-override.conf" } size: 264 timestamp: 1551293966568 type: FILE visibility: PRIVATE 19/02/27 18:59:27 DEBUG Client: ExternalAndAWSTrustStore.jks -> resource { scheme: "hdfs" host: "ip-172-32-34-176.ec2.internal" port: 8020 file: "/user/root/.sparkStaging/application_1551293620749_0001/ExternalAndAWSTrustStore.jks" } size: 118406 timestamp: 1551293966618 type: FILE visibility: PRIVATE 19/02/27 18:59:27 DEBUG Client: PyGlue.zip -> resource { scheme: "hdfs" host: "ip-172-32-34-176.ec2.internal" port: 8020 file: "/user/root/.sparkStaging/application_1551293620749_0001/PyGlue.zip" } size: 104304 timestamp: 1551293966944 type: FILE visibility: PRIVATE 19/02/27 18:59:27 DEBUG Client: spark_conf -> resource { scheme: "hdfs" host: "ip-172-32-34-176.ec2.internal" port: 8020 file: "/user/root/.sparkStaging/application_1551293620749_0001/spark_conf.zip" } size: 8098 timestamp: 1551293966984 type: ARCHIVE visibility: PRIVATE 19/02/27 18:59:27 DEBUG Client: script_2019-02-27-18-58-26.py -> resource { scheme: "hdfs" host: "ip-172-32-34-176.ec2.internal" port: 8020 file: "/user/root/.sparkStaging/application_1551293620749_0001/script_2019-02-27-18-58-26.py" } size: 2646 timestamp: 1551293966772 type: FILE visibility: PRIVATE 19/02/27 18:59:27 DEBUG Client: RDSTrustStore.jks -> resource { scheme: "hdfs" host: "ip-172-32-34-176.ec2.internal" port: 8020 file: "/user/root/.sparkStaging/application_1551293620749_0001/RDSTrustStore.jks" } size: 19135 timestamp: 1551293966702 type: FILE visibility: PRIVATE 19/02/27 18:59:27 DEBUG Client: command: 19/02/27 18:59:27 DEBUG Client: LD_LIBRARY_PATH="/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:$LD_LIBRARY_PATH" {{JAVA_HOME}}/bin/java -server -Xmx5120m -Djava.io.tmpdir={{PWD}}/tmp '-XX:+UseConcMarkSweepGC' '-XX:CMSInitiatingOccupancyFraction=70' '-XX:MaxHeapFreeRatio=70' '-XX:+CMSClassUnloadingEnabled' '-XX:OnOutOfMemoryError=kill -9 %p' '-Djavax.net.ssl.trustStore=ExternalAndAWSTrustStore.jks' '-Djavax.net.ssl.trustStoreType=JKS' '-Djavax.net.ssl.trustStorePassword=amazon' '-DRDS_ROOT_CERT_PATH=rds-combined-ca-bundle.pem' '-DREDSHIFT_ROOT_CERT_PATH=redshift-ssl-ca-cert.pem' '-DRDS_TRUSTSTORE_URL=file:RDSTrustStore.jks' 
-Dspark.yarn.app.container.log.dir= org.apache.spark.deploy.yarn.ApplicationMaster --class 'org.apache.spark.deploy.PythonRunner' --primary-py-file runscript.py --arg 'script_2019-02-27-18-58-26.py' --arg '--JOB_NAME' --arg 'SegmentEventsJsonToCsv' --arg '--JOB_ID' --arg 'j_53156865b026edbb62f5b002460ae275f81d253c15ec4bdce9344f6b28230dae' --arg '--JOB_RUN_ID' --arg 'jr_2a1f391ca6c93cf2e0c59260be40cb53c0f9d9a16754097503aaf61e97f1ab08' --arg '--job-bookmark-option' --arg 'job-bookmark-disable' --arg '--S3_CSV_OUTPUT_PATH' --arg 's3://personalize-data-[ACCOUNT_ID]/transformed' --arg '--TempDir' --arg 's3://aws-glue-temporary-537632985422-us-east-1/admin' --arg '--S3_JSON_INPUT_PATH' --arg 's3://personalize-data-[ACCOUNT_ID]/raw-events/' --properties-file {{PWD}}/spark_conf/spark_conf.properties 1> /stdout 2> /stderr 19/02/27 18:59:27 DEBUG Client: =============================================================================== 19/02/27 18:59:27 INFO SecurityManager: Changing view acls to: root 19/02/27 18:59:27 INFO SecurityManager: Changing modify acls to: root 19/02/27 18:59:27 INFO SecurityManager: Changing view acls groups to: 19/02/27 18:59:27 INFO SecurityManager: Changing modify acls groups to: 19/02/27 18:59:27 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set() 19/02/27 18:59:27 INFO Client: Submitting application application_1551293620749_0001 to ResourceManager 19/02/27 18:59:27 INFO YarnClientImpl: Submitted application application_1551293620749_0001 19/02/27 18:59:28 INFO Client: Application report for application_1551293620749_0001 (state: ACCEPTED) applicationid is application_1551293620749_0001, yarnRMDNS is ip-172-32-34-176.ec2.internal Application info reporting is enabled. 
----------Recording application Id and Yarn RM DNS for cancellation----------------- 0749_0001 (state: RUNNING) 19/02/27 18:59:36 DEBUG Client: client token: N/A diagnostics: N/A ApplicationMaster host: 172.32.40.232 ApplicationMaster RPC port: 0 queue: default start time: 1551293967315 final status: UNDEFINED tracking URL: http://ip-172-32-34-176.ec2.internal:20888/proxy/application_1551293620749_0001/ user: root 19/02/27 18:59:37 INFO Client: Application report for application_1551293620749_0001 (state: RUNNING) 19/02/27 18:59:37 DEBUG Client: client token: N/A diagnostics: N/A ApplicationMaster host: 172.32.40.232 ApplicationMaster RPC port: 0 queue: default start time: 1551293967315 final status: UNDEFINED tracking URL: http://ip-172-32-34-176.ec2.internal:20888/proxy/application_1551293620749_0001/ user: root 19/02/27 18:59:38 INFO Client: Application report for application_1551293620749_0001 (state: RUNNING) 19/02/27 18:59:38 DEBUG Client: client token: N/A diagnostics: N/A ApplicationMaster host: 172.32.40.232 ApplicationMaster RPC port: 0 queue: default start time: 1551293967315 final status: UNDEFINED tracking URL: http://ip-172-32-34-176.ec2.internal:20888/proxy/application_1551293620749_0001/ user: root 19/02/27 18:59:39 INFO Client: Application report for application_1551293620749_0001 (state: RUNNING) 19/02/27 18:59:39 DEBUG Client: client token: N/A diagnostics: N/A ApplicationMaster host: 172.32.40.232 ApplicationMaster RPC port: 0 queue: default start time: 1551293967315 final status: UNDEFINED tracking URL: http://ip-172-32-34-176.ec2.internal:20888/proxy/application_1551293620749_0001/ user: root 19/02/27 18:59:40 INFO Client: Application report for application_1551293620749_0001 (state: RUNNING) 19/02/27 18:59:40 DEBUG Client: client token: N/A diagnostics: N/A ApplicationMaster host: 172.32.40.232 ApplicationMaster RPC port: 0 queue: default start time: 1551293967315 final status: UNDEFINED tracking URL: http://ip-172-32-34-176.ec2.internal:20888/proxy/application_1551293620749_0001/ user: root 19/02/27 18:59:41 INFO Client: Application report for application_1551293620749_0001 (state: RUNNING) 19/02/27 18:59:41 DEBUG Client: client token: N/A diagnostics: N/A ApplicationMaster host: 172.32.40.232 ApplicationMaster RPC port: 0 queue: default start time: 1551293967315 final status: UNDEFINED tracking URL: http://ip-172-32-34-176.ec2.internal:20888/proxy/application_1551293620749_0001/ user: root 19/02/27 18:59:42 INFO Client: Application report for application_1551293620749_0001 (state: RUNNING) 19/02/27 18:59:42 DEBUG Client: client token: N/A diagnostics: N/A ApplicationMaster host: 172.32.40.232 ApplicationMaster RPC port: 0 queue: default start time: 1551293967315 final status: UNDEFINED tracking URL: http://ip-172-32-34-176.ec2.internal:20888/proxy/application_1551293620749_0001/ user: root 19/02/27 18:59:43 INFO Client: Application report for application_1551293620749_0001 (state: RUNNING) 19/02/27 18:59:43 DEBUG Client: client token: N/A diagnostics: N/A ApplicationMaster host: 172.32.40.232 ApplicationMaster RPC port: 0 queue: default start time: 1551293967315 final status: UNDEFINED tracking URL: http://ip-172-32-34-176.ec2.internal:20888/proxy/application_1551293620749_0001/ user: root 19/02/27 18:59:44 INFO Client: Application report for application_1551293620749_0001 (state: FINISHED) 19/02/27 18:59:44 DEBUG Client: client token: N/A diagnostics: User application exited with status 1 ApplicationMaster host: 172.32.40.232 ApplicationMaster RPC port: 0 queue: 
default start time: 1551293967315 final status: FAILED tracking URL: http://ip-172-32-34-176.ec2.internal:20888/proxy/application_1551293620749_0001/ user: root Exception in thread "main" org.apache.spark.SparkException: Application application_1551293620749_0001 finished with failed status at org.apache.spark.deploy.yarn.Client.run(Client.scala:1122) at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1168) at org.apache.spark.deploy.yarn.Client.main(Client.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:775) at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180) at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

19/02/27 18:59:44 INFO ShutdownHookManager: Shutdown hook called 19/02/27 18:59:44 INFO ShutdownHookManager: Deleting directory /tmp/spark-f7a5598e-c04d-4ffd-ab39-33211f47c8cd

Container: container_1551293620749_0001_01_000001 on ip-172-32-40-232.ec2.internal_8041

LogType:stderr Log Upload Time:Wed Feb 27 18:59:45 +0000 2019 LogLength:18030 Log Contents: SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in [jar:file:/mnt/yarn/usercache/root/filecache/10/glue-assembly.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/mnt/yarn/usercache/root/filecache/16/__spark_libs__441647360030086276.zip/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory] 19/02/27 18:59:31 INFO SignalUtils: Registered signal handler for TERM 19/02/27 18:59:31 INFO SignalUtils: Registered signal handler for HUP 19/02/27 18:59:31 INFO SignalUtils: Registered signal handler for INT 19/02/27 18:59:32 INFO ApplicationMaster: Preparing Local resources 19/02/27 18:59:33 INFO ApplicationMaster: ApplicationAttemptId: appattempt_1551293620749_0001_000001 19/02/27 18:59:33 INFO SecurityManager: Changing view acls to: yarn,root 19/02/27 18:59:33 INFO SecurityManager: Changing modify acls to: yarn,root 19/02/27 18:59:33 INFO SecurityManager: Changing view acls groups to: 19/02/27 18:59:33 INFO SecurityManager: Changing modify acls groups to: 19/02/27 18:59:33 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(yarn, root); groups with view permissions: Set(); users with modify permissions: Set(yarn, root); groups with modify permissions: Set() 19/02/27 18:59:33 INFO ApplicationMaster: Starting the user application in a separate Thread 19/02/27 18:59:33 INFO ApplicationMaster: Waiting for spark context initialization... 19/02/27 18:59:34 INFO SparkContext: Running Spark version 2.2.1 19/02/27 18:59:34 INFO SparkContext: Submitted application: tape 19/02/27 18:59:34 INFO SecurityManager: Changing view acls to: yarn,root 19/02/27 18:59:34 INFO SecurityManager: Changing modify acls to: yarn,root 19/02/27 18:59:34 INFO SecurityManager: Changing view acls groups to: 19/02/27 18:59:34 INFO SecurityManager: Changing modify acls groups to: 19/02/27 18:59:34 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(yarn, root); groups with view permissions: Set(); users with modify permissions: Set(yarn, root); groups with modify permissions: Set() 19/02/27 18:59:34 INFO Utils: Successfully started service 'sparkDriver' on port 44303. 19/02/27 18:59:34 INFO SparkEnv: Registering MapOutputTracker 19/02/27 18:59:34 INFO SparkEnv: Registering BlockManagerMaster 19/02/27 18:59:34 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information 19/02/27 18:59:34 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up 19/02/27 18:59:34 INFO DiskBlockManager: Created local directory at /mnt/yarn/usercache/root/appcache/application_1551293620749_0001/blockmgr-2b760287-9d28-4954-b067-7bea036a179a 19/02/27 18:59:34 INFO MemoryStore: MemoryStore started with capacity 2.8 GB 19/02/27 18:59:34 INFO SparkEnv: Registering OutputCommitCoordinator 19/02/27 18:59:35 INFO JettyUtils: Adding filter: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter 19/02/27 18:59:35 INFO Utils: Successfully started service 'SparkUI' on port 35179. 
19/02/27 18:59:35 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://172.32.40.232:35179 19/02/27 18:59:35 INFO YarnClusterScheduler: Created YarnClusterScheduler 19/02/27 18:59:35 INFO SchedulerExtensionServices: Starting Yarn extension services with app application_1551293620749_0001 and attemptId Some(appattempt_1551293620749_0001_000001) 19/02/27 18:59:35 WARN Utils: spark.executor.instances less than spark.dynamicAllocation.minExecutors is invalid, ignoring its setting, please update your configs. 19/02/27 18:59:35 INFO Utils: Using initial executors = 1, max of spark.dynamicAllocation.initialExecutors, spark.dynamicAllocation.minExecutors and spark.executor.instances 19/02/27 18:59:35 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 39447. 19/02/27 18:59:35 INFO NettyBlockTransferService: Server created on 172.32.40.232:39447 19/02/27 18:59:35 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy 19/02/27 18:59:35 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 172.32.40.232, 39447, None) 19/02/27 18:59:35 INFO BlockManagerMasterEndpoint: Registering block manager 172.32.40.232:39447 with 2.8 GB RAM, BlockManagerId(driver, 172.32.40.232, 39447, None) 19/02/27 18:59:35 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 172.32.40.232, 39447, None) 19/02/27 18:59:35 INFO BlockManager: external shuffle service port = 7337 19/02/27 18:59:35 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 172.32.40.232, 39447, None) 19/02/27 18:59:35 WARN Utils: spark.executor.instances less than spark.dynamicAllocation.minExecutors is invalid, ignoring its setting, please update your configs. 19/02/27 18:59:35 INFO Utils: Using initial executors = 1, max of spark.dynamicAllocation.initialExecutors, spark.dynamicAllocation.minExecutors and spark.executor.instances 19/02/27 18:59:35 WARN YarnSchedulerBackend$YarnSchedulerEndpoint: Attempted to request executors before the AM has registered! 19/02/27 18:59:35 INFO YarnSchedulerBackend$YarnSchedulerEndpoint: ApplicationMaster registered as NettyRpcEndpointRef(spark://YarnAM@172.32.40.232:44303) 19/02/27 18:59:35 INFO ApplicationMaster:

YARN executor launch context: env: CLASSPATH -> ./:/usr/lib/hadoop-lzo/lib/:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/:/usr/share/aws/emr/emrfs/auxlib/:/usr/share/aws/glue/etl/jars/aws-glue-datacatalog-spark-client-1.8.0-SNAPSHOT.jar{{PWD}}{{PWD}}/spark_conf{{PWD}}/spark_libs/$HADOOP_CONF_DIR$HADOOP_COMMON_HOME/$HADOOP_COMMON_HOME/lib/$HADOOP_HDFS_HOME/$HADOOP_HDFS_HOME/lib/$HADOOP_MAPRED_HOME/$HADOOP_MAPRED_HOME/lib/$HADOOP_YARN_HOME/$HADOOP_YARN_HOME/lib//usr/lib/hadoop-lzo/lib//usr/share/aws/emr/emrfs/conf/usr/share/aws/emr/emrfs/lib//usr/share/aws/emr/emrfs/auxlib//usr/share/aws/emr/lib//usr/share/aws/emr/ddb/lib/emr-ddb-hadoop.jar/usr/share/aws/emr/goodies/lib/emr-hadoop-goodies.jar/usr/share/aws/emr/kinesis/lib/emr-kinesis-hadoop.jar/usr/share/aws/emr/cloudwatch-sink/lib//usr/share/aws/aws-java-sdk/$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib//usr/lib/hadoop-lzo/lib//usr/share/aws/emr/emrfs/conf/usr/share/aws/emr/emrfs/lib//usr/share/aws/emr/emrfs/auxlib//usr/share/aws/emr/lib//usr/share/aws/emr/ddb/lib/emr-ddb-hadoop.jar/usr/share/aws/emr/goodies/lib/emr-hadoop-goodies.jar/usr/share/aws/emr/kinesis/lib/emr-kinesis-hadoop.jar/usr/share/aws/emr/cloudwatch-sink/lib//usr/share/aws/aws-java-sdk/* SPARK_YARN_STAGING_DIR -> (redacted) SPARK_USER -> (redacted) SPARK_YARN_MODE -> true PYTHONPATH -> {{PWD}}/pyspark.zip{{PWD}}/py4j-0.10.4-src.zip{{PWD}}/PyGlue.zip

command: LD_LIBRARY_PATH="/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:$LD_LIBRARY_PATH" \ {{JAVA_HOME}}/bin/java \ -server \ -Xmx5120m \ '-XX:+UseConcMarkSweepGC' \ '-XX:CMSInitiatingOccupancyFraction=70' \ '-XX:MaxHeapFreeRatio=70' \ '-XX:+CMSClassUnloadingEnabled' \ '-XX:OnOutOfMemoryError=kill -9 %p' \ '-XX:+UseCompressedOops' \ '-Djavax.net.ssl.trustStore=ExternalAndAWSTrustStore.jks' \ '-Djavax.net.ssl.trustStoreType=JKS' \ '-Djavax.net.ssl.trustStorePassword=amazon' \ '-DRDS_ROOT_CERT_PATH=rds-combined-ca-bundle.pem' \ '-DREDSHIFT_ROOT_CERT_PATH=redshift-ssl-ca-cert.pem' \ '-DRDS_TRUSTSTORE_URL=file:RDSTrustStore.jks' \ -Djava.io.tmpdir={{PWD}}/tmp \ -Dspark.yarn.app.container.log.dir= \ org.apache.spark.executor.CoarseGrainedExecutorBackend \ --driver-url \ spark://CoarseGrainedScheduler@172.32.40.232:44303 \ --executor-id \

<executorId> \ --hostname \ <hostname> \ --cores \ 4 \ --app-id \ application_1551293620749_0001 \ --user-class-path \ file:$PWD/__app__.jar \ --user-class-path \ file:$PWD/glue-assembly.jar \ 1>/stdout \ 2>/stderr resources: rds-combined-ca-bundle.pem -> resource { scheme: "hdfs" host: "ip-172-32-34-176.ec2.internal" port: 8020 file: "/user/root/.sparkStaging/application_1551293620749_0001/rds-combined-ca-bundle.pem" } size: 31848 timestamp: 1551293966642 type: FILE visibility: PRIVATE py4j-0.10.4-src.zip -> resource { scheme: "hdfs" host: "ip-172-32-34-176.ec2.internal" port: 8020 file: "/user/root/.sparkStaging/application_1551293620749_0001/py4j-0.10.4-src.zip" } size: 74096 timestamp: 1551293966895 type: FILE visibility: PRIVATE glue-assembly.jar -> resource { scheme: "hdfs" host: "ip-172-32-34-176.ec2.internal" port: 8020 file: "/user/root/.sparkStaging/application_1551293620749_0001/glue-assembly.jar" } size: 423322980 timestamp: 1551293966507 type: FILE visibility: PRIVATE glue-override.conf -> resource { scheme: "hdfs" host: "ip-172-32-34-176.ec2.internal" port: 8020 file: "/user/root/.sparkStaging/application_1551293620749_0001/glue-override.conf" } size: 264 timestamp: 1551293966568 type: FILE visibility: PRIVATE script_2019-02-27-18-58-26.py -> resource { scheme: "hdfs" host: "ip-172-32-34-176.ec2.internal" port: 8020 file: "/user/root/.sparkStaging/application_1551293620749_0001/script_2019-02-27-18-58-26.py" } size: 2646 timestamp: 1551293966772 type: FILE visibility: PRIVATE ExternalAndAWSTrustStore.jks -> resource { scheme: "hdfs" host: "ip-172-32-34-176.ec2.internal" port: 8020 file: "/user/root/.sparkStaging/application_1551293620749_0001/ExternalAndAWSTrustStore.jks" } size: 118406 timestamp: 1551293966618 type: FILE visibility: PRIVATE __spark_conf__ -> resource { scheme: "hdfs" host: "ip-172-32-34-176.ec2.internal" port: 8020 file: "/user/root/.sparkStaging/application_1551293620749_0001/__spark_conf__.zip" } size: 8098 timestamp: 1551293966984 type: ARCHIVE visibility: PRIVATE RDSTrustStore.jks -> resource { scheme: "hdfs" host: "ip-172-32-34-176.ec2.internal" port: 8020 file: "/user/root/.sparkStaging/application_1551293620749_0001/RDSTrustStore.jks" } size: 19135 timestamp: 1551293966702 type: FILE visibility: PRIVATE redshift-ssl-ca-cert.pem -> resource { scheme: "hdfs" host: "ip-172-32-34-176.ec2.internal" port: 8020 file: "/user/root/.sparkStaging/application_1551293620749_0001/redshift-ssl-ca-cert.pem" } size: 8621 timestamp: 1551293966667 type: FILE visibility: PRIVATE pyspark.zip -> resource { scheme: "hdfs" host: "ip-172-32-34-176.ec2.internal" port: 8020 file: "/user/root/.sparkStaging/application_1551293620749_0001/pyspark.zip" } size: 482687 timestamp: 1551293966848 type: FILE visibility: PRIVATE PyGlue.zip -> resource { scheme: "hdfs" host: "ip-172-32-34-176.ec2.internal" port: 8020 file: "/user/root/.sparkStaging/application_1551293620749_0001/PyGlue.zip" } size: 104304 timestamp: 1551293966944 type: FILE visibility: PRIVATE __spark_libs__ -> resource { scheme: "hdfs" host: "ip-172-32-34-176.ec2.internal" port: 8020 file: "/user/root/.sparkStaging/application_1551293620749_0001/__spark_libs__441647360030086276.zip" } size: 218234389 timestamp: 1551293942463 type: ARCHIVE visibility: PRIVATE glue-default.conf -> resource { scheme: "hdfs" host: "ip-172-32-34-176.ec2.internal" port: 8020 file: "/user/root/.sparkStaging/application_1551293620749_0001/glue-default.conf" } size: 382 timestamp: 1551293966545 type: FILE visibility: PRIVATE image-creation-time -> resource { scheme: 
"hdfs" host: "ip-172-32-34-176.ec2.internal" port: 8020 file: "/user/root/.sparkStaging/application_1551293620749_0001/image-creation-time" } size: 11 timestamp: 1551293966749 type: FILE visibility: PRIVATE =============================================================================== 19/02/27 18:59:35 INFO RMProxy: Connecting to ResourceManager at ip-172-32-34-176.ec2.internal/172.32.34.176:8030 19/02/27 18:59:35 INFO YarnRMClient: Registering the ApplicationMaster 19/02/27 18:59:35 WARN Utils: spark.executor.instances less than spark.dynamicAllocation.minExecutors is invalid, ignoring its setting, please update your configs. 19/02/27 18:59:35 INFO Utils: Using initial executors = 1, max of spark.dynamicAllocation.initialExecutors, spark.dynamicAllocation.minExecutors and spark.executor.instances 19/02/27 18:59:35 INFO YarnAllocator: Will request 1 executor container(s), each with 4 core(s) and 5632 MB memory (including 512 MB of overhead) 19/02/27 18:59:35 INFO YarnAllocator: Submitted 1 unlocalized container requests. 19/02/27 18:59:35 INFO ApplicationMaster: Started progress reporter thread with (heartbeat : 3000, initial allocation : 200) intervals 19/02/27 18:59:35 INFO AMRMClientImpl: Received new token for : ip-172-32-51-201.ec2.internal:8041 19/02/27 18:59:35 INFO YarnAllocator: Launching container container_1551293620749_0001_01_000002 on host ip-172-32-51-201.ec2.internal for executor with ID 1 19/02/27 18:59:35 INFO YarnAllocator: Received 1 containers from YARN, launching executors on 1 of them. 19/02/27 18:59:35 INFO ContainerManagementProtocolProxy: yarn.client.max-cached-nodemanagers-proxies : 0 19/02/27 18:59:36 INFO ContainerManagementProtocolProxy: Opening proxy : ip-172-32-51-201.ec2.internal:8041 19/02/27 18:59:41 INFO YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (172.32.51.201:38776) with ID 1 19/02/27 18:59:41 INFO ExecutorAllocationManager: New executor 1 has registered (new total is 1) 19/02/27 18:59:41 INFO YarnClusterSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.8 19/02/27 18:59:41 INFO YarnClusterScheduler: YarnClusterScheduler.postStartHook done 19/02/27 18:59:41 INFO BlockManagerMasterEndpoint: Registering block manager ip-172-32-51-201.ec2.internal:36867 with 2.8 GB RAM, BlockManagerId(1, ip-172-32-51-201.ec2.internal, 36867, None) 19/02/27 18:59:41 INFO GlueContext: GlueMetrics not configured 19/02/27 18:59:41 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 295.7 KB, free 2.8 GB) 19/02/27 18:59:41 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 25.6 KB, free 2.8 GB) 19/02/27 18:59:41 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 172.32.40.232:39447 (size: 25.6 KB, free: 2.8 GB) 19/02/27 18:59:42 INFO SparkContext: Created broadcast 0 from broadcast at DynamoConnection.scala:50 19/02/27 18:59:42 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 295.7 KB, free 2.8 GB) 19/02/27 18:59:42 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 25.6 KB, free 2.8 GB) 19/02/27 18:59:42 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on 172.32.40.232:39447 (size: 25.6 KB, free: 2.8 GB) 19/02/27 18:59:42 INFO SparkContext: Created broadcast 1 from broadcast at DynamoConnection.scala:50 ERROR StatusLogger No log4j2 configuration file found. 
Using default configuration: logging only errors to the console. 19/02/27 18:59:42 INFO BlockManagerInfo: Removed broadcast_0_piece0 on 172.32.40.232:39447 in memory (size: 25.6 KB, free: 2.8 GB) 19/02/27 18:59:42 INFO BlockManagerInfo: Removed broadcast_1_piece0 on 172.32.40.232:39447 in memory (size: 25.6 KB, free: 2.8 GB) 19/02/27 18:59:43 ERROR ApplicationMaster: User application exited with status 1 19/02/27 18:59:43 INFO ApplicationMaster: Final app status: FAILED, exitCode: 1, (reason: User application exited with status 1) 19/02/27 18:59:43 INFO SparkContext: Invoking stop() from shutdown hook 19/02/27 18:59:43 INFO SparkUI: Stopped Spark web UI at http://172.32.40.232:35179 19/02/27 18:59:43 INFO YarnAllocator: Driver requested a total number of 0 executor(s). 19/02/27 18:59:43 INFO YarnClusterSchedulerBackend: Shutting down all executors 19/02/27 18:59:43 INFO YarnSchedulerBackend$YarnDriverEndpoint: Asking each executor to shut down 19/02/27 18:59:43 INFO SchedulerExtensionServices: Stopping SchedulerExtensionServices (serviceOption=None, services=List(), started=false) 19/02/27 18:59:43 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped! 19/02/27 18:59:43 INFO MemoryStore: MemoryStore cleared 19/02/27 18:59:43 INFO BlockManager: BlockManager stopped 19/02/27 18:59:43 INFO BlockManagerMaster: BlockManagerMaster stopped 19/02/27 18:59:43 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped! 19/02/27 18:59:43 INFO SparkContext: Successfully stopped SparkContext 19/02/27 18:59:43 INFO ApplicationMaster: Unregistering ApplicationMaster with FAILED (diag message: User application exited with status 1) 19/02/27 18:59:43 INFO AMRMClientImpl: Waiting for application to be successfully unregistered. 
19/02/27 18:59:43 INFO ApplicationMaster: Deleting staging directory hdfs://ip-172-32-34-176.ec2.internal:8020/user/root/.sparkStaging/application_1551293620749_0001 19/02/27 18:59:43 INFO ShutdownHookManager: Shutdown hook called 19/02/27 18:59:43 INFO ShutdownHookManager: Deleting directory /mnt/yarn/usercache/root/appcache/application_1551293620749_0001/spark-91ab2cc2-20a2-444d-97bf-50b358df470b/pyspark-3a135983-ba36-4940-8ff8-f0b34281a8b6 19/02/27 18:59:43 INFO ShutdownHookManager: Deleting directory /mnt/yarn/usercache/root/appcache/application_1551293620749_0001/spark-91ab2cc2-20a2-444d-97bf-50b358df470b End of LogType:stderr LogType:stdout Log Upload Time:Wed Feb 27 18:59:45 +0000 2019 LogLength:1419 Log Contents: Traceback (most recent call last): File "script_2019-02-27-18-58-26.py", line 20, in <module> datasource0 = glueContext.create_dynamic_frame.from_options('s3', {'paths': [args['S3_JSON_INPUT_PATH']]}, 'json') File "/mnt/yarn/usercache/root/appcache/application_1551293620749_0001/container_1551293620749_0001_01_000001/PyGlue.zip/awsglue/dynamicframe.py", line 552, in from_options File "/mnt/yarn/usercache/root/appcache/application_1551293620749_0001/container_1551293620749_0001_01_000001/PyGlue.zip/awsglue/context.py", line 153, in create_dynamic_frame_from_options File "/mnt/yarn/usercache/root/appcache/application_1551293620749_0001/container_1551293620749_0001_01_000001/PyGlue.zip/awsglue/data_source.py", line 36, in getFrame File "/mnt/yarn/usercache/root/appcache/application_1551293620749_0001/container_1551293620749_0001_01_000001/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__ File "/mnt/yarn/usercache/root/appcache/application_1551293620749_0001/container_1551293620749_0001_01_000001/pyspark.zip/pyspark/sql/utils.py", line 79, in deco pyspark.sql.utils.IllegalArgumentException: u'java.net.URISyntaxException: Illegal character in hostname at index 21: s3://personalize-data-[ACCOUNT_ID]/raw-events' End of LogType:stdout Container: container_1551293620749_0001_01_000002 on ip-172-32-51-201.ec2.internal_8041 ========================================================================================= LogType:stderr Log Upload Time:Wed Feb 27 18:59:44 +0000 2019 LogLength:4271 Log Contents: SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in [jar:file:/mnt/yarn/usercache/root/filecache/10/glue-assembly.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/mnt/yarn/usercache/root/filecache/15/__spark_libs__441647360030086276.zip/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/usr/lib/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. 
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory] 19/02/27 18:59:40 INFO CoarseGrainedExecutorBackend: Started daemon with process name: 10985@ip-172-32-51-201 19/02/27 18:59:40 INFO SignalUtils: Registered signal handler for TERM 19/02/27 18:59:40 INFO SignalUtils: Registered signal handler for HUP 19/02/27 18:59:40 INFO SignalUtils: Registered signal handler for INT 19/02/27 18:59:40 INFO SecurityManager: Changing view acls to: yarn,root 19/02/27 18:59:40 INFO SecurityManager: Changing modify acls to: yarn,root 19/02/27 18:59:40 INFO SecurityManager: Changing view acls groups to: 19/02/27 18:59:40 INFO SecurityManager: Changing modify acls groups to: 19/02/27 18:59:40 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(yarn, root); groups with view permissions: Set(); users with modify permissions: Set(yarn, root); groups with modify permissions: Set() 19/02/27 18:59:41 INFO TransportClientFactory: Successfully created connection to /172.32.40.232:44303 after 42 ms (0 ms spent in bootstraps) 19/02/27 18:59:41 INFO SecurityManager: Changing view acls to: yarn,root 19/02/27 18:59:41 INFO SecurityManager: Changing modify acls to: yarn,root 19/02/27 18:59:41 INFO SecurityManager: Changing view acls groups to: 19/02/27 18:59:41 INFO SecurityManager: Changing modify acls groups to: 19/02/27 18:59:41 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(yarn, root); groups with view permissions: Set(); users with modify permissions: Set(yarn, root); groups with modify permissions: Set() 19/02/27 18:59:41 INFO TransportClientFactory: Successfully created connection to /172.32.40.232:44303 after 4 ms (0 ms spent in bootstraps) 19/02/27 18:59:41 INFO DiskBlockManager: Created local directory at /mnt/yarn/usercache/root/appcache/application_1551293620749_0001/blockmgr-c9e7a162-e76e-4054-baef-a2564f45f207 19/02/27 18:59:41 INFO MemoryStore: MemoryStore started with capacity 2.8 GB 19/02/27 18:59:41 INFO CoarseGrainedExecutorBackend: Connecting to driver: spark://CoarseGrainedScheduler@172.32.40.232:44303 19/02/27 18:59:41 INFO CoarseGrainedExecutorBackend: Successfully registered with driver 19/02/27 18:59:41 INFO Executor: Starting executor ID 1 on host ip-172-32-51-201.ec2.internal 19/02/27 18:59:41 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 36867. 19/02/27 18:59:41 INFO NettyBlockTransferService: Server created on ip-172-32-51-201.ec2.internal:36867 19/02/27 18:59:41 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy 19/02/27 18:59:41 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(1, ip-172-32-51-201.ec2.internal, 36867, None) 19/02/27 18:59:41 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(1, ip-172-32-51-201.ec2.internal, 36867, None) 19/02/27 18:59:41 INFO BlockManager: external shuffle service port = 7337 19/02/27 18:59:41 INFO BlockManager: Registering executor with local external shuffle service. 
19/02/27 18:59:41 INFO TransportClientFactory: Successfully created connection to ip-172-32-51-201.ec2.internal/172.32.51.201:7337 after 1 ms (0 ms spent in bootstraps) 19/02/27 18:59:41 INFO BlockManager: Initialized BlockManager: BlockManagerId(1, ip-172-32-51-201.ec2.internal, 36867, None) 19/02/27 18:59:43 INFO CoarseGrainedExecutorBackend: Driver commanded a shutdown 19/02/27 18:59:43 INFO MemoryStore: MemoryStore cleared 19/02/27 18:59:43 INFO BlockManager: BlockManager stopped 19/02/27 18:59:43 INFO ShutdownHookManager: Shutdown hook called End of LogType:stderr LogType:stdout Log Upload Time:Wed Feb 27 18:59:44 +0000 2019 LogLength:0 Log Contents: End of LogType:stdout
james-jory commented 5 years ago

Here is the relevant error from the log output above:

java.net.URISyntaxException: Illegal character in hostname at index 21: s3://personalize-data-[ACCOUNT_ID]/raw-events

The cause of the issue in this case is that the S3_JSON_INPUT_PATH and S3_CSV_OUTPUT_PATH job parameters are not correct. Square brackets are not legal characters in an S3 bucket name, which is why the URI parser rejects the path with a URISyntaxException. You need to substitute your actual AWS account ID (without hyphens) where "[ACCOUNT_ID]" appears in the sample paths from the documentation:

--S3_JSON_INPUT_PATH s3://personalize-data-[ACCOUNT_ID]/raw-events/
--S3_CSV_OUTPUT_PATH s3://personalize-data-[ACCOUNT_ID]/transformed
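
For example, with a hypothetical account ID of 123456789012, the corrected parameters would read:

--S3_JSON_INPUT_PATH s3://personalize-data-123456789012/raw-events/
--S3_CSV_OUTPUT_PATH s3://personalize-data-123456789012/transformed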

This bucket naming convention just happens to be the standard used for the workshop. If you are testing the exercise outside of a managed workshop, substitute the name of your own bucket as necessary.
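
As an aside, a small guard at the top of the Glue script would surface this mistake immediately instead of as a URISyntaxException buried in the Spark logs. Here is a minimal sketch (not part of the workshop script; the placeholder check is purely illustrative), using the standard getResolvedOptions helper that Glue Python jobs use to read their job parameters:

import sys
from awsglue.utils import getResolvedOptions

# Resolve the job parameters passed as --S3_JSON_INPUT_PATH / --S3_CSV_OUTPUT_PATH.
args = getResolvedOptions(sys.argv, ['JOB_NAME', 'S3_JSON_INPUT_PATH', 'S3_CSV_OUTPUT_PATH'])

for key in ('S3_JSON_INPUT_PATH', 'S3_CSV_OUTPUT_PATH'):
    path = args[key]
    # Square brackets are illegal in S3 bucket names, so their presence means
    # the [ACCOUNT_ID] placeholder from the tutorial text was copied verbatim.
    if '[' in path or ']' in path:
        raise ValueError('%s still contains a placeholder (%s); replace [ACCOUNT_ID] '
                         'with your 12-digit AWS account ID' % (key, path))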