yahoo / TensorFlowOnSpark

TensorFlowOnSpark brings TensorFlow programs to Apache Spark clusters.
Apache License 2.0

MNIST example hits an error about 'AutoProxy[get_queue]' object has no attribute 'put' #248

Closed EdwardZhang88 closed 6 years ago

EdwardZhang88 commented 6 years ago

I am trying to spark-submit the MNIST example on a 3-node Hadoop v3.0 + Spark v2.3 cluster and am not sure what to do with the error below that I encountered.

spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --queue default \
  --num-executors 2 \
  --executor-memory 24G \
  --py-files /home/junzhang22/TensorFlowOnSpark/examples/mnist/spark/mnist_dist.py \
  --conf spark.dynamicAllocation.enabled=false \
  --conf spark.yarn.maxAppAttempts=1 \
  /home/junzhang22/TensorFlowOnSpark/examples/mnist/spark/mnist_spark.py \
  --images hdfs://dlaas-185:9000/home/junzhang22/mnist/csv/train/images \
  --labels hdfs://dlaas-185:9000/home/junzhang22/mnist/csv/train/labels \
  --mode train \
  --model mnist_model

Driver stacktrace:
2018-03-19 17:24:25 INFO DAGScheduler:54 - Job 1 failed: collect at PythonRDD.scala:153, took 13.917543 s
Traceback (most recent call last):
  File "mnist_spark.py", line 69, in <module>
    cluster.train(dataRDD, args.epochs)
  File "/usr/lib/python2.7/site-packages/tensorflowonspark/TFCluster.py", line 90, in train
    unionRDD.foreachPartition(TFSparkNode.train(self.cluster_info, self.cluster_meta, qname))
  File "/usr/local/hadoop-3.0.0/tmp/nm-local-dir/usercache/root/appcache/application_1520844482373_0026/container_1520844482373_0026_01_000001/pyspark.zip/pyspark/rdd.py", line 814, in foreachPartition
  File "/usr/local/hadoop-3.0.0/tmp/nm-local-dir/usercache/root/appcache/application_1520844482373_0026/container_1520844482373_0026_01_000001/pyspark.zip/pyspark/rdd.py", line 1056, in count
  File "/usr/local/hadoop-3.0.0/tmp/nm-local-dir/usercache/root/appcache/application_1520844482373_0026/container_1520844482373_0026_01_000001/pyspark.zip/pyspark/rdd.py", line 1047, in sum
  File "/usr/local/hadoop-3.0.0/tmp/nm-local-dir/usercache/root/appcache/application_1520844482373_0026/container_1520844482373_0026_01_000001/pyspark.zip/pyspark/rdd.py", line 921, in fold
  File "/usr/local/hadoop-3.0.0/tmp/nm-local-dir/usercache/root/appcache/application_1520844482373_0026/container_1520844482373_0026_01_000001/pyspark.zip/pyspark/rdd.py", line 824, in collect
  File "/usr/local/hadoop-3.0.0/tmp/nm-local-dir/usercache/root/appcache/application_1520844482373_0026/container_1520844482373_0026_01_000001/py4j-0.10.6-src.zip/py4j/java_gateway.py", line 1160, in __call__
  File "/usr/local/hadoop-3.0.0/tmp/nm-local-dir/usercache/root/appcache/application_1520844482373_0026/container_1520844482373_0026_01_000001/py4j-0.10.6-src.zip/py4j/protocol.py", line 320, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 9 in stage 1.0 failed 4 times, most recent failure: Lost task 9.3 in stage 1.0 (TID 20, dlaas-183, executor 2): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/usr/local/hadoop-3.0.0/tmp/nm-local-dir/usercache/root/appcache/application_1520844482373_0026/container_1520844482373_0026_01_000003/pyspark.zip/pyspark/worker.py", line 229, in main
    process()
  File "/usr/local/hadoop-3.0.0/tmp/nm-local-dir/usercache/root/appcache/application_1520844482373_0026/container_1520844482373_0026_01_000003/pyspark.zip/pyspark/worker.py", line 224, in process
    serializer.dump_stream(func(split_index, iterator), outfile)
  File "/usr/local/hadoop-3.0.0/tmp/nm-local-dir/usercache/root/appcache/application_1520844482373_0026/container_1520844482373_0026_01_000001/pyspark.zip/pyspark/rdd.py", line 2438, in pipeline_func
  File "/usr/local/hadoop-3.0.0/tmp/nm-local-dir/usercache/root/appcache/application_1520844482373_0026/container_1520844482373_0026_01_000001/pyspark.zip/pyspark/rdd.py", line 2438, in pipeline_func
  File "/usr/local/hadoop-3.0.0/tmp/nm-local-dir/usercache/root/appcache/application_1520844482373_0026/container_1520844482373_0026_01_000001/pyspark.zip/pyspark/rdd.py", line 2438, in pipeline_func
  File "/usr/local/hadoop-3.0.0/tmp/nm-local-dir/usercache/root/appcache/application_1520844482373_0026/container_1520844482373_0026_01_000001/pyspark.zip/pyspark/rdd.py", line 362, in func
  File "/usr/local/hadoop-3.0.0/tmp/nm-local-dir/usercache/root/appcache/application_1520844482373_0026/container_1520844482373_0026_01_000001/pyspark.zip/pyspark/rdd.py", line 809, in func
  File "/usr/lib/python2.7/site-packages/tensorflowonspark/TFSparkNode.py", line 367, in _train
    queue.put(item, block=True)
AttributeError: 'AutoProxy[get_queue]' object has no attribute 'put'
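For anyone puzzled by the proxy type named in the last frame: 'AutoProxy[get_queue]' is simply the proxy class that Python's multiprocessing.managers builds when a manager method is registered without an explicit proxy type. Below is a minimal, standalone sketch of that pattern (Python 3 syntax, made-up names, not the actual TFSparkNode code); the proxy class only carries the methods discovered for its exposed list, so if put is not among them, q.put(...) raises exactly this AttributeError:

```python
# Illustration only -- NOT TensorFlowOnSpark code. Generic multiprocessing
# pattern behind the 'AutoProxy[get_queue]' proxy seen in the traceback:
# a manager registers a get_queue() accessor without an explicit proxy type,
# so callers get an AutoProxy whose methods are discovered from the referent
# queue. If 'put' is not discovered/exposed when the proxy is built, calling
# q.put(...) raises "'AutoProxy[get_queue]' object has no attribute 'put'".
import queue
from multiprocessing.managers import BaseManager

# The manager server's copy of this queue is the one proxies operate on.
queues = {'input': queue.Queue()}

def get_queue(qname):
    return queues[qname]

class QueueManager(BaseManager):
    pass

QueueManager.register('get_queue', callable=get_queue)

if __name__ == '__main__':
    mgr = QueueManager(address=('127.0.0.1', 50000), authkey=b'secret')
    mgr.start()                                  # launch the manager server process

    client = QueueManager(address=('127.0.0.1', 50000), authkey=b'secret')
    client.connect()
    q = client.get_queue('input')                # an AutoProxy[get_queue] instance
    q.put('hello', block=True)                   # proxied to the server-side queue
    print(type(q).__name__, q.get())             # -> AutoProxy[get_queue] hello
    mgr.shutdown()
```

In TFSparkNode the queue behind the proxy is a multiprocessing JoinableQueue (as the executor logs later in this thread show), but the exposure mechanism is the same.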

From the log, I can see that the cluster has been set up and the MNIST data has been partitioned into multiple tasks as well. The only thing I suspect is the tensorflowonspark package itself, which I installed via pip: tensorflowonspark (1.2.1) - Deep learning with TensorFlow on Apache Spark clusters INSTALLED: 1.2.1 (latest)

If anyone happens to run into the same issue, please comment with suggestions. Thanks.

leewyang commented 6 years ago

For Hadoop/YARN, you will need to make sure that tensorflowonspark is installed on all of your grid nodes. Otherwise, you'll have to use the --py-files tfspark.zip method.
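One quick way to verify that first condition is a throwaway PySpark job along these lines (just a sketch with made-up names, not part of TensorFlowOnSpark), submitted the same way as mnist_spark.py, which reports whether tensorflowonspark is importable on each executor host:

```python
# Hypothetical diagnostic (not part of TensorFlowOnSpark): run a throwaway
# Spark job that reports whether `tensorflowonspark` is importable on each
# executor host, to confirm the package really exists on every grid node.
from pyspark import SparkContext

def check_import(_):
    import socket
    try:
        import tensorflowonspark
        status = getattr(tensorflowonspark, '__version__', 'installed (version unknown)')
    except ImportError as e:
        status = 'MISSING: %s' % e
    return [(socket.gethostname(), status)]

if __name__ == '__main__':
    sc = SparkContext(appName='tfos_install_check')
    # Many small partitions so every executor is likely to run at least one task.
    results = sc.parallelize(range(1000), 100).mapPartitions(check_import).distinct().collect()
    for host, status in sorted(results):
        print('%s -> %s' % (host, status))
    sc.stop()
```

If any host reports MISSING, either install the package there or ship it with --py-files tfspark.zip.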

EdwardZhang88 commented 6 years ago

@leewyang Thanks for the reply. Initially, I installed tensorflowonspark on all my worker nodes via pip. After I uninstalled it via pip and explicitly specified tfspark.zip in my spark-submit command, the same error persisted. Since the error AttributeError: 'AutoProxy[get_queue]' object has no attribute 'put' seems to be some kind of Python version issue, I then upgraded Python from 2.7.5 to 2.7.12. I tried the upgrade both by passing the compiled Python zip in spark-submit and by pointing the nodes' python binaries directly at 2.7.12; however, neither really helped. I'm not sure if I need to downgrade my Hadoop (v3.0) and Spark (v2.3) versions as well. BTW, is there any plan to dockerize the task (to get rid of all these version issues), since K8s is now integrated into Spark as well?

leewyang commented 6 years ago

There aren't any short-term plans to dockerize the task, although that would be an eventual goal...

Can you grab the full yarn logs and check for any errors on the executors?

markfengyunzhou commented 6 years ago

I met the same issue, with Hadoop 2.9, Spark 2.3, and Python 3.6.

Has the problem been solved?

leewyang commented 6 years ago

@markfengyunzhou can you send me your spark-submit command line and the yarn logs for a failure? (There's a chance that this is just a symptom of an upstream failure.)

markfengyunzhou commented 6 years ago

@leewyang Thanks for the reply.

======= below is my command line =======

PYSPARK_DRIVER_PYTHON=Python.zip/bin/python3 \
PYSPARK_PYTHON=Python.zip/bin/python3 \
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --num-executors 3 \
  --executor-memory 20G \
  --executor-cores 8 \
  --py-files /home/hpe/TensorFlowOnSpark/tfspark.zip,/home/hpe/TensorFlowOnSpark/examples/mnist/spark/mnist_dist.py \
  --conf spark.dynamicAllocation.enabled=false \
  --conf spark.yarn.maxAppAttempts=1 \
  --jars hdfs:///lib/tensorflow-hadoop-1.6.0.jar \
  --archives hdfs:///lib/Python.zip#Python \
  /home/hpe/TensorFlowOnSpark/examples/mnist/spark/mnist_spark.py \
  --images hdfs:///data/mnist/csv/train/images \
  --labels hdfs:///data/mnist/csv/train/labels \
  --format csv \
  --mode train \
  --model mnist_model

=========== below is yarn log ======== ResourceManagerRM HomeNodeManagerTools 2018-05-04 11:09:47 INFO SignalUtils:54 - Registered signal handler for TERM 2018-05-04 11:09:47 INFO SignalUtils:54 - Registered signal handler for HUP 2018-05-04 11:09:47 INFO SignalUtils:54 - Registered signal handler for INT 2018-05-04 11:09:47 INFO SecurityManager:54 - Changing view acls to: hpe 2018-05-04 11:09:47 INFO SecurityManager:54 - Changing modify acls to: hpe 2018-05-04 11:09:47 INFO SecurityManager:54 - Changing view acls groups to: 2018-05-04 11:09:47 INFO SecurityManager:54 - Changing modify acls groups to: 2018-05-04 11:09:47 INFO SecurityManager:54 - SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hpe); groups with view permissions: Set(); users with modify permissions: Set(hpe); groups with modify permissions: Set() 2018-05-04 11:09:48 INFO ApplicationMaster:54 - Preparing Local resources 2018-05-04 11:09:49 INFO ApplicationMaster:54 - ApplicationAttemptId: appattempt_1525333084924_0008_000001 2018-05-04 11:09:49 INFO ApplicationMaster:54 - Starting the user application in a separate Thread 2018-05-04 11:09:49 INFO ApplicationMaster:54 - Waiting for spark context initialization... 2018-05-04 11:09:51 INFO SparkContext:54 - Running Spark version 2.3.0 2018-05-04 11:09:51 INFO SparkContext:54 - Submitted application: mnist_spark 2018-05-04 11:09:51 INFO SecurityManager:54 - Changing view acls to: hpe 2018-05-04 11:09:51 INFO SecurityManager:54 - Changing modify acls to: hpe 2018-05-04 11:09:51 INFO SecurityManager:54 - Changing view acls groups to: 2018-05-04 11:09:51 INFO SecurityManager:54 - Changing modify acls groups to: 2018-05-04 11:09:51 INFO SecurityManager:54 - SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hpe); groups with view permissions: Set(); users with modify permissions: Set(hpe); groups with modify permissions: Set() 2018-05-04 11:09:52 INFO Utils:54 - Successfully started service 'sparkDriver' on port 43550. 2018-05-04 11:09:52 INFO SparkEnv:54 - Registering MapOutputTracker 2018-05-04 11:09:52 INFO SparkEnv:54 - Registering BlockManagerMaster 2018-05-04 11:09:52 INFO BlockManagerMasterEndpoint:54 - Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information 2018-05-04 11:09:52 INFO BlockManagerMasterEndpoint:54 - BlockManagerMasterEndpoint up 2018-05-04 11:09:52 INFO DiskBlockManager:54 - Created local directory at /tmp/hadoop-hpe/nm-local-dir/usercache/hpe/appcache/application_1525333084924_0008/blockmgr-13a54613-8d5d-4b9a-a39c-6cd1df8cb82e 2018-05-04 11:09:52 INFO MemoryStore:54 - MemoryStore started with capacity 366.3 MB 2018-05-04 11:09:52 INFO SparkEnv:54 - Registering OutputCommitCoordinator 2018-05-04 11:09:52 INFO log:192 - Logging initialized @5906ms 2018-05-04 11:09:52 INFO JettyUtils:54 - Adding filter: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter 2018-05-04 11:09:52 INFO Server:346 - jetty-9.3.z-SNAPSHOT 2018-05-04 11:09:52 INFO Server:414 - Started @6029ms 2018-05-04 11:09:52 INFO AbstractConnector:278 - Started ServerConnector@78fba911{HTTP/1.1,[http/1.1]}{0.0.0.0:40699} 2018-05-04 11:09:52 INFO Utils:54 - Successfully started service 'SparkUI' on port 40699. 
2018-05-04 11:09:52 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@60755b5{/jobs,null,AVAILABLE,@Spark} 2018-05-04 11:09:52 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@4879d728{/jobs/json,null,AVAILABLE,@Spark} 2018-05-04 11:09:52 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@1933c153{/jobs/job,null,AVAILABLE,@Spark} 2018-05-04 11:09:52 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@4ff5ff54{/jobs/job/json,null,AVAILABLE,@Spark} 2018-05-04 11:09:52 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@4933b115{/stages,null,AVAILABLE,@Spark} 2018-05-04 11:09:52 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@4e0dc839{/stages/json,null,AVAILABLE,@Spark} 2018-05-04 11:09:52 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@3922ff21{/stages/stage,null,AVAILABLE,@Spark} 2018-05-04 11:09:52 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@2d4e5ca6{/stages/stage/json,null,AVAILABLE,@Spark} 2018-05-04 11:09:52 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@33b75d9{/stages/pool,null,AVAILABLE,@Spark} 2018-05-04 11:09:52 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@37022078{/stages/pool/json,null,AVAILABLE,@Spark} 2018-05-04 11:09:52 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@2a34d4ea{/storage,null,AVAILABLE,@Spark} 2018-05-04 11:09:52 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@75c00666{/storage/json,null,AVAILABLE,@Spark} 2018-05-04 11:09:52 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@2d0d30ba{/storage/rdd,null,AVAILABLE,@Spark} 2018-05-04 11:09:52 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@b2ac483{/storage/rdd/json,null,AVAILABLE,@Spark} 2018-05-04 11:09:52 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@7b5c288{/environment,null,AVAILABLE,@Spark} 2018-05-04 11:09:52 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@72b3172e{/environment/json,null,AVAILABLE,@Spark} 2018-05-04 11:09:52 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@36ff7122{/executors,null,AVAILABLE,@Spark} 2018-05-04 11:09:52 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@6b2cb5a2{/executors/json,null,AVAILABLE,@Spark} 2018-05-04 11:09:52 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@42a49bcd{/executors/threadDump,null,AVAILABLE,@Spark} 2018-05-04 11:09:52 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@2844fed0{/executors/threadDump/json,null,AVAILABLE,@Spark} 2018-05-04 11:09:52 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@657ffc9b{/static,null,AVAILABLE,@Spark} 2018-05-04 11:09:52 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@78e80855{/,null,AVAILABLE,@Spark} 2018-05-04 11:09:52 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@62471901{/api,null,AVAILABLE,@Spark} 2018-05-04 11:09:52 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@517bd229{/jobs/job/kill,null,AVAILABLE,@Spark} 2018-05-04 11:09:52 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@7490183b{/stages/stage/kill,null,AVAILABLE,@Spark} 2018-05-04 11:09:52 INFO SparkUI:54 - Bound SparkUI to 0.0.0.0, and started at http://hpe01:40699 2018-05-04 11:09:53 INFO YarnClusterScheduler:54 - Created YarnClusterScheduler 2018-05-04 11:09:53 INFO SchedulerExtensionServices:54 - Starting Yarn extension services with app application_1525333084924_0008 and attemptId 
Some(appattempt_1525333084924_0008_000001) 2018-05-04 11:09:53 INFO Utils:54 - Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 35479. 2018-05-04 11:09:53 INFO NettyBlockTransferService:54 - Server created on hpe01:35479 2018-05-04 11:09:53 INFO BlockManager:54 - Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy 2018-05-04 11:09:53 INFO BlockManagerMaster:54 - Registering BlockManager BlockManagerId(driver, hpe01, 35479, None) 2018-05-04 11:09:53 INFO BlockManagerMasterEndpoint:54 - Registering block manager hpe01:35479 with 366.3 MB RAM, BlockManagerId(driver, hpe01, 35479, None) 2018-05-04 11:09:53 INFO BlockManagerMaster:54 - Registered BlockManager BlockManagerId(driver, hpe01, 35479, None) 2018-05-04 11:09:53 INFO BlockManager:54 - Initialized BlockManager: BlockManagerId(driver, hpe01, 35479, None) 2018-05-04 11:09:53 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@26dfd529{/metrics/json,null,AVAILABLE,@Spark} 2018-05-04 11:09:53 INFO ApplicationMaster:54 - =============================================================================== YARN executor launch context: env: CLASSPATH -> {{PWD}}{{PWD}}/spark_conf{{PWD}}/spark_libs/$HADOOP_CONF_DIR$HADOOP_COMMON_HOME/share/hadoop/common/$HADOOP_COMMON_HOME/share/hadoop/common/lib/$HADOOP_HDFS_HOME/share/hadoop/hdfs/$HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/$HADOOP_YARN_HOME/share/hadoop/yarn/$HADOOP_YARN_HOME/share/hadoop/yarn/lib/$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*{{PWD}}/spark_conf/hadoop_conf SPARK_YARN_STAGING_DIR -> (redacted) SPARK_USER -> (redacted) PYTHONPATH -> {{PWD}}/pyfiles{{PWD}}/pyspark.zip{{PWD}}/py4j-0.10.6-src.zip{{PWD}}/tfspark.zip command: {{JAVA_HOME}}/bin/java \ -server \ -Xmx20480m \ -Djava.io.tmpdir={{PWD}}/tmp \ -Dspark.yarn.app.container.log.dir= \ -XX:OnOutOfMemoryError='kill %p' \ org.apache.spark.executor.CoarseGrainedExecutorBackend \ --driver-url \ spark://CoarseGrainedScheduler@hpe01:43550 \ --executor-id \ \ --hostname \ \ --cores \ 8 \ --app-id \ application_1525333084924_0008 \ --user-class-path \ file:$PWD/app.jar \ --user-class-path \ file:$PWD/tensorflow-hadoop-1.6.0.jar \ 1>/stdout \ 2>/stderr resources: tensorflow-hadoop-1.6.0.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/lib/tensorflow-hadoop-1.6.0.jar" } size: 123665 timestamp: 1525403332711 type: FILE visibility: PUBLIC py4j-0.10.6-src.zip -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/user/hpe/.sparkStaging/application_1525333084924_0008/py4j-0.10.6-src.zip" } size: 80352 timestamp: 1525403383948 type: FILE visibility: PRIVATE spark_conf -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/user/hpe/.sparkStaging/application_1525333084924_0008/spark_conf.zip" } size: 194867 timestamp: 1525403384197 type: ARCHIVE visibility: PRIVATE tfspark.zip -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/user/hpe/.sparkStaging/application_1525333084924_0008/tfspark.zip" } size: 33148 timestamp: 1525403383981 type: FILE visibility: PRIVATE pyspark.zip -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/user/hpe/.sparkStaging/application_1525333084924_0008/pyspark.zip" } size: 538841 timestamp: 1525403383915 type: FILE visibility: PRIVATE pyfiles/mnist_dist.py -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/user/hpe/.sparkStaging/application_1525333084924_0008/mnist_dist.py" } size: 5883 timestamp: 1525403384017 type: FILE 
visibility: PRIVATE spark_libs -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/user/hpe/.sparkStaging/application_1525333084924_0008/spark_libs9168122747315941231.zip" } size: 234584665 timestamp: 1525403383676 type: ARCHIVE visibility: PRIVATE Python.zip -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/lib/Python.zip" } size: 173421456 timestamp: 1525238486507 type: ARCHIVE visibility: PUBLIC =============================================================================== 2018-05-04 11:09:53 INFO RMProxy:98 - Connecting to ResourceManager at hpe01/192.168.136.158:8030 2018-05-04 11:09:53 INFO YarnRMClient:54 - Registering the ApplicationMaster 2018-05-04 11:09:53 INFO YarnAllocator:54 - Will request 3 executor container(s), each with 8 core(s) and 22528 MB memory (including 2048 MB of overhead) 2018-05-04 11:09:53 INFO YarnSchedulerBackend$YarnSchedulerEndpoint:54 - ApplicationMaster registered as NettyRpcEndpointRef(spark://YarnAM@hpe01:43550) 2018-05-04 11:09:53 INFO YarnAllocator:54 - Submitted 3 unlocalized container requests. 2018-05-04 11:09:53 INFO ApplicationMaster:54 - Started progress reporter thread with (heartbeat : 3000, initial allocation : 200) intervals 2018-05-04 11:09:57 INFO AMRMClientImpl:360 - Received new token for : hpe03:35000 2018-05-04 11:09:57 INFO AMRMClientImpl:360 - Received new token for : hpe01:33415 2018-05-04 11:09:57 INFO YarnAllocator:54 - Launching container container_1525333084924_0008_01_000002 on host hpe01 for executor with ID 1 2018-05-04 11:09:57 INFO YarnAllocator:54 - Launching container container_1525333084924_0008_01_000003 on host hpe03 for executor with ID 2 2018-05-04 11:09:57 INFO YarnAllocator:54 - Received 2 containers from YARN, launching executors on 2 of them. 2018-05-04 11:09:57 INFO ContainerManagementProtocolProxy:81 - yarn.client.max-cached-nodemanagers-proxies : 0 2018-05-04 11:09:57 INFO ContainerManagementProtocolProxy:81 - yarn.client.max-cached-nodemanagers-proxies : 0 2018-05-04 11:09:57 INFO ContainerManagementProtocolProxy:260 - Opening proxy : hpe03:35000 2018-05-04 11:09:57 INFO ContainerManagementProtocolProxy:260 - Opening proxy : hpe01:33415 2018-05-04 11:09:59 INFO AMRMClientImpl:360 - Received new token for : hpe02:36742 2018-05-04 11:09:59 INFO YarnAllocator:54 - Launching container container_1525333084924_0008_01_000005 on host hpe02 for executor with ID 3 2018-05-04 11:09:59 INFO YarnAllocator:54 - Received 1 containers from YARN, launching executors on 1 of them. 
2018-05-04 11:09:59 INFO ContainerManagementProtocolProxy:81 - yarn.client.max-cached-nodemanagers-proxies : 0 2018-05-04 11:09:59 INFO ContainerManagementProtocolProxy:260 - Opening proxy : hpe02:36742 2018-05-04 11:10:01 INFO YarnSchedulerBackend$YarnDriverEndpoint:54 - Registered executor NettyRpcEndpointRef(spark-client://Executor) (192.168.136.158:48706) with ID 1 2018-05-04 11:10:01 INFO BlockManagerMasterEndpoint:54 - Registering block manager hpe01:40295 with 10.5 GB RAM, BlockManagerId(1, hpe01, 40295, None) 2018-05-04 11:10:04 INFO YarnSchedulerBackend$YarnDriverEndpoint:54 - Registered executor NettyRpcEndpointRef(spark-client://Executor) (192.168.136.160:45790) with ID 2 2018-05-04 11:10:04 INFO BlockManagerMasterEndpoint:54 - Registering block manager hpe03:45793 with 10.5 GB RAM, BlockManagerId(2, hpe03, 45793, None) 2018-05-04 11:10:06 INFO YarnSchedulerBackend$YarnDriverEndpoint:54 - Registered executor NettyRpcEndpointRef(spark-client://Executor) (192.168.136.159:48648) with ID 3 2018-05-04 11:10:06 INFO YarnClusterSchedulerBackend:54 - SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.8 2018-05-04 11:10:06 INFO YarnClusterScheduler:54 - YarnClusterScheduler.postStartHook done 2018-05-04 11:10:06 INFO BlockManagerMasterEndpoint:54 - Registering block manager hpe02:35731 with 10.5 GB RAM, BlockManagerId(3, hpe02, 35731, None) args: Namespace(batch_size=100, cluster_size=3, epochs=1, format='csv', images='hdfs:///data/mnist/csv/train/images', labels='hdfs:///data/mnist/csv/train/labels', mode='train', model='mnist_model', output='predictions', rdma=False, readers=1, steps=1000, tensorboard=False) 2018-05-04T11:10:06.247055 ===== Start 2018-05-04 11:10:06 INFO MemoryStore:54 - Block broadcast_0 stored as values in memory (estimated size 241.0 KB, free 366.1 MB) 2018-05-04 11:10:06 INFO MemoryStore:54 - Block broadcast_0_piece0 stored as bytes in memory (estimated size 23.2 KB, free 366.0 MB) 2018-05-04 11:10:06 INFO BlockManagerInfo:54 - Added broadcast_0_piece0 in memory on hpe01:35479 (size: 23.2 KB, free: 366.3 MB) 2018-05-04 11:10:06 INFO SparkContext:54 - Created broadcast 0 from textFile at NativeMethodAccessorImpl.java:0 2018-05-04 11:10:06 INFO MemoryStore:54 - Block broadcast_1 stored as values in memory (estimated size 241.1 KB, free 365.8 MB) 2018-05-04 11:10:06 INFO MemoryStore:54 - Block broadcast_1_piece0 stored as bytes in memory (estimated size 23.2 KB, free 365.8 MB) 2018-05-04 11:10:06 INFO BlockManagerInfo:54 - Added broadcast_1_piece0 in memory on hpe01:35479 (size: 23.2 KB, free: 366.3 MB) 2018-05-04 11:10:06 INFO SparkContext:54 - Created broadcast 1 from textFile at NativeMethodAccessorImpl.java:0 zipping images and labels 2018-05-04 11:10:06 INFO FileInputFormat:249 - Total input paths to process : 10 2018-05-04 11:10:06 INFO FileInputFormat:249 - Total input paths to process : 10 2018-05-04 11:10:06,979 INFO (MainThread-22771) Reserving TFSparkNodes 2018-05-04 11:10:06,980 INFO (MainThread-22771) cluster_template: {'ps': range(0, 1), 'worker': range(1, 3)} 2018-05-04 11:10:06,984 INFO (MainThread-22771) listening for reservations at ('192.168.136.158', 34824) 2018-05-04 11:10:06,985 INFO (MainThread-22771) Starting TensorFlow on executors 2018-05-04 11:10:07,001 INFO (MainThread-22771) Waiting for TFSparkNodes to start 2018-05-04 11:10:07,001 INFO (MainThread-22771) waiting for 3 reservations 2018-05-04 11:10:07 INFO SparkContext:54 - Starting job: foreachPartition at 
/tmp/hadoop-hpe/nm-local-dir/usercache/hpe/appcache/application_1525333084924_0008/container_1525333084924_0008_01_000001/tfspark.zip/tensorflowonspark/TFCluster.py:293 2018-05-04 11:10:07 INFO DAGScheduler:54 - Got job 0 (foreachPartition at /tmp/hadoop-hpe/nm-local-dir/usercache/hpe/appcache/application_1525333084924_0008/container_1525333084924_0008_01_000001/tfspark.zip/tensorflowonspark/TFCluster.py:293) with 3 output partitions 2018-05-04 11:10:07 INFO DAGScheduler:54 - Final stage: ResultStage 0 (foreachPartition at /tmp/hadoop-hpe/nm-local-dir/usercache/hpe/appcache/application_1525333084924_0008/container_1525333084924_0008_01_000001/tfspark.zip/tensorflowonspark/TFCluster.py:293) 2018-05-04 11:10:07 INFO DAGScheduler:54 - Parents of final stage: List() 2018-05-04 11:10:07 INFO DAGScheduler:54 - Missing parents: List() 2018-05-04 11:10:07 INFO DAGScheduler:54 - Submitting ResultStage 0 (PythonRDD[8] at foreachPartition at /tmp/hadoop-hpe/nm-local-dir/usercache/hpe/appcache/application_1525333084924_0008/container_1525333084924_0008_01_000001/tfspark.zip/tensorflowonspark/TFCluster.py:293), which has no missing parents 2018-05-04 11:10:07 INFO MemoryStore:54 - Block broadcast_2 stored as values in memory (estimated size 14.3 KB, free 365.8 MB) 2018-05-04 11:10:07 INFO MemoryStore:54 - Block broadcast_2_piece0 stored as bytes in memory (estimated size 10.1 KB, free 365.8 MB) 2018-05-04 11:10:07 INFO BlockManagerInfo:54 - Added broadcast_2_piece0 in memory on hpe01:35479 (size: 10.1 KB, free: 366.2 MB) 2018-05-04 11:10:07 INFO SparkContext:54 - Created broadcast 2 from broadcast at DAGScheduler.scala:1039 2018-05-04 11:10:07 INFO DAGScheduler:54 - Submitting 3 missing tasks from ResultStage 0 (PythonRDD[8] at foreachPartition at /tmp/hadoop-hpe/nm-local-dir/usercache/hpe/appcache/application_1525333084924_0008/container_1525333084924_0008_01_000001/tfspark.zip/tensorflowonspark/TFCluster.py:293) (first 15 tasks are for partitions Vector(0, 1, 2)) 2018-05-04 11:10:07 INFO YarnClusterScheduler:54 - Adding task set 0.0 with 3 tasks 2018-05-04 11:10:07 INFO TaskSetManager:54 - Starting task 0.0 in stage 0.0 (TID 0, hpe01, executor 1, partition 0, PROCESS_LOCAL, 7828 bytes) 2018-05-04 11:10:07 INFO TaskSetManager:54 - Starting task 1.0 in stage 0.0 (TID 1, hpe03, executor 2, partition 1, PROCESS_LOCAL, 7828 bytes) 2018-05-04 11:10:07 INFO TaskSetManager:54 - Starting task 2.0 in stage 0.0 (TID 2, hpe02, executor 3, partition 2, PROCESS_LOCAL, 7828 bytes) 2018-05-04 11:10:07 INFO BlockManagerInfo:54 - Added broadcast_2_piece0 in memory on hpe02:35731 (size: 10.1 KB, free: 10.5 GB) 2018-05-04 11:10:07 INFO BlockManagerInfo:54 - Added broadcast_2_piece0 in memory on hpe01:40295 (size: 10.1 KB, free: 10.5 GB) 2018-05-04 11:10:07 INFO BlockManagerInfo:54 - Added broadcast_2_piece0 in memory on hpe03:45793 (size: 10.1 KB, free: 10.5 GB) 2018-05-04 11:10:08,003 INFO (MainThread-22771) waiting for 3 reservations 2018-05-04 11:10:09,004 INFO (MainThread-22771) waiting for 3 reservations 2018-05-04 11:10:10,006 INFO (MainThread-22771) waiting for 3 reservations 2018-05-04 11:10:11,258 INFO (MainThread-22771) waiting for 3 reservations 2018-05-04 11:10:12,338 INFO (MainThread-22771) waiting for 2 reservations 2018-05-04 11:10:13,425 INFO (MainThread-22771) all reservations completed 2018-05-04 11:10:13,426 INFO (MainThread-22771) All TFSparkNodes started 2018-05-04 11:10:13,426 INFO (MainThread-22771) {'executor_id': 2, 'host': '192.168.136.159', 'job_name': 'worker', 'task_index': 1, 'port': 42338, 
'tb_pid': 0, 'tb_port': 0, 'addr': '/tmp/pymp-yxoaueps/listener-hmrscm5n', 'authkey': b'\xe3\xcd\x05Y\x9av@\x0f\x89\xcc\xe7\xe4\x87\xe5\xa5S'} 2018-05-04 11:10:13,427 INFO (MainThread-22771) {'executor_id': 0, 'host': '192.168.136.158', 'job_name': 'ps', 'task_index': 0, 'port': 39690, 'tb_pid': 0, 'tb_port': 0, 'addr': ('192.168.136.158', 34282), 'authkey': b'\.(\xd6\ \xe6By\x8f\xc2u\x81\xf0}\xc2\xc4'} 2018-05-04 11:10:13,427 INFO (MainThread-22771) {'executor_id': 1, 'host': '192.168.136.160', 'job_name': 'worker', 'task_index': 0, 'port': 37236, 'tb_pid': 0, 'tb_port': 0, 'addr': '/tmp/pymp-ldxnzl9c/listener-z7su08n6', 'authkey': b'\xe0\x98\xb3wv\xb4A\xb4\x9a\x1b\x88T\x88\x83\xeeu'} 2018-05-04 11:10:13,428 INFO (MainThread-22771) Feeding training data 2018-05-04 11:10:13 INFO SparkContext:54 - Starting job: collect at PythonRDD.scala:153 2018-05-04 11:10:13 INFO DAGScheduler:54 - Got job 1 (collect at PythonRDD.scala:153) with 10 output partitions 2018-05-04 11:10:13 INFO DAGScheduler:54 - Final stage: ResultStage 1 (collect at PythonRDD.scala:153) 2018-05-04 11:10:13 INFO DAGScheduler:54 - Parents of final stage: List() 2018-05-04 11:10:13 INFO DAGScheduler:54 - Missing parents: List() 2018-05-04 11:10:13 INFO DAGScheduler:54 - Submitting ResultStage 1 (PythonRDD[10] at RDD at PythonRDD.scala:48), which has no missing parents 2018-05-04 11:10:13 INFO MemoryStore:54 - Block broadcast_3 stored as values in memory (estimated size 15.0 KB, free 365.7 MB) 2018-05-04 11:10:13 INFO MemoryStore:54 - Block broadcast_3_piece0 stored as bytes in memory (estimated size 8.1 KB, free 365.7 MB) 2018-05-04 11:10:13 INFO BlockManagerInfo:54 - Added broadcast_3_piece0 in memory on hpe01:35479 (size: 8.1 KB, free: 366.2 MB) 2018-05-04 11:10:13 INFO SparkContext:54 - Created broadcast 3 from broadcast at DAGScheduler.scala:1039 2018-05-04 11:10:13 INFO DAGScheduler:54 - Submitting 10 missing tasks from ResultStage 1 (PythonRDD[10] at RDD at PythonRDD.scala:48) (first 15 tasks are for partitions Vector(0, 1, 2, 3, 4, 5, 6, 7, 8, 9)) 2018-05-04 11:10:13 INFO YarnClusterScheduler:54 - Adding task set 1.0 with 10 tasks 2018-05-04 11:10:13 INFO TaskSetManager:54 - Starting task 0.0 in stage 1.0 (TID 3, hpe02, executor 3, partition 0, NODE_LOCAL, 8475 bytes) 2018-05-04 11:10:13 INFO TaskSetManager:54 - Starting task 1.0 in stage 1.0 (TID 4, hpe03, executor 2, partition 1, NODE_LOCAL, 8475 bytes) 2018-05-04 11:10:13 INFO TaskSetManager:54 - Starting task 3.0 in stage 1.0 (TID 5, hpe01, executor 1, partition 3, NODE_LOCAL, 8475 bytes) 2018-05-04 11:10:13 INFO TaskSetManager:54 - Starting task 2.0 in stage 1.0 (TID 6, hpe02, executor 3, partition 2, NODE_LOCAL, 8475 bytes) 2018-05-04 11:10:13 INFO TaskSetManager:54 - Starting task 4.0 in stage 1.0 (TID 7, hpe03, executor 2, partition 4, NODE_LOCAL, 8475 bytes) 2018-05-04 11:10:13 INFO TaskSetManager:54 - Starting task 6.0 in stage 1.0 (TID 8, hpe01, executor 1, partition 6, NODE_LOCAL, 8475 bytes) 2018-05-04 11:10:13 INFO TaskSetManager:54 - Starting task 5.0 in stage 1.0 (TID 9, hpe02, executor 3, partition 5, NODE_LOCAL, 8475 bytes) 2018-05-04 11:10:13 INFO TaskSetManager:54 - Starting task 7.0 in stage 1.0 (TID 10, hpe03, executor 2, partition 7, NODE_LOCAL, 8475 bytes) 2018-05-04 11:10:13 INFO TaskSetManager:54 - Starting task 8.0 in stage 1.0 (TID 11, hpe02, executor 3, partition 8, NODE_LOCAL, 8475 bytes) 2018-05-04 11:10:13 INFO TaskSetManager:54 - Starting task 9.0 in stage 1.0 (TID 12, hpe03, executor 2, partition 9, NODE_LOCAL, 8475 bytes) 2018-05-04 
11:10:13 INFO BlockManagerInfo:54 - Added broadcast_3_piece0 in memory on hpe02:35731 (size: 8.1 KB, free: 10.5 GB) 2018-05-04 11:10:13 INFO BlockManagerInfo:54 - Added broadcast_3_piece0 in memory on hpe01:40295 (size: 8.1 KB, free: 10.5 GB) 2018-05-04 11:10:13 INFO TaskSetManager:54 - Finished task 1.0 in stage 0.0 (TID 1) in 6221 ms on hpe03 (executor 2) (1/3) 2018-05-04 11:10:13 INFO BlockManagerInfo:54 - Added broadcast_0_piece0 in memory on hpe02:35731 (size: 23.2 KB, free: 10.5 GB) 2018-05-04 11:10:13 INFO BlockManagerInfo:54 - Added broadcast_3_piece0 in memory on hpe03:45793 (size: 8.1 KB, free: 10.5 GB) 2018-05-04 11:10:13 INFO BlockManagerInfo:54 - Added broadcast_0_piece0 in memory on hpe01:40295 (size: 23.2 KB, free: 10.5 GB) 2018-05-04 11:10:13 INFO BlockManagerInfo:54 - Added broadcast_0_piece0 in memory on hpe03:45793 (size: 23.2 KB, free: 10.5 GB) 2018-05-04 11:10:14 INFO TaskSetManager:54 - Finished task 2.0 in stage 0.0 (TID 2) in 6956 ms on hpe02 (executor 3) (2/3) 2018-05-04 11:10:15 INFO BlockManagerInfo:54 - Added broadcast_1_piece0 in memory on hpe02:35731 (size: 23.2 KB, free: 10.5 GB) 2018-05-04 11:10:15 INFO BlockManagerInfo:54 - Added broadcast_1_piece0 in memory on hpe03:45793 (size: 23.2 KB, free: 10.5 GB) 2018-05-04 11:10:15 INFO BlockManagerInfo:54 - Added broadcast_1_piece0 in memory on hpe01:40295 (size: 23.2 KB, free: 10.5 GB) 2018-05-04 11:10:16 WARN TaskSetManager:66 - Lost task 6.0 in stage 1.0 (TID 8, hpe01, executor 1): org.apache.spark.api.python.PythonException: Traceback (most recent call last): File "/tmp/hadoop-hpe/nm-local-dir/usercache/hpe/appcache/application_1525333084924_0008/container_1525333084924_0008_01_000002/pyspark.zip/pyspark/worker.py", line 229, in main process() File "/tmp/hadoop-hpe/nm-local-dir/usercache/hpe/appcache/application_1525333084924_0008/container_1525333084924_0008_01_000002/pyspark.zip/pyspark/worker.py", line 224, in process serializer.dump_stream(func(split_index, iterator), outfile) File "/tmp/hadoop-hpe/nm-local-dir/usercache/hpe/appcache/application_1525333084924_0008/container_1525333084924_0008_01_000001/pyspark.zip/pyspark/rdd.py", line 2438, in pipeline_func File "/tmp/hadoop-hpe/nm-local-dir/usercache/hpe/appcache/application_1525333084924_0008/container_1525333084924_0008_01_000001/pyspark.zip/pyspark/rdd.py", line 2438, in pipeline_func File "/tmp/hadoop-hpe/nm-local-dir/usercache/hpe/appcache/application_1525333084924_0008/container_1525333084924_0008_01_000001/pyspark.zip/pyspark/rdd.py", line 2438, in pipeline_func File "/tmp/hadoop-hpe/nm-local-dir/usercache/hpe/appcache/application_1525333084924_0008/container_1525333084924_0008_01_000001/pyspark.zip/pyspark/rdd.py", line 362, in func File "/tmp/hadoop-hpe/nm-local-dir/usercache/hpe/appcache/application_1525333084924_0008/container_1525333084924_0008_01_000001/pyspark.zip/pyspark/rdd.py", line 809, in func File "/tmp/hadoop-hpe/nm-local-dir/usercache/hpe/appcache/application_1525333084924_0008/container_1525333084924_0008_01_000001/tfspark.zip/tensorflowonspark/TFSparkNode.py", line 395, in _train AttributeError: 'AutoProxy[get_queue]' object has no attribute 'put' at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.handlePythonException(PythonRunner.scala:298) at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRunner.scala:438) at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRunner.scala:421) at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:252) at 
org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37) at scala.collection.Iterator$class.foreach(Iterator.scala:893) at org.apache.spark.InterruptibleIterator.foreach(InterruptibleIterator.scala:28) at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59) at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104) at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48) at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310) at org.apache.spark.InterruptibleIterator.to(InterruptibleIterator.scala:28) at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302) at org.apache.spark.InterruptibleIterator.toBuffer(InterruptibleIterator.scala:28) at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289) at org.apache.spark.InterruptibleIterator.toArray(InterruptibleIterator.scala:28) at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$12.apply(RDD.scala:939) at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$12.apply(RDD.scala:939) at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2067) at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2067) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87) at org.apache.spark.scheduler.Task.run(Task.scala:109) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) 2018-05-04 11:10:16 INFO TaskSetManager:54 - Lost task 3.0 in stage 1.0 (TID 5) on hpe01, executor 1: org.apache.spark.api.python.PythonException (Traceback (most recent call last): File "/tmp/hadoop-hpe/nm-local-dir/usercache/hpe/appcache/application_1525333084924_0008/container_1525333084924_0008_01_000002/pyspark.zip/pyspark/worker.py", line 229, in main process() File "/tmp/hadoop-hpe/nm-local-dir/usercache/hpe/appcache/application_1525333084924_0008/container_1525333084924_0008_01_000002/pyspark.zip/pyspark/worker.py", line 224, in process serializer.dump_stream(func(split_index, iterator), outfile) File "/tmp/hadoop-hpe/nm-local-dir/usercache/hpe/appcache/application_1525333084924_0008/container_1525333084924_0008_01_000001/pyspark.zip/pyspark/rdd.py", line 2438, in pipeline_func File "/tmp/hadoop-hpe/nm-local-dir/usercache/hpe/appcache/application_1525333084924_0008/container_1525333084924_0008_01_000001/pyspark.zip/pyspark/rdd.py", line 2438, in pipeline_func File "/tmp/hadoop-hpe/nm-local-dir/usercache/hpe/appcache/application_1525333084924_0008/container_1525333084924_0008_01_000001/pyspark.zip/pyspark/rdd.py", line 2438, in pipeline_func File "/tmp/hadoop-hpe/nm-local-dir/usercache/hpe/appcache/application_1525333084924_0008/container_1525333084924_0008_01_000001/pyspark.zip/pyspark/rdd.py", line 362, in func File "/tmp/hadoop-hpe/nm-local-dir/usercache/hpe/appcache/application_1525333084924_0008/container_1525333084924_0008_01_000001/pyspark.zip/pyspark/rdd.py", line 809, in func File "/tmp/hadoop-hpe/nm-local-dir/usercache/hpe/appcache/application_1525333084924_0008/container_1525333084924_0008_01_000001/tfspark.zip/tensorflowonspark/TFSparkNode.py", line 395, in _train AttributeError: 'AutoProxy[get_queue]' object has no attribute 'put' ) [duplicate 1] 2018-05-04 11:10:16 INFO TaskSetManager:54 - Starting task 6.1 in stage 1.0 (TID 13, hpe02, executor 3, 
partition 6, NODE_LOCAL, 8475 bytes) 2018-05-04 11:10:16 INFO TaskSetManager:54 - Starting task 3.1 in stage 1.0 (TID 14, hpe03, executor 2, partition 3, NODE_LOCAL, 8475 bytes)
leewyang commented 6 years ago

@markfengyunzhou Can you try using:

PYSPARK_DRIVER_PYTHON=Python/bin/python3 
PYSPARK_PYTHON=Python/bin/python3 

(Since --archives hdfs:///lib/Python.zip#Python unpacks the archive under the alias Python inside each YARN container, the interpreter paths should point at Python/bin/python3 rather than Python.zip/bin/python3.)

Also, those are just the driver logs. If you can grab the full yarn logs via yarn logs -applicationId <your_appId>, you'll be able to see the executor logs (and hopefully some root-cause errors).
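If it's still unclear which interpreter the executors actually pick up from the Python.zip archive, a similar throwaway job (again only a sketch, not TFOS code) can print the interpreter path and version seen on each executor host:

```python
# Hypothetical diagnostic (not part of TensorFlowOnSpark): print which Python
# binary and version each executor actually runs, to confirm that
# PYSPARK_PYTHON resolved to the unpacked Python/bin/python3 from --archives.
import sys
from pyspark import SparkContext

def which_python(_):
    import socket
    import sys
    return [(socket.gethostname(), sys.executable, sys.version.split()[0])]

if __name__ == '__main__':
    sc = SparkContext(appName='executor_python_check')
    print('driver  : %s %s' % (sys.executable, sys.version.split()[0]))
    rows = sc.parallelize(range(1000), 100).mapPartitions(which_python).distinct().collect()
    for host, exe, ver in sorted(rows):
        print('executor: %s %s %s' % (host, exe, ver))
    sc.stop()
```

The driver and every executor should report the same interpreter path and version; a mismatch would point at the environment setup rather than at TensorFlowOnSpark itself.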

markfengyunzhou commented 6 years ago

@leewyang

Thank you very much for your attention.

PYSPARK_DRIVER_PYTHON=Python/bin/python3 
PYSPARK_PYTHON=Python/bin/python3 

After making that modification, the problem is still not solved.

The full yarn logs obtained via yarn logs -applicationId <your_appId> are below.

End of LogType:prelaunch.err.This log file belongs to a running container (container_1525333084924_0016_01_000002) and so may not be complete.
******************************************************************************

Container: container_1525333084924_0016_01_000002 on hpe02:36742
LogAggregationType: LOCAL
================================================================
LogType:stdout
LogLastModifiedTime:Mon May 07 09:22:41 +0800 2018
LogLength:8346
LogContents:
2018-05-07 09:22:26 INFO  CoarseGrainedExecutorBackend:2608 - Started daemon with process name: 10871@hpe02
2018-05-07 09:22:26 INFO  SignalUtils:54 - Registered signal handler for TERM
2018-05-07 09:22:26 INFO  SignalUtils:54 - Registered signal handler for HUP
2018-05-07 09:22:26 INFO  SignalUtils:54 - Registered signal handler for INT
2018-05-07 09:22:27 INFO  SecurityManager:54 - Changing view acls to: hpe
2018-05-07 09:22:27 INFO  SecurityManager:54 - Changing modify acls to: hpe
2018-05-07 09:22:27 INFO  SecurityManager:54 - Changing view acls groups to: 
2018-05-07 09:22:27 INFO  SecurityManager:54 - Changing modify acls groups to: 
2018-05-07 09:22:27 INFO  SecurityManager:54 - SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(hpe); groups with view permissions: Set(); users  with modify permissions: Set(hpe); groups with modify permissions: Set()
2018-05-07 09:22:27 INFO  TransportClientFactory:267 - Successfully created connection to hpe01/192.168.136.158:41862 after 120 ms (0 ms spent in bootstraps)
2018-05-07 09:22:28 INFO  SecurityManager:54 - Changing view acls to: hpe
2018-05-07 09:22:28 INFO  SecurityManager:54 - Changing modify acls to: hpe
2018-05-07 09:22:28 INFO  SecurityManager:54 - Changing view acls groups to: 
2018-05-07 09:22:28 INFO  SecurityManager:54 - Changing modify acls groups to: 
2018-05-07 09:22:28 INFO  SecurityManager:54 - SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(hpe); groups with view permissions: Set(); users  with modify permissions: Set(hpe); groups with modify permissions: Set()
2018-05-07 09:22:28 INFO  TransportClientFactory:267 - Successfully created connection to hpe01/192.168.136.158:41862 after 3 ms (0 ms spent in bootstraps)
2018-05-07 09:22:28 INFO  DiskBlockManager:54 - Created local directory at /tmp/hadoop-hpe/nm-local-dir/usercache/hpe/appcache/application_1525333084924_0016/blockmgr-74879816-83af-4565-bac0-34a7367fbe7b
2018-05-07 09:22:28 INFO  MemoryStore:54 - MemoryStore started with capacity 10.5 GB
2018-05-07 09:22:28 INFO  CoarseGrainedExecutorBackend:54 - Connecting to driver: spark://CoarseGrainedScheduler@hpe01:41862
2018-05-07 09:22:28 INFO  CoarseGrainedExecutorBackend:54 - Successfully registered with driver
2018-05-07 09:22:28 INFO  Executor:54 - Starting executor ID 1 on host hpe02
2018-05-07 09:22:28 INFO  Utils:54 - Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 40311.
2018-05-07 09:22:28 INFO  NettyBlockTransferService:54 - Server created on hpe02:40311
2018-05-07 09:22:28 INFO  BlockManager:54 - Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
2018-05-07 09:22:28 INFO  BlockManagerMaster:54 - Registering BlockManager BlockManagerId(1, hpe02, 40311, None)
2018-05-07 09:22:28 INFO  BlockManagerMaster:54 - Registered BlockManager BlockManagerId(1, hpe02, 40311, None)
2018-05-07 09:22:28 INFO  BlockManager:54 - Initialized BlockManager: BlockManagerId(1, hpe02, 40311, None)
2018-05-07 09:22:30 INFO  CoarseGrainedExecutorBackend:54 - Got assigned task 1
2018-05-07 09:22:30 INFO  Executor:54 - Running task 1.0 in stage 0.0 (TID 1)
2018-05-07 09:22:30 INFO  TorrentBroadcast:54 - Started reading broadcast variable 2
2018-05-07 09:22:30 INFO  TransportClientFactory:267 - Successfully created connection to hpe01/192.168.136.158:38930 after 4 ms (0 ms spent in bootstraps)
2018-05-07 09:22:30 INFO  MemoryStore:54 - Block broadcast_2_piece0 stored as bytes in memory (estimated size 10.1 KB, free 10.5 GB)
2018-05-07 09:22:30 INFO  TorrentBroadcast:54 - Reading broadcast variable 2 took 173 ms
2018-05-07 09:22:30 INFO  MemoryStore:54 - Block broadcast_2 stored as values in memory (estimated size 14.3 KB, free 10.5 GB)
2018-05-07 09:22:36 INFO  CoarseGrainedExecutorBackend:54 - Got assigned task 3
2018-05-07 09:22:36 INFO  Executor:54 - Running task 0.0 in stage 1.0 (TID 3)
2018-05-07 09:22:36 INFO  CoarseGrainedExecutorBackend:54 - Got assigned task 6
2018-05-07 09:22:36 INFO  CoarseGrainedExecutorBackend:54 - Got assigned task 9
2018-05-07 09:22:36 INFO  Executor:54 - Running task 2.0 in stage 1.0 (TID 6)
2018-05-07 09:22:36 INFO  Executor:54 - Running task 5.0 in stage 1.0 (TID 9)
2018-05-07 09:22:36 INFO  CoarseGrainedExecutorBackend:54 - Got assigned task 11
2018-05-07 09:22:36 INFO  Executor:54 - Running task 8.0 in stage 1.0 (TID 11)
2018-05-07 09:22:36 INFO  TorrentBroadcast:54 - Started reading broadcast variable 3
2018-05-07 09:22:36 INFO  MemoryStore:54 - Block broadcast_3_piece0 stored as bytes in memory (estimated size 8.0 KB, free 10.5 GB)
2018-05-07 09:22:36 INFO  TorrentBroadcast:54 - Reading broadcast variable 3 took 29 ms
2018-05-07 09:22:36 INFO  MemoryStore:54 - Block broadcast_3 stored as values in memory (estimated size 14.9 KB, free 10.5 GB)
2018-05-07 09:22:36 INFO  PythonRunner:54 - Times: total = 5345, boot = 796, init = 94, finish = 4455
2018-05-07 09:22:36 INFO  HadoopRDD:54 - Input split: hdfs://hpe01:9000/data/mnist/csv/train/images/part-00005:0+11173834
2018-05-07 09:22:36 INFO  HadoopRDD:54 - Input split: hdfs://hpe01:9000/data/mnist/csv/train/images/part-00002:0+11214784
2018-05-07 09:22:36 INFO  HadoopRDD:54 - Input split: hdfs://hpe01:9000/data/mnist/csv/train/images/part-00000:0+9338236
2018-05-07 09:22:36 INFO  HadoopRDD:54 - Input split: hdfs://hpe01:9000/data/mnist/csv/train/images/part-00008:0+11194141
2018-05-07 09:22:36 INFO  TorrentBroadcast:54 - Started reading broadcast variable 0
2018-05-07 09:22:36 INFO  Executor:54 - Finished task 1.0 in stage 0.0 (TID 1). 1310 bytes result sent to driver
2018-05-07 09:22:36 INFO  MemoryStore:54 - Block broadcast_0_piece0 stored as bytes in memory (estimated size 23.2 KB, free 10.5 GB)
2018-05-07 09:22:36 INFO  TorrentBroadcast:54 - Reading broadcast variable 0 took 51 ms
2018-05-07 09:22:36 INFO  MemoryStore:54 - Block broadcast_0 stored as values in memory (estimated size 324.4 KB, free 10.5 GB)
2018-05-07 09:22:37 INFO  HadoopRDD:54 - Input split: hdfs://hpe01:9000/data/mnist/csv/train/labels/part-00005:0+245760
2018-05-07 09:22:37 INFO  TorrentBroadcast:54 - Started reading broadcast variable 1
2018-05-07 09:22:37 INFO  HadoopRDD:54 - Input split: hdfs://hpe01:9000/data/mnist/csv/train/labels/part-00008:0+245760
2018-05-07 09:22:37 INFO  HadoopRDD:54 - Input split: hdfs://hpe01:9000/data/mnist/csv/train/labels/part-00000:0+204800
2018-05-07 09:22:37 INFO  HadoopRDD:54 - Input split: hdfs://hpe01:9000/data/mnist/csv/train/labels/part-00002:0+245760
2018-05-07 09:22:37 INFO  MemoryStore:54 - Block broadcast_1_piece0 stored as bytes in memory (estimated size 23.2 KB, free 10.5 GB)
2018-05-07 09:22:37 INFO  TorrentBroadcast:54 - Reading broadcast variable 1 took 29 ms
2018-05-07 09:22:37 INFO  MemoryStore:54 - Block broadcast_1 stored as values in memory (estimated size 324.4 KB, free 10.5 GB)
2018-05-07 09:22:38 INFO  CoarseGrainedExecutorBackend:54 - Got assigned task 14
2018-05-07 09:22:38 INFO  Executor:54 - Running task 6.1 in stage 1.0 (TID 14)
2018-05-07 09:22:38 INFO  HadoopRDD:54 - Input split: hdfs://hpe01:9000/data/mnist/csv/train/images/part-00006:0+11214285
2018-05-07 09:22:38 INFO  HadoopRDD:54 - Input split: hdfs://hpe01:9000/data/mnist/csv/train/labels/part-00006:0+245760
2018-05-07 09:22:40 INFO  PythonRunner:54 - Times: total = 2566, boot = 26, init = 180, finish = 2360
2018-05-07 09:22:40 INFO  PythonRunner:54 - Times: total = 302, boot = 7, init = 182, finish = 113
2018-05-07 09:22:40 INFO  PythonRunner:54 - Times: total = 2951, boot = 19, init = 233, finish = 2699
2018-05-07 09:22:40 INFO  PythonRunner:54 - Times: total = 311, boot = 17, init = 173, finish = 121
2018-05-07 09:22:40 INFO  PythonRunner:54 - Times: total = 3000, boot = 9, init = 240, finish = 2751
2018-05-07 09:22:40 INFO  PythonRunner:54 - Times: total = 337, boot = 6, init = 188, finish = 143
2018-05-07 09:22:40 INFO  PythonRunner:54 - Times: total = 3020, boot = 32, init = 214, finish = 2774
2018-05-07 09:22:40 INFO  PythonRunner:54 - Times: total = 285, boot = 30, init = 150, finish = 105
2018-05-07 09:22:41 INFO  PythonRunner:54 - Times: total = 2746, boot = 8, init = 8, finish = 2730
2018-05-07 09:22:41 INFO  PythonRunner:54 - Times: total = 175, boot = 9, init = 40, finish = 126
End of LogType:stdout.This log file belongs to a running container (container_1525333084924_0016_01_000002) and so may not be complete.
***********************************************************************

Container: container_1525333084924_0016_01_000002 on hpe02:36742
LogAggregationType: LOCAL
================================================================
LogType:stderr
LogLastModifiedTime:Mon May 07 09:22:39 +0800 2018
LogLength:4793
LogContents:
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/tmp/hadoop-hpe/nm-local-dir/filecache/88/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hpe/hadoop-2.9.0/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
2018-05-07 09:22:35,356 INFO (MainThread-11016) connected to server at ('192.168.136.158', 42715)
2018-05-07 09:22:35,358 INFO (MainThread-11016) TFSparkNode.reserve: {'executor_id': 1, 'host': '192.168.136.159', 'job_name': 'worker', 'task_index': 0, 'port': 34589, 'tb_pid': 0, 'tb_port': 0, 'addr': '/tmp/pymp-_nhkb93k/listener-jcb_p0ze', 'authkey': b'\x98JB\x83M\x91G\x8b\xb54Or\xa4\x85=X'}
2018-05-07 09:22:36,361 INFO (MainThread-11016) node: {'executor_id': 0, 'host': '192.168.136.158', 'job_name': 'ps', 'task_index': 0, 'port': 40395, 'tb_pid': 0, 'tb_port': 0, 'addr': ('192.168.136.158', 42995), 'authkey': b'z\xf6x\xc9\xfe"LM\x87_i3_\xebh\xef'}
2018-05-07 09:22:36,361 INFO (MainThread-11016) node: {'executor_id': 1, 'host': '192.168.136.159', 'job_name': 'worker', 'task_index': 0, 'port': 34589, 'tb_pid': 0, 'tb_port': 0, 'addr': '/tmp/pymp-_nhkb93k/listener-jcb_p0ze', 'authkey': b'\x98JB\x83M\x91G\x8b\xb54Or\xa4\x85=X'}
2018-05-07 09:22:36,362 INFO (MainThread-11016) node: {'executor_id': 2, 'host': '192.168.136.160', 'job_name': 'worker', 'task_index': 1, 'port': 45638, 'tb_pid': 0, 'tb_port': 0, 'addr': '/tmp/pymp-tm696uno/listener-kmgfbw71', 'authkey': b'\x04\xc69\xdd&aG\x0e\x86\xaa\xfb\x16z\\\xb9\xc7'}
2018-05-07 09:22:36,383 INFO (MainThread-11016) Starting TensorFlow worker:0 as worker on cluster node 1 on background process
2018-05-07 09:22:36,405 INFO (MainThread-11092) 1: ======== worker:0 ========
2018-05-07 09:22:36,406 INFO (MainThread-11092) 1: Cluster spec: {'ps': ['192.168.136.158:40395'], 'worker': ['192.168.136.159:34589', '192.168.136.160:45638']}
2018-05-07 09:22:36,406 INFO (MainThread-11092) 1: Using CPU
2018-05-07 09:22:36.410466: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: FMA
2018-05-07 09:22:36.418100: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:215] Initialize GrpcChannelCache for job ps -> {0 -> 192.168.136.158:40395}
2018-05-07 09:22:36.418166: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:215] Initialize GrpcChannelCache for job worker -> {0 -> localhost:34589, 1 -> 192.168.136.160:45638}
2018-05-07 09:22:36.421670: I tensorflow/core/distributed_runtime/rpc/grpc_server_lib.cc:332] Started server with target: grpc://localhost:34589
tensorflow model path: hdfs://hpe01:9000/user/hpe/mnist_model
2018-05-07 09:22:38,031 INFO (MainThread-11220) Connected to TFSparkNode.mgr on 192.168.136.159, executor=1, state='running'
2018-05-07 09:22:38,041 INFO (MainThread-11217) Connected to TFSparkNode.mgr on 192.168.136.159, executor=1, state='running'
2018-05-07 09:22:38,058 INFO (MainThread-11220) mgr.state='running'
2018-05-07 09:22:38,059 INFO (MainThread-11220) Feeding partition <itertools.chain object at 0x7fc487a68358> into input queue <multiprocessing.queues.JoinableQueue object at 0x7fc487a685f8>
2018-05-07 09:22:38,064 INFO (MainThread-11213) Connected to TFSparkNode.mgr on 192.168.136.159, executor=1, state='running'
2018-05-07 09:22:38,068 INFO (MainThread-11227) Connected to TFSparkNode.mgr on 192.168.136.159, executor=1, state='running'
2018-05-07 09:22:38,081 INFO (MainThread-11217) mgr.state='running'
2018-05-07 09:22:38,082 INFO (MainThread-11217) Feeding partition <itertools.chain object at 0x7fc487a68358> into input queue <multiprocessing.queues.JoinableQueue object at 0x7fc487a685f8>
2018-05-07 09:22:38,113 INFO (MainThread-11227) mgr.state='running'
2018-05-07 09:22:38,114 INFO (MainThread-11227) Feeding partition <itertools.chain object at 0x7fc487a68358> into input queue <multiprocessing.queues.JoinableQueue object at 0x7fc487a685f8>
2018-05-07 09:22:38,117 INFO (MainThread-11213) mgr.state='running'
2018-05-07 09:22:38,118 INFO (MainThread-11213) Feeding partition <itertools.chain object at 0x7fc487a68358> into input queue <multiprocessing.queues.JoinableQueue object at 0x7fc487a685f8>
2018-05-07 09:22:39,154 INFO (MainThread-11312) Connected to TFSparkNode.mgr on 192.168.136.159, executor=1, state='running'
2018-05-07 09:22:39,252 INFO (MainThread-11312) mgr.state='running'
2018-05-07 09:22:39,254 INFO (MainThread-11312) Feeding partition <itertools.chain object at 0x7fc487a68358> into input queue <multiprocessing.queues.JoinableQueue object at 0x7fc487a685f8>
End of LogType:stderr.This log file belongs to a running container (container_1525333084924_0016_01_000002) and so may not be complete.
***********************************************************************

Container: container_1525333084924_0016_01_000002 on hpe02:36742
LogAggregationType: LOCAL
================================================================
LogType:prelaunch.out
LogLastModifiedTime:Mon May 07 09:22:25 +0800 2018
LogLength:70
LogContents:
Setting up env variables
Setting up job resources
Launching container
End of LogType:prelaunch.out.This log file belongs to a running container (container_1525333084924_0016_01_000002) and so may not be complete.
******************************************************************************

End of LogType:prelaunch.err.This log file belongs to a running container (container_1525333084924_0016_01_000003) and so may not be complete.
******************************************************************************

Container: container_1525333084924_0016_01_000003 on hpe01:33415
LogAggregationType: LOCAL
================================================================
LogType:stdout
LogLastModifiedTime:星期一 五月 07 09:22:38 +0800 2018
LogLength:13592
LogContents:
2018-05-07 09:22:26 INFO  CoarseGrainedExecutorBackend:2608 - Started daemon with process name: 27768@hpe01
2018-05-07 09:22:26 INFO  SignalUtils:54 - Registered signal handler for TERM
2018-05-07 09:22:26 INFO  SignalUtils:54 - Registered signal handler for HUP
2018-05-07 09:22:26 INFO  SignalUtils:54 - Registered signal handler for INT
2018-05-07 09:22:27 INFO  SecurityManager:54 - Changing view acls to: hpe
2018-05-07 09:22:27 INFO  SecurityManager:54 - Changing modify acls to: hpe
2018-05-07 09:22:27 INFO  SecurityManager:54 - Changing view acls groups to: 
2018-05-07 09:22:27 INFO  SecurityManager:54 - Changing modify acls groups to: 
2018-05-07 09:22:27 INFO  SecurityManager:54 - SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(hpe); groups with view permissions: Set(); users  with modify permissions: Set(hpe); groups with modify permissions: Set()
2018-05-07 09:22:27 INFO  TransportClientFactory:267 - Successfully created connection to hpe01/192.168.136.158:41862 after 166 ms (0 ms spent in bootstraps)
2018-05-07 09:22:28 INFO  SecurityManager:54 - Changing view acls to: hpe
2018-05-07 09:22:28 INFO  SecurityManager:54 - Changing modify acls to: hpe
2018-05-07 09:22:28 INFO  SecurityManager:54 - Changing view acls groups to: 
2018-05-07 09:22:28 INFO  SecurityManager:54 - Changing modify acls groups to: 
2018-05-07 09:22:28 INFO  SecurityManager:54 - SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(hpe); groups with view permissions: Set(); users  with modify permissions: Set(hpe); groups with modify permissions: Set()
2018-05-07 09:22:28 INFO  TransportClientFactory:267 - Successfully created connection to hpe01/192.168.136.158:41862 after 3 ms (0 ms spent in bootstraps)
2018-05-07 09:22:28 INFO  DiskBlockManager:54 - Created local directory at /tmp/hadoop-hpe/nm-local-dir/usercache/hpe/appcache/application_1525333084924_0016/blockmgr-1f36d212-3c46-453e-bcfb-d6c6b45aec3f
2018-05-07 09:22:28 INFO  MemoryStore:54 - MemoryStore started with capacity 10.5 GB
2018-05-07 09:22:28 INFO  CoarseGrainedExecutorBackend:54 - Connecting to driver: spark://CoarseGrainedScheduler@hpe01:41862
2018-05-07 09:22:28 INFO  CoarseGrainedExecutorBackend:54 - Successfully registered with driver
2018-05-07 09:22:28 INFO  Executor:54 - Starting executor ID 2 on host hpe01
2018-05-07 09:22:28 INFO  Utils:54 - Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 33489.
2018-05-07 09:22:28 INFO  NettyBlockTransferService:54 - Server created on hpe01:33489
2018-05-07 09:22:28 INFO  BlockManager:54 - Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
2018-05-07 09:22:28 INFO  BlockManagerMaster:54 - Registering BlockManager BlockManagerId(2, hpe01, 33489, None)
2018-05-07 09:22:28 INFO  BlockManagerMaster:54 - Registered BlockManager BlockManagerId(2, hpe01, 33489, None)
2018-05-07 09:22:28 INFO  BlockManager:54 - Initialized BlockManager: BlockManagerId(2, hpe01, 33489, None)
2018-05-07 09:22:30 INFO  CoarseGrainedExecutorBackend:54 - Got assigned task 0
2018-05-07 09:22:30 INFO  Executor:54 - Running task 0.0 in stage 0.0 (TID 0)
2018-05-07 09:22:30 INFO  TorrentBroadcast:54 - Started reading broadcast variable 2
2018-05-07 09:22:30 INFO  TransportClientFactory:267 - Successfully created connection to hpe01/192.168.136.158:38930 after 3 ms (0 ms spent in bootstraps)
2018-05-07 09:22:30 INFO  MemoryStore:54 - Block broadcast_2_piece0 stored as bytes in memory (estimated size 10.1 KB, free 10.5 GB)
2018-05-07 09:22:30 INFO  TorrentBroadcast:54 - Reading broadcast variable 2 took 150 ms
2018-05-07 09:22:31 INFO  MemoryStore:54 - Block broadcast_2 stored as values in memory (estimated size 14.3 KB, free 10.5 GB)
2018-05-07 09:22:36 INFO  CoarseGrainedExecutorBackend:54 - Got assigned task 5
2018-05-07 09:22:36 INFO  CoarseGrainedExecutorBackend:54 - Got assigned task 8
2018-05-07 09:22:36 INFO  Executor:54 - Running task 3.0 in stage 1.0 (TID 5)
2018-05-07 09:22:36 INFO  Executor:54 - Running task 6.0 in stage 1.0 (TID 8)
2018-05-07 09:22:36 INFO  TorrentBroadcast:54 - Started reading broadcast variable 3
2018-05-07 09:22:36 INFO  MemoryStore:54 - Block broadcast_3_piece0 stored as bytes in memory (estimated size 8.0 KB, free 10.5 GB)
2018-05-07 09:22:36 INFO  TorrentBroadcast:54 - Reading broadcast variable 3 took 28 ms
2018-05-07 09:22:36 INFO  MemoryStore:54 - Block broadcast_3 stored as values in memory (estimated size 14.9 KB, free 10.5 GB)
2018-05-07 09:22:36 INFO  HadoopRDD:54 - Input split: hdfs://hpe01:9000/data/mnist/csv/train/images/part-00003:0+11226100
2018-05-07 09:22:36 INFO  HadoopRDD:54 - Input split: hdfs://hpe01:9000/data/mnist/csv/train/images/part-00006:0+11214285
2018-05-07 09:22:36 INFO  TorrentBroadcast:54 - Started reading broadcast variable 0
2018-05-07 09:22:36 INFO  MemoryStore:54 - Block broadcast_0_piece0 stored as bytes in memory (estimated size 23.2 KB, free 10.5 GB)
2018-05-07 09:22:36 INFO  TorrentBroadcast:54 - Reading broadcast variable 0 took 26 ms
2018-05-07 09:22:36 INFO  MemoryStore:54 - Block broadcast_0 stored as values in memory (estimated size 324.4 KB, free 10.5 GB)
2018-05-07 09:22:37 INFO  HadoopRDD:54 - Input split: hdfs://hpe01:9000/data/mnist/csv/train/labels/part-00003:0+245760
2018-05-07 09:22:37 INFO  TorrentBroadcast:54 - Started reading broadcast variable 1
2018-05-07 09:22:37 INFO  HadoopRDD:54 - Input split: hdfs://hpe01:9000/data/mnist/csv/train/labels/part-00006:0+245760
2018-05-07 09:22:37 INFO  MemoryStore:54 - Block broadcast_1_piece0 stored as bytes in memory (estimated size 23.2 KB, free 10.5 GB)
2018-05-07 09:22:37 INFO  TorrentBroadcast:54 - Reading broadcast variable 1 took 27 ms
2018-05-07 09:22:37 INFO  MemoryStore:54 - Block broadcast_1 stored as values in memory (estimated size 324.4 KB, free 10.5 GB)
2018-05-07 09:22:38 ERROR Executor:91 - Exception in task 6.0 in stage 1.0 (TID 8)
org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/tmp/hadoop-hpe/nm-local-dir/usercache/hpe/appcache/application_1525333084924_0016/container_1525333084924_0016_01_000003/pyspark.zip/pyspark/worker.py", line 229, in main
    process()
  File "/tmp/hadoop-hpe/nm-local-dir/usercache/hpe/appcache/application_1525333084924_0016/container_1525333084924_0016_01_000003/pyspark.zip/pyspark/worker.py", line 224, in process
    serializer.dump_stream(func(split_index, iterator), outfile)
  File "/tmp/hadoop-hpe/nm-local-dir/usercache/hpe/appcache/application_1525333084924_0016/container_1525333084924_0016_01_000001/pyspark.zip/pyspark/rdd.py", line 2438, in pipeline_func
  File "/tmp/hadoop-hpe/nm-local-dir/usercache/hpe/appcache/application_1525333084924_0016/container_1525333084924_0016_01_000001/pyspark.zip/pyspark/rdd.py", line 2438, in pipeline_func
  File "/tmp/hadoop-hpe/nm-local-dir/usercache/hpe/appcache/application_1525333084924_0016/container_1525333084924_0016_01_000001/pyspark.zip/pyspark/rdd.py", line 2438, in pipeline_func
  File "/tmp/hadoop-hpe/nm-local-dir/usercache/hpe/appcache/application_1525333084924_0016/container_1525333084924_0016_01_000001/pyspark.zip/pyspark/rdd.py", line 362, in func
  File "/tmp/hadoop-hpe/nm-local-dir/usercache/hpe/appcache/application_1525333084924_0016/container_1525333084924_0016_01_000001/pyspark.zip/pyspark/rdd.py", line 809, in func
  File "/tmp/hadoop-hpe/nm-local-dir/usercache/hpe/appcache/application_1525333084924_0016/container_1525333084924_0016_01_000001/tfspark.zip/tensorflowonspark/TFSparkNode.py", line 394, in _train
AttributeError: 'AutoProxy[get_queue]' object has no attribute 'put'

    at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.handlePythonException(PythonRunner.scala:298)
    at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRunner.scala:438)
    at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRunner.scala:421)
    at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:252)
    at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
    at scala.collection.Iterator$class.foreach(Iterator.scala:893)
    at org.apache.spark.InterruptibleIterator.foreach(InterruptibleIterator.scala:28)
    at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
    at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
    at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
    at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310)
    at org.apache.spark.InterruptibleIterator.to(InterruptibleIterator.scala:28)
    at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302)
    at org.apache.spark.InterruptibleIterator.toBuffer(InterruptibleIterator.scala:28)
    at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289)
    at org.apache.spark.InterruptibleIterator.toArray(InterruptibleIterator.scala:28)
    at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$12.apply(RDD.scala:939)
    at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$12.apply(RDD.scala:939)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2067)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2067)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
    at org.apache.spark.scheduler.Task.run(Task.scala:109)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
2018-05-07 09:22:38 ERROR Executor:91 - Exception in task 3.0 in stage 1.0 (TID 5)
org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/tmp/hadoop-hpe/nm-local-dir/usercache/hpe/appcache/application_1525333084924_0016/container_1525333084924_0016_01_000003/pyspark.zip/pyspark/worker.py", line 229, in main
    process()
  File "/tmp/hadoop-hpe/nm-local-dir/usercache/hpe/appcache/application_1525333084924_0016/container_1525333084924_0016_01_000003/pyspark.zip/pyspark/worker.py", line 224, in process
    serializer.dump_stream(func(split_index, iterator), outfile)
  File "/tmp/hadoop-hpe/nm-local-dir/usercache/hpe/appcache/application_1525333084924_0016/container_1525333084924_0016_01_000001/pyspark.zip/pyspark/rdd.py", line 2438, in pipeline_func
  File "/tmp/hadoop-hpe/nm-local-dir/usercache/hpe/appcache/application_1525333084924_0016/container_1525333084924_0016_01_000001/pyspark.zip/pyspark/rdd.py", line 2438, in pipeline_func
  File "/tmp/hadoop-hpe/nm-local-dir/usercache/hpe/appcache/application_1525333084924_0016/container_1525333084924_0016_01_000001/pyspark.zip/pyspark/rdd.py", line 2438, in pipeline_func
  File "/tmp/hadoop-hpe/nm-local-dir/usercache/hpe/appcache/application_1525333084924_0016/container_1525333084924_0016_01_000001/pyspark.zip/pyspark/rdd.py", line 362, in func
  File "/tmp/hadoop-hpe/nm-local-dir/usercache/hpe/appcache/application_1525333084924_0016/container_1525333084924_0016_01_000001/pyspark.zip/pyspark/rdd.py", line 809, in func
  File "/tmp/hadoop-hpe/nm-local-dir/usercache/hpe/appcache/application_1525333084924_0016/container_1525333084924_0016_01_000001/tfspark.zip/tensorflowonspark/TFSparkNode.py", line 394, in _train
AttributeError: 'AutoProxy[get_queue]' object has no attribute 'put'

    at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.handlePythonException(PythonRunner.scala:298)
    at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRunner.scala:438)
    at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRunner.scala:421)
    at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:252)
    at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
    at scala.collection.Iterator$class.foreach(Iterator.scala:893)
    at org.apache.spark.InterruptibleIterator.foreach(InterruptibleIterator.scala:28)
    at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
    at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
    at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
    at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310)
    at org.apache.spark.InterruptibleIterator.to(InterruptibleIterator.scala:28)
    at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302)
    at org.apache.spark.InterruptibleIterator.toBuffer(InterruptibleIterator.scala:28)
    at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289)
    at org.apache.spark.InterruptibleIterator.toArray(InterruptibleIterator.scala:28)
    at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$12.apply(RDD.scala:939)
    at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$12.apply(RDD.scala:939)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2067)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2067)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
    at org.apache.spark.scheduler.Task.run(Task.scala:109)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
End of LogType:stdout.This log file belongs to a running container (container_1525333084924_0016_01_000003) and so may not be complete.
***********************************************************************
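
The AttributeError in the stdout log above is raised in the partition-feeding loop: each Spark task asks the local TFSparkNode manager for its input queue and calls put() on the proxy it gets back. A minimal, hypothetical sketch of that multiprocessing Manager pattern (illustrative only, not the actual TFSparkNode code; the class name, port and authkey below are made up):

```python
# Hypothetical sketch (not the actual TFSparkNode code) of the Manager/queue pattern
# behind the AttributeError above: a TF node process serves an input queue through a
# multiprocessing Manager, and each Spark python worker connects and put()s its rows.
from multiprocessing.managers import BaseManager
from queue import Queue

input_queue = Queue()

def get_queue():
    # Whatever this returns determines which methods the client-side AutoProxy exposes;
    # returning a real Queue exposes put()/get(), returning None exposes nothing.
    return input_queue

class QueueManager(BaseManager):
    pass

QueueManager.register('get_queue', callable=get_queue)

if __name__ == '__main__':
    # "Server" side: the TF node's background process.
    mgr = QueueManager(address=('127.0.0.1', 50000), authkey=b'secret')
    mgr.start()

    # "Client" side: a Spark task about to feed its partition.
    client = QueueManager(address=('127.0.0.1', 50000), authkey=b'secret')
    client.connect()
    q = client.get_queue()        # -> AutoProxy[get_queue]
    q.put(('label', 'pixels'))    # fails with "no attribute 'put'" if the server-side
                                  # callable did not return a queue-like object
    mgr.shutdown()
```

An AutoProxy only exposes put() if the object handed back on the server side actually is a queue, so the error suggests the proxy here was built from something that is not one (for example, a registered callable that returned None).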

Container: container_1525333084924_0016_01_000003 on hpe01:33415
LogAggregationType: LOCAL
================================================================
LogType:stderr
LogLastModifiedTime:Mon May 07 09:22:42 +0800 2018
LogLength:3433
LogContents:
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/tmp/hadoop-hpe/nm-local-dir/filecache/87/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hpe/hadoop-2.9.0/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
2018-05-07 09:22:35,033 INFO (MainThread-27925) connected to server at ('192.168.136.158', 42715)
2018-05-07 09:22:35,036 INFO (MainThread-27925) TFSparkNode.reserve: {'executor_id': 0, 'host': '192.168.136.158', 'job_name': 'ps', 'task_index': 0, 'port': 40395, 'tb_pid': 0, 'tb_port': 0, 'addr': ('192.168.136.158', 42995), 'authkey': b'z\xf6x\xc9\xfe"LM\x87_i3_\xebh\xef'}
2018-05-07 09:22:37,078 INFO (MainThread-27925) node: {'executor_id': 0, 'host': '192.168.136.158', 'job_name': 'ps', 'task_index': 0, 'port': 40395, 'tb_pid': 0, 'tb_port': 0, 'addr': ('192.168.136.158', 42995), 'authkey': b'z\xf6x\xc9\xfe"LM\x87_i3_\xebh\xef'}
2018-05-07 09:22:37,078 INFO (MainThread-27925) node: {'executor_id': 1, 'host': '192.168.136.159', 'job_name': 'worker', 'task_index': 0, 'port': 34589, 'tb_pid': 0, 'tb_port': 0, 'addr': '/tmp/pymp-_nhkb93k/listener-jcb_p0ze', 'authkey': b'\x98JB\x83M\x91G\x8b\xb54Or\xa4\x85=X'}
2018-05-07 09:22:37,078 INFO (MainThread-27925) node: {'executor_id': 2, 'host': '192.168.136.160', 'job_name': 'worker', 'task_index': 1, 'port': 45638, 'tb_pid': 0, 'tb_port': 0, 'addr': '/tmp/pymp-tm696uno/listener-kmgfbw71', 'authkey': b'\x04\xc69\xdd&aG\x0e\x86\xaa\xfb\x16z\\\xb9\xc7'}
2018-05-07 09:22:37,098 INFO (MainThread-27925) Starting TensorFlow ps:0 as ps on cluster node 0 on background process
2018-05-07 09:22:38,151 INFO (MainThread-28031) Connected to TFSparkNode.mgr on 192.168.136.158, executor=0, state='running'
2018-05-07 09:22:38,152 INFO (MainThread-28034) Connected to TFSparkNode.mgr on 192.168.136.158, executor=0, state='running'
2018-05-07 09:22:38,619 INFO (MainThread-28031) mgr.state='running'
2018-05-07 09:22:38,619 INFO (MainThread-28034) mgr.state='running'
2018-05-07 09:22:38,619 INFO (MainThread-28031) Feeding partition <itertools.chain object at 0x7f0e05cd2860> into input queue None
2018-05-07 09:22:38,620 INFO (MainThread-28034) Feeding partition <itertools.chain object at 0x7f0e05cd2860> into input queue None
2018-05-07 09:22:42,276 INFO (MainThread-28001) 0: ======== ps:0 ========
2018-05-07 09:22:42,277 INFO (MainThread-28001) 0: Cluster spec: {'ps': ['192.168.136.158:40395'], 'worker': ['192.168.136.159:34589', '192.168.136.160:45638']}
2018-05-07 09:22:42,277 INFO (MainThread-28001) 0: Using CPU
2018-05-07 09:22:42.280937: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: FMA
2018-05-07 09:22:42.290145: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:215] Initialize GrpcChannelCache for job ps -> {0 -> localhost:40395}
2018-05-07 09:22:42.290232: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:215] Initialize GrpcChannelCache for job worker -> {0 -> 192.168.136.159:34589, 1 -> 192.168.136.160:45638}
2018-05-07 09:22:42.295658: I tensorflow/core/distributed_runtime/rpc/grpc_server_lib.cc:333] Started server with target: grpc://localhost:40395
End of LogType:stderr.This log file belongs to a running container (container_1525333084924_0016_01_000003) and so may not be complete.
***********************************************************************
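
Note that this container is the one running the ps task, yet its tasks are also handed data partitions and report "Feeding partition ... into input queue None". For context, the driver side in Spark input mode is wired up roughly as below (a sketch modelled on examples/mnist/spark/mnist_spark.py; the paths, sizes and argument order are assumptions and may differ between releases):

```python
# Driver-side wiring for InputMode.SPARK, reduced to the essentials (sketch only; paths
# and argument order are assumed from the MNIST example, not copied from a release).
from pyspark import SparkConf, SparkContext
from tensorflowonspark import TFCluster
import mnist_dist

sc = SparkContext(conf=SparkConf().setAppName("mnist_spark"))

# CSV images/labels zipped together, as in the example.
images = sc.textFile("hdfs:///data/mnist/csv/train/images").map(lambda l: [float(x) for x in l.split(',')])
labels = sc.textFile("hdfs:///data/mnist/csv/train/labels").map(lambda l: [float(x) for x in l.split(',')])
dataRDD = images.zip(labels)

cluster = TFCluster.run(sc, mnist_dist.map_fun, None,   # tf_args omitted in this sketch
                        3,                               # cluster_size: 1 ps + 2 workers
                        1,                               # num_ps
                        False,                           # tensorboard
                        TFCluster.InputMode.SPARK)
cluster.train(dataRDD, 1)   # unions the RDD once per epoch and runs TFSparkNode.train on
                            # every partition -- the queue.put() that fails above
cluster.shutdown()
```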

Container: container_1525333084924_0016_01_000003 on hpe01:33415
LogAggregationType: LOCAL
================================================================
LogType:prelaunch.out
LogLastModifiedTime:Mon May 07 09:22:25 +0800 2018
LogLength:70
LogContents:
Setting up env variables
Setting up job resources
Launching container
End of LogType:prelaunch.out.This log file belongs to a running container (container_1525333084924_0016_01_000003) and so may not be complete.
******************************************************************************

End of LogType:prelaunch.err.This log file belongs to a running container (container_1525333084924_0016_01_000004) and so may not be complete.
******************************************************************************

Container: container_1525333084924_0016_01_000004 on hpe03:35000
LogAggregationType: LOCAL
================================================================
LogType:stdout
LogLastModifiedTime:Mon May 07 09:23:03 +0800 2018
LogLength:8508
LogContents:
2018-05-07 09:22:48 INFO  CoarseGrainedExecutorBackend:2608 - Started daemon with process name: 10107@hpe03
2018-05-07 09:22:48 INFO  SignalUtils:54 - Registered signal handler for TERM
2018-05-07 09:22:48 INFO  SignalUtils:54 - Registered signal handler for HUP
2018-05-07 09:22:48 INFO  SignalUtils:54 - Registered signal handler for INT
2018-05-07 09:22:49 INFO  SecurityManager:54 - Changing view acls to: hpe
2018-05-07 09:22:49 INFO  SecurityManager:54 - Changing modify acls to: hpe
2018-05-07 09:22:49 INFO  SecurityManager:54 - Changing view acls groups to: 
2018-05-07 09:22:49 INFO  SecurityManager:54 - Changing modify acls groups to: 
2018-05-07 09:22:49 INFO  SecurityManager:54 - SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(hpe); groups with view permissions: Set(); users  with modify permissions: Set(hpe); groups with modify permissions: Set()
2018-05-07 09:22:50 INFO  TransportClientFactory:267 - Successfully created connection to hpe01/192.168.136.158:41862 after 121 ms (0 ms spent in bootstraps)
2018-05-07 09:22:50 INFO  SecurityManager:54 - Changing view acls to: hpe
2018-05-07 09:22:50 INFO  SecurityManager:54 - Changing modify acls to: hpe
2018-05-07 09:22:50 INFO  SecurityManager:54 - Changing view acls groups to: 
2018-05-07 09:22:50 INFO  SecurityManager:54 - Changing modify acls groups to: 
2018-05-07 09:22:50 INFO  SecurityManager:54 - SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(hpe); groups with view permissions: Set(); users  with modify permissions: Set(hpe); groups with modify permissions: Set()
2018-05-07 09:22:50 INFO  TransportClientFactory:267 - Successfully created connection to hpe01/192.168.136.158:41862 after 3 ms (0 ms spent in bootstraps)
2018-05-07 09:22:50 INFO  DiskBlockManager:54 - Created local directory at /tmp/hadoop-hpe/nm-local-dir/usercache/hpe/appcache/application_1525333084924_0016/blockmgr-7bf15713-3c5b-4958-b839-ab72638a9a53
2018-05-07 09:22:50 INFO  MemoryStore:54 - MemoryStore started with capacity 10.5 GB
2018-05-07 09:22:50 INFO  CoarseGrainedExecutorBackend:54 - Connecting to driver: spark://CoarseGrainedScheduler@hpe01:41862
2018-05-07 09:22:50 INFO  CoarseGrainedExecutorBackend:54 - Successfully registered with driver
2018-05-07 09:22:50 INFO  Executor:54 - Starting executor ID 3 on host hpe03
2018-05-07 09:22:50 INFO  Utils:54 - Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 38332.
2018-05-07 09:22:50 INFO  NettyBlockTransferService:54 - Server created on hpe03:38332
2018-05-07 09:22:50 INFO  BlockManager:54 - Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
2018-05-07 09:22:50 INFO  BlockManagerMaster:54 - Registering BlockManager BlockManagerId(3, hpe03, 38332, None)
2018-05-07 09:22:51 INFO  BlockManagerMaster:54 - Registered BlockManager BlockManagerId(3, hpe03, 38332, None)
2018-05-07 09:22:51 INFO  BlockManager:54 - Initialized BlockManager: BlockManagerId(3, hpe03, 38332, None)
2018-05-07 09:22:52 INFO  CoarseGrainedExecutorBackend:54 - Got assigned task 2
2018-05-07 09:22:52 INFO  Executor:54 - Running task 2.0 in stage 0.0 (TID 2)
2018-05-07 09:22:52 INFO  TorrentBroadcast:54 - Started reading broadcast variable 2
2018-05-07 09:22:52 INFO  TransportClientFactory:267 - Successfully created connection to hpe01/192.168.136.158:38930 after 4 ms (0 ms spent in bootstraps)
2018-05-07 09:22:52 INFO  MemoryStore:54 - Block broadcast_2_piece0 stored as bytes in memory (estimated size 10.1 KB, free 10.5 GB)
2018-05-07 09:22:52 INFO  TorrentBroadcast:54 - Reading broadcast variable 2 took 173 ms
2018-05-07 09:22:52 INFO  MemoryStore:54 - Block broadcast_2 stored as values in memory (estimated size 14.3 KB, free 10.5 GB)
2018-05-07 09:22:57 INFO  CoarseGrainedExecutorBackend:54 - Got assigned task 4
2018-05-07 09:22:57 INFO  CoarseGrainedExecutorBackend:54 - Got assigned task 7
2018-05-07 09:22:57 INFO  Executor:54 - Running task 1.0 in stage 1.0 (TID 4)
2018-05-07 09:22:57 INFO  CoarseGrainedExecutorBackend:54 - Got assigned task 10
2018-05-07 09:22:57 INFO  Executor:54 - Running task 4.0 in stage 1.0 (TID 7)
2018-05-07 09:22:57 INFO  CoarseGrainedExecutorBackend:54 - Got assigned task 12
2018-05-07 09:22:57 INFO  Executor:54 - Running task 7.0 in stage 1.0 (TID 10)
2018-05-07 09:22:57 INFO  Executor:54 - Running task 9.0 in stage 1.0 (TID 12)
2018-05-07 09:22:57 INFO  TorrentBroadcast:54 - Started reading broadcast variable 3
2018-05-07 09:22:57 INFO  TransportClientFactory:267 - Successfully created connection to hpe02/192.168.136.159:40311 after 5 ms (0 ms spent in bootstraps)
2018-05-07 09:22:58 INFO  MemoryStore:54 - Block broadcast_3_piece0 stored as bytes in memory (estimated size 8.0 KB, free 10.5 GB)
2018-05-07 09:22:58 INFO  TorrentBroadcast:54 - Reading broadcast variable 3 took 136 ms
2018-05-07 09:22:58 INFO  MemoryStore:54 - Block broadcast_3 stored as values in memory (estimated size 14.9 KB, free 10.5 GB)
2018-05-07 09:22:58 INFO  HadoopRDD:54 - Input split: hdfs://hpe01:9000/data/mnist/csv/train/images/part-00007:0+11201024
2018-05-07 09:22:58 INFO  HadoopRDD:54 - Input split: hdfs://hpe01:9000/data/mnist/csv/train/images/part-00001:0+11231804
2018-05-07 09:22:58 INFO  HadoopRDD:54 - Input split: hdfs://hpe01:9000/data/mnist/csv/train/images/part-00009:0+10449019
2018-05-07 09:22:58 INFO  HadoopRDD:54 - Input split: hdfs://hpe01:9000/data/mnist/csv/train/images/part-00004:0+11212767
2018-05-07 09:22:58 INFO  TorrentBroadcast:54 - Started reading broadcast variable 0
2018-05-07 09:22:58 INFO  MemoryStore:54 - Block broadcast_0_piece0 stored as bytes in memory (estimated size 23.2 KB, free 10.5 GB)
2018-05-07 09:22:58 INFO  TorrentBroadcast:54 - Reading broadcast variable 0 took 27 ms
2018-05-07 09:22:58 INFO  MemoryStore:54 - Block broadcast_0 stored as values in memory (estimated size 324.4 KB, free 10.5 GB)
2018-05-07 09:22:58 INFO  PythonRunner:54 - Times: total = 6123, boot = 743, init = 80, finish = 5300
2018-05-07 09:22:58 INFO  Executor:54 - Finished task 2.0 in stage 0.0 (TID 2). 1353 bytes result sent to driver
2018-05-07 09:22:59 INFO  HadoopRDD:54 - Input split: hdfs://hpe01:9000/data/mnist/csv/train/labels/part-00007:0+245760
2018-05-07 09:22:59 INFO  TorrentBroadcast:54 - Started reading broadcast variable 1
2018-05-07 09:22:59 INFO  HadoopRDD:54 - Input split: hdfs://hpe01:9000/data/mnist/csv/train/labels/part-00009:0+229120
2018-05-07 09:22:59 INFO  HadoopRDD:54 - Input split: hdfs://hpe01:9000/data/mnist/csv/train/labels/part-00001:0+245760
2018-05-07 09:22:59 INFO  HadoopRDD:54 - Input split: hdfs://hpe01:9000/data/mnist/csv/train/labels/part-00004:0+245760
2018-05-07 09:22:59 INFO  MemoryStore:54 - Block broadcast_1_piece0 stored as bytes in memory (estimated size 23.2 KB, free 10.5 GB)
2018-05-07 09:22:59 INFO  TorrentBroadcast:54 - Reading broadcast variable 1 took 35 ms
2018-05-07 09:22:59 INFO  MemoryStore:54 - Block broadcast_1 stored as values in memory (estimated size 324.4 KB, free 10.5 GB)
2018-05-07 09:23:00 INFO  CoarseGrainedExecutorBackend:54 - Got assigned task 13
2018-05-07 09:23:00 INFO  Executor:54 - Running task 3.1 in stage 1.0 (TID 13)
2018-05-07 09:23:00 INFO  HadoopRDD:54 - Input split: hdfs://hpe01:9000/data/mnist/csv/train/images/part-00003:0+11226100
2018-05-07 09:23:00 INFO  HadoopRDD:54 - Input split: hdfs://hpe01:9000/data/mnist/csv/train/labels/part-00003:0+245760
2018-05-07 09:23:02 INFO  PythonRunner:54 - Times: total = 2715, boot = 17, init = 160, finish = 2538
2018-05-07 09:23:02 INFO  PythonRunner:54 - Times: total = 248, boot = 9, init = 130, finish = 109
2018-05-07 09:23:02 INFO  PythonRunner:54 - Times: total = 2801, boot = 10, init = 176, finish = 2615
2018-05-07 09:23:02 INFO  PythonRunner:54 - Times: total = 269, boot = 7, init = 141, finish = 121
2018-05-07 09:23:02 INFO  PythonRunner:54 - Times: total = 2847, boot = 23, init = 143, finish = 2681
2018-05-07 09:23:02 INFO  PythonRunner:54 - Times: total = 263, boot = 18, init = 121, finish = 124
2018-05-07 09:23:02 INFO  PythonRunner:54 - Times: total = 2862, boot = 29, init = 181, finish = 2652
2018-05-07 09:23:02 INFO  PythonRunner:54 - Times: total = 260, boot = 16, init = 114, finish = 130
2018-05-07 09:23:03 INFO  PythonRunner:54 - Times: total = 2435, boot = 9, init = 8, finish = 2418
2018-05-07 09:23:03 INFO  PythonRunner:54 - Times: total = 161, boot = 10, init = 10, finish = 141
End of LogType:stdout.This log file belongs to a running container (container_1525333084924_0016_01_000004) and so may not be complete.
***********************************************************************

Container: container_1525333084924_0016_01_000004 on hpe03:35000
LogAggregationType: LOCAL
================================================================
LogType:stderr
LogLastModifiedTime:Mon May 07 09:30:41 +0800 2018
LogLength:13512
LogContents:
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/tmp/hadoop-hpe/nm-local-dir/filecache/87/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hpe/hadoop-2.9.0/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
2018-05-07 09:22:56,709 INFO (MainThread-10250) connected to server at ('192.168.136.158', 42715)
2018-05-07 09:22:56,711 INFO (MainThread-10250) TFSparkNode.reserve: {'executor_id': 2, 'host': '192.168.136.160', 'job_name': 'worker', 'task_index': 1, 'port': 45638, 'tb_pid': 0, 'tb_port': 0, 'addr': '/tmp/pymp-tm696uno/listener-kmgfbw71', 'authkey': b'\x04\xc69\xdd&aG\x0e\x86\xaa\xfb\x16z\\\xb9\xc7'}
2018-05-07 09:22:58,793 INFO (MainThread-10250) node: {'executor_id': 0, 'host': '192.168.136.158', 'job_name': 'ps', 'task_index': 0, 'port': 40395, 'tb_pid': 0, 'tb_port': 0, 'addr': ('192.168.136.158', 42995), 'authkey': b'z\xf6x\xc9\xfe"LM\x87_i3_\xebh\xef'}
2018-05-07 09:22:58,793 INFO (MainThread-10250) node: {'executor_id': 1, 'host': '192.168.136.159', 'job_name': 'worker', 'task_index': 0, 'port': 34589, 'tb_pid': 0, 'tb_port': 0, 'addr': '/tmp/pymp-_nhkb93k/listener-jcb_p0ze', 'authkey': b'\x98JB\x83M\x91G\x8b\xb54Or\xa4\x85=X'}
2018-05-07 09:22:58,793 INFO (MainThread-10250) node: {'executor_id': 2, 'host': '192.168.136.160', 'job_name': 'worker', 'task_index': 1, 'port': 45638, 'tb_pid': 0, 'tb_port': 0, 'addr': '/tmp/pymp-tm696uno/listener-kmgfbw71', 'authkey': b'\x04\xc69\xdd&aG\x0e\x86\xaa\xfb\x16z\\\xb9\xc7'}
2018-05-07 09:22:58,810 INFO (MainThread-10250) Starting TensorFlow worker:1 as worker on cluster node 2 on background process
2018-05-07 09:22:58,830 INFO (MainThread-10335) 2: ======== worker:1 ========
2018-05-07 09:22:58,831 INFO (MainThread-10335) 2: Cluster spec: {'ps': ['192.168.136.158:40395'], 'worker': ['192.168.136.159:34589', '192.168.136.160:45638']}
2018-05-07 09:22:58,831 INFO (MainThread-10335) 2: Using CPU
2018-05-07 09:22:58.833362: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: FMA
2018-05-07 09:22:58.839880: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:215] Initialize GrpcChannelCache for job ps -> {0 -> 192.168.136.158:40395}
2018-05-07 09:22:58.839934: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:215] Initialize GrpcChannelCache for job worker -> {0 -> 192.168.136.159:34589, 1 -> localhost:45638}
2018-05-07 09:22:58.843093: I tensorflow/core/distributed_runtime/rpc/grpc_server_lib.cc:332] Started server with target: grpc://localhost:45638
tensorflow model path: hdfs://hpe01:9000/user/hpe/mnist_model
2018-05-07 09:22:59,378 INFO (MainThread-10335) Graph was finalized.
2018-05-07 09:22:59,703 INFO (MainThread-10463) Connected to TFSparkNode.mgr on 192.168.136.160, executor=2, state='running'
2018-05-07 09:22:59,710 INFO (MainThread-10452) Connected to TFSparkNode.mgr on 192.168.136.160, executor=2, state='running'
2018-05-07 09:22:59,724 INFO (MainThread-10469) Connected to TFSparkNode.mgr on 192.168.136.160, executor=2, state='running'
2018-05-07 09:22:59,731 INFO (MainThread-10466) Connected to TFSparkNode.mgr on 192.168.136.160, executor=2, state='running'
2018-05-07 09:22:59,745 INFO (MainThread-10463) mgr.state='running'
2018-05-07 09:22:59,746 INFO (MainThread-10463) Feeding partition <itertools.chain object at 0x7fb7d98ea358> into input queue <multiprocessing.queues.JoinableQueue object at 0x7fb7d98ea5f8>
2018-05-07 09:22:59,747 INFO (MainThread-10452) mgr.state='running'
2018-05-07 09:22:59,748 INFO (MainThread-10452) Feeding partition <itertools.chain object at 0x7fb7d98ea358> into input queue <multiprocessing.queues.JoinableQueue object at 0x7fb7d98ea5f8>
2018-05-07 09:22:59,764 INFO (MainThread-10469) mgr.state='running'
2018-05-07 09:22:59,765 INFO (MainThread-10469) Feeding partition <itertools.chain object at 0x7fb7d98ea358> into input queue <multiprocessing.queues.JoinableQueue object at 0x7fb7d98ea5f8>
2018-05-07 09:22:59,778 INFO (MainThread-10466) mgr.state='running'
2018-05-07 09:22:59,780 INFO (MainThread-10466) Feeding partition <itertools.chain object at 0x7fb7d98ea358> into input queue <multiprocessing.queues.JoinableQueue object at 0x7fb7d98ea5f8>
2018-05-07 09:23:00,786 INFO (MainThread-10553) Connected to TFSparkNode.mgr on 192.168.136.160, executor=2, state='running'
2018-05-07 09:23:00,830 INFO (MainThread-10553) mgr.state='running'
2018-05-07 09:23:00,832 INFO (MainThread-10553) Feeding partition <itertools.chain object at 0x7fb7d98ea358> into input queue <multiprocessing.queues.JoinableQueue object at 0x7fb7d98ea5f8>
2018-05-07 09:23:04.392866: I tensorflow/core/distributed_runtime/master_session.cc:1136] Start master session c7cea3785511f7ee with config: 
2018-05-07 09:23:04,513 INFO (MainThread-10335) Waiting for model to be ready.  Ready_for_local_init_op:  Variables not initialized: layer/hidden_layer/hid_w, layer/hidden_layer/hid_b, layer/softmax_layer/sm_w, layer/softmax_layer/sm_b, global_step, layer/hidden_layer/hid_w/Adagrad, layer/hidden_layer/hid_b/Adagrad, layer/softmax_layer/sm_w/Adagrad, layer/softmax_layer/sm_b/Adagrad, ready: None
2018-05-07 09:23:34.786975: I tensorflow/core/distributed_runtime/master_session.cc:1136] Start master session e6f1c776525e4380 with config: 
2018-05-07 09:23:34,843 INFO (MainThread-10335) Waiting for model to be ready.  Ready_for_local_init_op:  Variables not initialized: layer/hidden_layer/hid_w, layer/hidden_layer/hid_b, layer/softmax_layer/sm_w, layer/softmax_layer/sm_b, global_step, layer/hidden_layer/hid_w/Adagrad, layer/hidden_layer/hid_b/Adagrad, layer/softmax_layer/sm_w/Adagrad, layer/softmax_layer/sm_b/Adagrad, ready: None
2018-05-07 09:24:04.881590: I tensorflow/core/distributed_runtime/master_session.cc:1136] Start master session fb79d5bb5075c753 with config: 
2018-05-07 09:24:04,935 INFO (MainThread-10335) Waiting for model to be ready.  Ready_for_local_init_op:  Variables not initialized: layer/hidden_layer/hid_w, layer/hidden_layer/hid_b, layer/softmax_layer/sm_w, layer/softmax_layer/sm_b, global_step, layer/hidden_layer/hid_w/Adagrad, layer/hidden_layer/hid_b/Adagrad, layer/softmax_layer/sm_w/Adagrad, layer/softmax_layer/sm_b/Adagrad, ready: None
2018-05-07 09:24:35.165303: I tensorflow/core/distributed_runtime/master_session.cc:1136] Start master session bcde1636c5140cc2 with config: 
2018-05-07 09:24:35,218 INFO (MainThread-10335) Waiting for model to be ready.  Ready_for_local_init_op:  Variables not initialized: layer/hidden_layer/hid_w, layer/hidden_layer/hid_b, layer/softmax_layer/sm_w, layer/softmax_layer/sm_b, global_step, layer/hidden_layer/hid_w/Adagrad, layer/hidden_layer/hid_b/Adagrad, layer/softmax_layer/sm_w/Adagrad, layer/softmax_layer/sm_b/Adagrad, ready: None
2018-05-07 09:25:05.968857: I tensorflow/core/distributed_runtime/master_session.cc:1136] Start master session c538e315143d99c6 with config: 
2018-05-07 09:25:06,021 INFO (MainThread-10335) Waiting for model to be ready.  Ready_for_local_init_op:  Variables not initialized: layer/hidden_layer/hid_w, layer/hidden_layer/hid_b, layer/softmax_layer/sm_w, layer/softmax_layer/sm_b, global_step, layer/hidden_layer/hid_w/Adagrad, layer/hidden_layer/hid_b/Adagrad, layer/softmax_layer/sm_w/Adagrad, layer/softmax_layer/sm_b/Adagrad, ready: None
2018-05-07 09:25:36.028765: I tensorflow/core/distributed_runtime/master_session.cc:1136] Start master session f56ff319b6085cc0 with config: 
2018-05-07 09:25:36,077 INFO (MainThread-10335) Waiting for model to be ready.  Ready_for_local_init_op:  Variables not initialized: layer/hidden_layer/hid_w, layer/hidden_layer/hid_b, layer/softmax_layer/sm_w, layer/softmax_layer/sm_b, global_step, layer/hidden_layer/hid_w/Adagrad, layer/hidden_layer/hid_b/Adagrad, layer/softmax_layer/sm_w/Adagrad, layer/softmax_layer/sm_b/Adagrad, ready: None
2018-05-07 09:26:07.098008: I tensorflow/core/distributed_runtime/master_session.cc:1136] Start master session 3773bb2871300c8c with config: 
2018-05-07 09:26:07,152 INFO (MainThread-10335) Waiting for model to be ready.  Ready_for_local_init_op:  Variables not initialized: layer/hidden_layer/hid_w, layer/hidden_layer/hid_b, layer/softmax_layer/sm_w, layer/softmax_layer/sm_b, global_step, layer/hidden_layer/hid_w/Adagrad, layer/hidden_layer/hid_b/Adagrad, layer/softmax_layer/sm_w/Adagrad, layer/softmax_layer/sm_b/Adagrad, ready: None
2018-05-07 09:26:37.504513: I tensorflow/core/distributed_runtime/master_session.cc:1136] Start master session 9eff0976308830d6 with config: 
2018-05-07 09:26:37,555 INFO (MainThread-10335) Waiting for model to be ready.  Ready_for_local_init_op:  Variables not initialized: layer/hidden_layer/hid_w, layer/hidden_layer/hid_b, layer/softmax_layer/sm_w, layer/softmax_layer/sm_b, global_step, layer/hidden_layer/hid_w/Adagrad, layer/hidden_layer/hid_b/Adagrad, layer/softmax_layer/sm_w/Adagrad, layer/softmax_layer/sm_b/Adagrad, ready: None
2018-05-07 09:27:07.655913: I tensorflow/core/distributed_runtime/master_session.cc:1136] Start master session 7779a1b6b86bc58b with config: 
2018-05-07 09:27:07,709 INFO (MainThread-10335) Waiting for model to be ready.  Ready_for_local_init_op:  Variables not initialized: layer/hidden_layer/hid_w, layer/hidden_layer/hid_b, layer/softmax_layer/sm_w, layer/softmax_layer/sm_b, global_step, layer/hidden_layer/hid_w/Adagrad, layer/hidden_layer/hid_b/Adagrad, layer/softmax_layer/sm_w/Adagrad, layer/softmax_layer/sm_b/Adagrad, ready: None
2018-05-07 09:27:37.931022: I tensorflow/core/distributed_runtime/master_session.cc:1136] Start master session dbadd3b944776685 with config: 
2018-05-07 09:27:37,984 INFO (MainThread-10335) Waiting for model to be ready.  Ready_for_local_init_op:  Variables not initialized: layer/hidden_layer/hid_w, layer/hidden_layer/hid_b, layer/softmax_layer/sm_w, layer/softmax_layer/sm_b, global_step, layer/hidden_layer/hid_w/Adagrad, layer/hidden_layer/hid_b/Adagrad, layer/softmax_layer/sm_w/Adagrad, layer/softmax_layer/sm_b/Adagrad, ready: None
2018-05-07 09:28:08.910850: I tensorflow/core/distributed_runtime/master_session.cc:1136] Start master session a8e2f40c794437dc with config: 
2018-05-07 09:28:08,960 INFO (MainThread-10335) Waiting for model to be ready.  Ready_for_local_init_op:  Variables not initialized: layer/hidden_layer/hid_w, layer/hidden_layer/hid_b, layer/softmax_layer/sm_w, layer/softmax_layer/sm_b, global_step, layer/hidden_layer/hid_w/Adagrad, layer/hidden_layer/hid_b/Adagrad, layer/softmax_layer/sm_w/Adagrad, layer/softmax_layer/sm_b/Adagrad, ready: None
2018-05-07 09:28:39.453483: I tensorflow/core/distributed_runtime/master_session.cc:1136] Start master session 9135cc65a1150abd with config: 
2018-05-07 09:28:39,503 INFO (MainThread-10335) Waiting for model to be ready.  Ready_for_local_init_op:  Variables not initialized: layer/hidden_layer/hid_w, layer/hidden_layer/hid_b, layer/softmax_layer/sm_w, layer/softmax_layer/sm_b, global_step, layer/hidden_layer/hid_w/Adagrad, layer/hidden_layer/hid_b/Adagrad, layer/softmax_layer/sm_w/Adagrad, layer/softmax_layer/sm_b/Adagrad, ready: None
2018-05-07 09:29:09.808256: I tensorflow/core/distributed_runtime/master_session.cc:1136] Start master session 7397c55ff304c069 with config: 
2018-05-07 09:29:09,862 INFO (MainThread-10335) Waiting for model to be ready.  Ready_for_local_init_op:  Variables not initialized: layer/hidden_layer/hid_w, layer/hidden_layer/hid_b, layer/softmax_layer/sm_w, layer/softmax_layer/sm_b, global_step, layer/hidden_layer/hid_w/Adagrad, layer/hidden_layer/hid_b/Adagrad, layer/softmax_layer/sm_w/Adagrad, layer/softmax_layer/sm_b/Adagrad, ready: None
2018-05-07 09:29:40.492260: I tensorflow/core/distributed_runtime/master_session.cc:1136] Start master session a9557c044290e432 with config: 
2018-05-07 09:29:40,541 INFO (MainThread-10335) Waiting for model to be ready.  Ready_for_local_init_op:  Variables not initialized: layer/hidden_layer/hid_w, layer/hidden_layer/hid_b, layer/softmax_layer/sm_w, layer/softmax_layer/sm_b, global_step, layer/hidden_layer/hid_w/Adagrad, layer/hidden_layer/hid_b/Adagrad, layer/softmax_layer/sm_w/Adagrad, layer/softmax_layer/sm_b/Adagrad, ready: None
2018-05-07 09:30:10.867679: I tensorflow/core/distributed_runtime/master_session.cc:1136] Start master session 863652ccb1f92ec3 with config: 
2018-05-07 09:30:10,917 INFO (MainThread-10335) Waiting for model to be ready.  Ready_for_local_init_op:  Variables not initialized: layer/hidden_layer/hid_w, layer/hidden_layer/hid_b, layer/softmax_layer/sm_w, layer/softmax_layer/sm_b, global_step, layer/hidden_layer/hid_w/Adagrad, layer/hidden_layer/hid_b/Adagrad, layer/softmax_layer/sm_w/Adagrad, layer/softmax_layer/sm_b/Adagrad, ready: None
2018-05-07 09:30:41.410422: I tensorflow/core/distributed_runtime/master_session.cc:1136] Start master session cb54d3f85fb4596e with config: 
2018-05-07 09:30:41,464 INFO (MainThread-10335) Waiting for model to be ready.  Ready_for_local_init_op:  Variables not initialized: layer/hidden_layer/hid_w, layer/hidden_layer/hid_b, layer/softmax_layer/sm_w, layer/softmax_layer/sm_b, global_step, layer/hidden_layer/hid_w/Adagrad, layer/hidden_layer/hid_b/Adagrad, layer/softmax_layer/sm_w/Adagrad, layer/softmax_layer/sm_b/Adagrad, ready: None
End of LogType:stderr.This log file belongs to a running container (container_1525333084924_0016_01_000004) and so may not be complete.
***********************************************************************
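
The repeated "Waiting for model to be ready ... Variables not initialized" lines above are the non-chief worker blocking until the variables hosted on the ps have been created, which only the chief worker (task 0) does. A minimal between-graph sketch in TF 1.x terms of why worker:1 sits in that loop (hypothetical; the real mnist_dist.py builds a larger graph and may use different session helpers):

```python
# Between-graph replication sketch, TF 1.x style (hypothetical, not mnist_dist.py).
# Only the chief (task_index == 0) runs the variable initializers; every other worker
# waits for the variables on the ps, logging "Waiting for model to be ready..." until then.
import tensorflow as tf

cluster = tf.train.ClusterSpec({
    'ps':     ['192.168.136.158:40395'],
    'worker': ['192.168.136.159:34589', '192.168.136.160:45638'],
})
task_index = 1   # worker:1, the node whose log is shown above
server = tf.train.Server(cluster, job_name='worker', task_index=task_index)

with tf.device(tf.train.replica_device_setter(cluster=cluster)):
    hid_w = tf.Variable(tf.zeros([784, 128]), name='hid_w')            # lives on the ps
    global_step = tf.Variable(0, name='global_step', trainable=False)
    train_op = tf.assign_add(global_step, 1)

# A non-chief worker blocks inside this call until the chief has run the initializers;
# if the chief never gets that far (e.g. because the job is failing elsewhere), it waits forever.
with tf.train.MonitoredTrainingSession(master=server.target,
                                       is_chief=(task_index == 0)) as sess:
    sess.run(train_op)
```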

Container: container_1525333084924_0016_01_000004 on hpe03:35000
LogAggregationType: LOCAL
================================================================
LogType:prelaunch.out
LogLastModifiedTime:Mon May 07 09:22:47 +0800 2018
LogLength:70
LogContents:
Setting up env variables
Setting up job resources
Launching container
End of LogType:prelaunch.out.This log file belongs to a running container (container_1525333084924_0016_01_000004) and so may not be complete.
******************************************************************************

End of LogType:prelaunch.err.This log file belongs to a running container (container_1525333084924_0016_01_000001) and so may not be complete.
******************************************************************************

Container: container_1525333084924_0016_01_000001 on hpe01:33415
LogAggregationType: LOCAL
================================================================
LogType:stdout
LogLastModifiedTime:Mon May 07 09:22:38 +0800 2018
LogLength:79227
LogContents:
2018-05-07 09:22:16 INFO  SignalUtils:54 - Registered signal handler for TERM
2018-05-07 09:22:16 INFO  SignalUtils:54 - Registered signal handler for HUP
2018-05-07 09:22:16 INFO  SignalUtils:54 - Registered signal handler for INT
2018-05-07 09:22:17 INFO  SecurityManager:54 - Changing view acls to: hpe
2018-05-07 09:22:17 INFO  SecurityManager:54 - Changing modify acls to: hpe
2018-05-07 09:22:17 INFO  SecurityManager:54 - Changing view acls groups to: 
2018-05-07 09:22:17 INFO  SecurityManager:54 - Changing modify acls groups to: 
2018-05-07 09:22:17 INFO  SecurityManager:54 - SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(hpe); groups with view permissions: Set(); users  with modify permissions: Set(hpe); groups with modify permissions: Set()
2018-05-07 09:22:17 INFO  ApplicationMaster:54 - Preparing Local resources
2018-05-07 09:22:19 INFO  ApplicationMaster:54 - ApplicationAttemptId: appattempt_1525333084924_0016_000001
2018-05-07 09:22:19 INFO  ApplicationMaster:54 - Starting the user application in a separate Thread
2018-05-07 09:22:19 INFO  ApplicationMaster:54 - Waiting for spark context initialization...
2018-05-07 09:22:21 INFO  SparkContext:54 - Running Spark version 2.3.0
2018-05-07 09:22:21 INFO  SparkContext:54 - Submitted application: mnist_spark
2018-05-07 09:22:21 INFO  SecurityManager:54 - Changing view acls to: hpe
2018-05-07 09:22:21 INFO  SecurityManager:54 - Changing modify acls to: hpe
2018-05-07 09:22:21 INFO  SecurityManager:54 - Changing view acls groups to: 
2018-05-07 09:22:21 INFO  SecurityManager:54 - Changing modify acls groups to: 
2018-05-07 09:22:21 INFO  SecurityManager:54 - SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(hpe); groups with view permissions: Set(); users  with modify permissions: Set(hpe); groups with modify permissions: Set()
2018-05-07 09:22:21 INFO  Utils:54 - Successfully started service 'sparkDriver' on port 41862.
2018-05-07 09:22:21 INFO  SparkEnv:54 - Registering MapOutputTracker
2018-05-07 09:22:21 INFO  SparkEnv:54 - Registering BlockManagerMaster
2018-05-07 09:22:21 INFO  BlockManagerMasterEndpoint:54 - Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
2018-05-07 09:22:21 INFO  BlockManagerMasterEndpoint:54 - BlockManagerMasterEndpoint up
2018-05-07 09:22:21 INFO  DiskBlockManager:54 - Created local directory at /tmp/hadoop-hpe/nm-local-dir/usercache/hpe/appcache/application_1525333084924_0016/blockmgr-820f86be-5e2f-4a4d-a753-c18c4ab60867
2018-05-07 09:22:21 INFO  MemoryStore:54 - MemoryStore started with capacity 366.3 MB
2018-05-07 09:22:21 INFO  SparkEnv:54 - Registering OutputCommitCoordinator
2018-05-07 09:22:21 INFO  log:192 - Logging initialized @6393ms
2018-05-07 09:22:21 INFO  JettyUtils:54 - Adding filter: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
2018-05-07 09:22:21 INFO  Server:346 - jetty-9.3.z-SNAPSHOT
2018-05-07 09:22:21 INFO  Server:414 - Started @6524ms
2018-05-07 09:22:21 INFO  AbstractConnector:278 - Started ServerConnector@4ce4051c{HTTP/1.1,[http/1.1]}{0.0.0.0:45576}
2018-05-07 09:22:21 INFO  Utils:54 - Successfully started service 'SparkUI' on port 45576.
2018-05-07 09:22:22 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@37055cf6{/jobs,null,AVAILABLE,@Spark}
2018-05-07 09:22:22 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@149026db{/jobs/json,null,AVAILABLE,@Spark}
2018-05-07 09:22:22 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@4d05ec8a{/jobs/job,null,AVAILABLE,@Spark}
2018-05-07 09:22:22 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@40c1b1aa{/jobs/job/json,null,AVAILABLE,@Spark}
2018-05-07 09:22:22 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@55f2f197{/stages,null,AVAILABLE,@Spark}
2018-05-07 09:22:22 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@3771261f{/stages/json,null,AVAILABLE,@Spark}
2018-05-07 09:22:22 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@eccfa76{/stages/stage,null,AVAILABLE,@Spark}
2018-05-07 09:22:22 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@3f8ee6ac{/stages/stage/json,null,AVAILABLE,@Spark}
2018-05-07 09:22:22 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@56f8b461{/stages/pool,null,AVAILABLE,@Spark}
2018-05-07 09:22:22 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@7fab322c{/stages/pool/json,null,AVAILABLE,@Spark}
2018-05-07 09:22:22 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@751af76b{/storage,null,AVAILABLE,@Spark}
2018-05-07 09:22:22 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@3de9c4e2{/storage/json,null,AVAILABLE,@Spark}
2018-05-07 09:22:22 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@2e212482{/storage/rdd,null,AVAILABLE,@Spark}
2018-05-07 09:22:22 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@8bd5938{/storage/rdd/json,null,AVAILABLE,@Spark}
2018-05-07 09:22:22 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@2abe56eb{/environment,null,AVAILABLE,@Spark}
2018-05-07 09:22:22 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@5903598a{/environment/json,null,AVAILABLE,@Spark}
2018-05-07 09:22:22 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@7ea1731c{/executors,null,AVAILABLE,@Spark}
2018-05-07 09:22:22 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@1c379169{/executors/json,null,AVAILABLE,@Spark}
2018-05-07 09:22:22 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@4466850a{/executors/threadDump,null,AVAILABLE,@Spark}
2018-05-07 09:22:22 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@76a5b05{/executors/threadDump/json,null,AVAILABLE,@Spark}
2018-05-07 09:22:22 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@7226f167{/static,null,AVAILABLE,@Spark}
2018-05-07 09:22:22 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@6f7b00d7{/,null,AVAILABLE,@Spark}
2018-05-07 09:22:22 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@3b9b078e{/api,null,AVAILABLE,@Spark}
2018-05-07 09:22:22 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@5aa0cd22{/jobs/job/kill,null,AVAILABLE,@Spark}
2018-05-07 09:22:22 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@27ed3bbb{/stages/stage/kill,null,AVAILABLE,@Spark}
2018-05-07 09:22:22 INFO  SparkUI:54 - Bound SparkUI to 0.0.0.0, and started at http://hpe01:45576
2018-05-07 09:22:22 INFO  YarnClusterScheduler:54 - Created YarnClusterScheduler
2018-05-07 09:22:22 INFO  SchedulerExtensionServices:54 - Starting Yarn extension services with app application_1525333084924_0016 and attemptId Some(appattempt_1525333084924_0016_000001)
2018-05-07 09:22:22 INFO  Utils:54 - Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 38930.
2018-05-07 09:22:22 INFO  NettyBlockTransferService:54 - Server created on hpe01:38930
2018-05-07 09:22:22 INFO  BlockManager:54 - Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
2018-05-07 09:22:22 INFO  BlockManagerMaster:54 - Registering BlockManager BlockManagerId(driver, hpe01, 38930, None)
2018-05-07 09:22:22 INFO  BlockManagerMasterEndpoint:54 - Registering block manager hpe01:38930 with 366.3 MB RAM, BlockManagerId(driver, hpe01, 38930, None)
2018-05-07 09:22:22 INFO  BlockManagerMaster:54 - Registered BlockManager BlockManagerId(driver, hpe01, 38930, None)
2018-05-07 09:22:22 INFO  BlockManager:54 - Initialized BlockManager: BlockManagerId(driver, hpe01, 38930, None)
2018-05-07 09:22:22 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@7d2fd91a{/metrics/json,null,AVAILABLE,@Spark}
2018-05-07 09:22:22 INFO  ApplicationMaster:54 - 
===============================================================================
YARN executor launch context:
  env:
    CLASSPATH -> {{PWD}}<CPS>{{PWD}}/__spark_conf__<CPS>{{PWD}}/__spark_libs__/*<CPS>$HADOOP_CONF_DIR<CPS>$HADOOP_COMMON_HOME/share/hadoop/common/*<CPS>$HADOOP_COMMON_HOME/share/hadoop/common/lib/*<CPS>$HADOOP_HDFS_HOME/share/hadoop/hdfs/*<CPS>$HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*<CPS>$HADOOP_YARN_HOME/share/hadoop/yarn/*<CPS>$HADOOP_YARN_HOME/share/hadoop/yarn/lib/*<CPS>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*<CPS>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*<CPS>{{PWD}}/__spark_conf__/__hadoop_conf__
    SPARK_YARN_STAGING_DIR -> *********(redacted)
    SPARK_USER -> *********(redacted)
    PYTHONPATH -> {{PWD}}/__pyfiles__<CPS>{{PWD}}/pyspark.zip<CPS>{{PWD}}/py4j-0.10.6-src.zip<CPS>{{PWD}}/tfspark.zip

  command:
    {{JAVA_HOME}}/bin/java \ 
      -server \ 
      -Xmx20480m \ 
      -Djava.io.tmpdir={{PWD}}/tmp \ 
      -Dspark.yarn.app.container.log.dir=<LOG_DIR> \ 
      -XX:OnOutOfMemoryError='kill %p' \ 
      org.apache.spark.executor.CoarseGrainedExecutorBackend \ 
      --driver-url \ 
      spark://CoarseGrainedScheduler@hpe01:41862 \ 
      --executor-id \ 
      <executorId> \ 
      --hostname \ 
      <hostname> \ 
      --cores \ 
      8 \ 
      --app-id \ 
      application_1525333084924_0016 \ 
      --user-class-path \ 
      file:$PWD/__app__.jar \ 
      --user-class-path \ 
      file:$PWD/tensorflow-hadoop-1.6.0.jar \ 
      1><LOG_DIR>/stdout \ 
      2><LOG_DIR>/stderr

  resources:
    __spark_libs__/scala-xml_2.11-1.0.5.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/scala-xml_2.11-1.0.5.jar" } size: 671138 timestamp: 1525655757989 type: FILE visibility: PUBLIC
    __spark_libs__/xbean-asm5-shaded-4.4.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/xbean-asm5-shaded-4.4.jar" } size: 144660 timestamp: 1525655759331 type: FILE visibility: PUBLIC
    __spark_libs__/apache-log4j-extras-1.2.17.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/apache-log4j-extras-1.2.17.jar" } size: 448794 timestamp: 1525655751722 type: FILE visibility: PUBLIC
    __spark_libs__/base64-2.3.8.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/base64-2.3.8.jar" } size: 17008 timestamp: 1525655752098 type: FILE visibility: PUBLIC
    __spark_libs__/scala-library-2.11.8.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/scala-library-2.11.8.jar" } size: 5744974 timestamp: 1525655757849 type: FILE visibility: PUBLIC
    __spark_libs__/javolution-5.5.1.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/javolution-5.5.1.jar" } size: 395195 timestamp: 1525655755215 type: FILE visibility: PUBLIC
    __spark_libs__/hk2-locator-2.4.0-b34.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/hk2-locator-2.4.0-b34.jar" } size: 181271 timestamp: 1525655754502 type: FILE visibility: PUBLIC
    __spark_libs__/javax.ws.rs-api-2.0.1.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/javax.ws.rs-api-2.0.1.jar" } size: 115534 timestamp: 1525655755189 type: FILE visibility: PUBLIC
    __spark_libs__/parquet-encoding-1.8.2.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/parquet-encoding-1.8.2.jar" } size: 286071 timestamp: 1525655757401 type: FILE visibility: PUBLIC
    __spark_libs__/compress-lzf-1.0.3.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/compress-lzf-1.0.3.jar" } size: 79845 timestamp: 1525655753117 type: FILE visibility: PUBLIC
    __spark_libs__/java-xmlbuilder-1.1.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/java-xmlbuilder-1.1.jar" } size: 17316 timestamp: 1525655755142 type: FILE visibility: PUBLIC
    __spark_libs__/parquet-jackson-1.8.2.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/parquet-jackson-1.8.2.jar" } size: 1048116 timestamp: 1525655757529 type: FILE visibility: PUBLIC
    __spark_libs__/guava-14.0.1.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/guava-14.0.1.jar" } size: 2189117 timestamp: 1525655753561 type: FILE visibility: PUBLIC
    __spark_libs__/jackson-module-jaxb-annotations-2.6.7.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/jackson-module-jaxb-annotations-2.6.7.jar" } size: 32612 timestamp: 1525655754899 type: FILE visibility: PUBLIC
    __spark_libs__/javax.annotation-api-1.2.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/javax.annotation-api-1.2.jar" } size: 26366 timestamp: 1525655755075 type: FILE visibility: PUBLIC
    __spark_libs__/json4s-ast_2.11-3.2.11.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/json4s-ast_2.11-3.2.11.jar" } size: 82421 timestamp: 1525655755679 type: FILE visibility: PUBLIC
    __spark_libs__/breeze_2.11-0.13.2.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/breeze_2.11-0.13.2.jar" } size: 15113382 timestamp: 1525655752348 type: FILE visibility: PUBLIC
    __spark_libs__/httpclient-4.5.4.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/httpclient-4.5.4.jar" } size: 781831 timestamp: 1525655754628 type: FILE visibility: PUBLIC
    __spark_libs__/logging-interceptor-3.8.1.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/logging-interceptor-3.8.1.jar" } size: 8268 timestamp: 1525655756076 type: FILE visibility: PUBLIC
    __spark_libs__/stream-2.7.0.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/stream-2.7.0.jar" } size: 174351 timestamp: 1525655759228 type: FILE visibility: PUBLIC
    __spark_libs__/commons-compress-1.4.1.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/commons-compress-1.4.1.jar" } size: 241367 timestamp: 1525655752744 type: FILE visibility: PUBLIC
    __spark_libs__/spark-tags_2.11-2.3.0.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/spark-tags_2.11-2.3.0.jar" } size: 15456 timestamp: 1525655758968 type: FILE visibility: PUBLIC
    __spark_libs__/commons-math3-3.4.1.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/commons-math3-3.4.1.jar" } size: 2035066 timestamp: 1525655753029 type: FILE visibility: PUBLIC
    __spark_libs__/hk2-api-2.4.0-b34.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/hk2-api-2.4.0-b34.jar" } size: 178947 timestamp: 1525655754474 type: FILE visibility: PUBLIC
    __spark_libs__/commons-beanutils-1.7.0.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/commons-beanutils-1.7.0.jar" } size: 188671 timestamp: 1525655752583 type: FILE visibility: PUBLIC
    __spark_libs__/derby-10.12.1.1.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/derby-10.12.1.1.jar" } size: 3224708 timestamp: 1525655753409 type: FILE visibility: PUBLIC
    __spark_libs__/spark-unsafe_2.11-2.3.0.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/spark-unsafe_2.11-2.3.0.jar" } size: 48464 timestamp: 1525655758987 type: FILE visibility: PUBLIC
    tensorflow-hadoop-1.6.0.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/user/hpe/.sparkStaging/application_1525333084924_0016/tensorflow-hadoop-1.6.0.jar" } size: 123665 timestamp: 1525656132874 type: FILE visibility: PRIVATE
    __spark_libs__/jersey-client-2.22.2.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/jersey-client-2.22.2.jar" } size: 167421 timestamp: 1525655755306 type: FILE visibility: PUBLIC
    __spark_libs__/hive-cli-1.2.1.spark2.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/hive-cli-1.2.1.spark2.jar" } size: 40817 timestamp: 1525655754214 type: FILE visibility: PUBLIC
    __spark_libs__/antlr-2.7.7.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/antlr-2.7.7.jar" } size: 445288 timestamp: 1525655751478 type: FILE visibility: PUBLIC
    __spark_libs__/hive-metastore-1.2.1.spark2.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/hive-metastore-1.2.1.spark2.jar" } size: 5505200 timestamp: 1525655754446 type: FILE visibility: PUBLIC
    py4j-0.10.6-src.zip -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/user/hpe/.sparkStaging/application_1525333084924_0016/py4j-0.10.6-src.zip" } size: 80352 timestamp: 1525656132978 type: FILE visibility: PRIVATE
    __spark_libs__/jackson-module-paranamer-2.7.9.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/jackson-module-paranamer-2.7.9.jar" } size: 42858 timestamp: 1525655754922 type: FILE visibility: PUBLIC
    __spark_libs__/macro-compat_2.11-1.1.1.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/macro-compat_2.11-1.1.1.jar" } size: 3142 timestamp: 1525655756139 type: FILE visibility: PUBLIC
    __spark_libs__/okhttp-3.8.1.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/okhttp-3.8.1.jar" } size: 398122 timestamp: 1525655757145 type: FILE visibility: PUBLIC
    __spark_libs__/curator-recipes-2.7.1.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/curator-recipes-2.7.1.jar" } size: 270342 timestamp: 1525655753239 type: FILE visibility: PUBLIC
    __spark_libs__/jackson-mapper-asl-1.9.13.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/jackson-mapper-asl-1.9.13.jar" } size: 780664 timestamp: 1525655754875 type: FILE visibility: PUBLIC
    __spark_libs__/commons-net-2.2.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/commons-net-2.2.jar" } size: 212453 timestamp: 1525655753057 type: FILE visibility: PUBLIC
    __spark_libs__/commons-lang3-3.5.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/commons-lang3-3.5.jar" } size: 479881 timestamp: 1525655752961 type: FILE visibility: PUBLIC
    __spark_libs__/oro-2.0.8.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/oro-2.0.8.jar" } size: 65261 timestamp: 1525655757283 type: FILE visibility: PUBLIC
    __spark_libs__/jackson-core-asl-1.9.13.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/jackson-core-asl-1.9.13.jar" } size: 232248 timestamp: 1525655754765 type: FILE visibility: PUBLIC
    __spark_libs__/jta-1.1.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/jta-1.1.jar" } size: 15071 timestamp: 1525655755785 type: FILE visibility: PUBLIC
    __spark_libs__/jetty-util-6.1.26.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/jetty-util-6.1.26.jar" } size: 177131 timestamp: 1525655755557 type: FILE visibility: PUBLIC
    __spark_libs__/gson-2.2.4.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/gson-2.2.4.jar" } size: 190432 timestamp: 1525655753516 type: FILE visibility: PUBLIC
    __spark_libs__/kryo-shaded-3.0.3.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/kryo-shaded-3.0.3.jar" } size: 358390 timestamp: 1525655755850 type: FILE visibility: PUBLIC
    __spark_libs__/avro-mapred-1.7.7-hadoop2.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/avro-mapred-1.7.7-hadoop2.jar" } size: 180736 timestamp: 1525655752070 type: FILE visibility: PUBLIC
    __spark_libs__/xercesImpl-2.9.1.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/xercesImpl-2.9.1.jar" } size: 1229125 timestamp: 1525655759361 type: FILE visibility: PUBLIC
    __spark_libs__/jpam-1.1.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/jpam-1.1.jar" } size: 12131 timestamp: 1525655755657 type: FILE visibility: PUBLIC
    __spark_libs__/hadoop-mapreduce-client-common-2.7.3.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/hadoop-mapreduce-client-common-2.7.3.jar" } size: 776634 timestamp: 1525655753914 type: FILE visibility: PUBLIC
    __spark_libs__/chill_2.11-0.8.4.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/chill_2.11-0.8.4.jar" } size: 224167 timestamp: 1525655752524 type: FILE visibility: PUBLIC
    __spark_libs__/parquet-hadoop-bundle-1.6.0.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/parquet-hadoop-bundle-1.6.0.jar" } size: 2796935 timestamp: 1525655757497 type: FILE visibility: PUBLIC
    __spark_libs__/hadoop-mapreduce-client-jobclient-2.7.3.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/hadoop-mapreduce-client-jobclient-2.7.3.jar" } size: 62304 timestamp: 1525655753988 type: FILE visibility: PUBLIC
    __spark_libs__/scala-reflect-2.11.8.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/scala-reflect-2.11.8.jar" } size: 4573750 timestamp: 1525655757963 type: FILE visibility: PUBLIC
    __spark_libs__/curator-framework-2.7.1.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/curator-framework-2.7.1.jar" } size: 186273 timestamp: 1525655753210 type: FILE visibility: PUBLIC
    __spark_libs__/commons-beanutils-core-1.8.0.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/commons-beanutils-core-1.8.0.jar" } size: 206035 timestamp: 1525655752610 type: FILE visibility: PUBLIC
    __spark_libs__/commons-logging-1.1.3.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/commons-logging-1.1.3.jar" } size: 62050 timestamp: 1525655752986 type: FILE visibility: PUBLIC
    __spark_libs__/json4s-core_2.11-3.2.11.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/json4s-core_2.11-3.2.11.jar" } size: 589462 timestamp: 1525655755703 type: FILE visibility: PUBLIC
    __spark_libs__/validation-api-1.1.0.Final.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/validation-api-1.1.0.Final.jar" } size: 63777 timestamp: 1525655759313 type: FILE visibility: PUBLIC
    __spark_libs__/breeze-macros_2.11-0.13.2.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/breeze-macros_2.11-0.13.2.jar" } size: 187882 timestamp: 1525655752378 type: FILE visibility: PUBLIC
    __spark_libs__/minlog-1.3.0.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/minlog-1.3.0.jar" } size: 5711 timestamp: 1525655756322 type: FILE visibility: PUBLIC
    __spark_libs__/metrics-json-3.1.5.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/metrics-json-3.1.5.jar" } size: 15824 timestamp: 1525655756281 type: FILE visibility: PUBLIC
    __spark_libs__/jets3t-0.9.4.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/jets3t-0.9.4.jar" } size: 2046361 timestamp: 1525655755509 type: FILE visibility: PUBLIC
    __spark_libs__/antlr4-runtime-4.7.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/antlr4-runtime-4.7.jar" } size: 334662 timestamp: 1525655751514 type: FILE visibility: PUBLIC
    __spark_libs__/commons-cli-1.2.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/commons-cli-1.2.jar" } size: 41123 timestamp: 1525655752636 type: FILE visibility: PUBLIC
    __spark_libs__/hadoop-yarn-common-2.7.3.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/hadoop-yarn-common-2.7.3.jar" } size: 1678642 timestamp: 1525655754117 type: FILE visibility: PUBLIC
    __spark_libs__/bonecp-0.8.0.RELEASE.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/bonecp-0.8.0.RELEASE.jar" } size: 110600 timestamp: 1525655752190 type: FILE visibility: PUBLIC
    __spark_libs__/jline-2.12.1.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/jline-2.12.1.jar" } size: 213911 timestamp: 1525655755587 type: FILE visibility: PUBLIC
    __spark_libs__/scala-compiler-2.11.8.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/scala-compiler-2.11.8.jar" } size: 15487351 timestamp: 1525655757775 type: FILE visibility: PUBLIC
    __spark_libs__/javax.servlet-api-3.1.0.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/javax.servlet-api-3.1.0.jar" } size: 95806 timestamp: 1525655755165 type: FILE visibility: PUBLIC
    __spark_libs__/parquet-common-1.8.2.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/parquet-common-1.8.2.jar" } size: 42032 timestamp: 1525655757380 type: FILE visibility: PUBLIC
    __spark_libs__/avro-ipc-1.7.7.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/avro-ipc-1.7.7.jar" } size: 192993 timestamp: 1525655752041 type: FILE visibility: PUBLIC
    __spark_libs__/hk2-utils-2.4.0-b34.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/hk2-utils-2.4.0-b34.jar" } size: 118973 timestamp: 1525655754526 type: FILE visibility: PUBLIC
    __spark_libs__/chill-java-0.8.4.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/chill-java-0.8.4.jar" } size: 60483 timestamp: 1525655752553 type: FILE visibility: PUBLIC
    __spark_libs__/eigenbase-properties-1.1.5.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/eigenbase-properties-1.1.5.jar" } size: 18482 timestamp: 1525655753435 type: FILE visibility: PUBLIC
    __spark_libs__/jaxb-api-2.2.2.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/jaxb-api-2.2.2.jar" } size: 105134 timestamp: 1525655755240 type: FILE visibility: PUBLIC
    __spark_libs__/scala-parser-combinators_2.11-1.0.4.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/scala-parser-combinators_2.11-1.0.4.jar" } size: 423753 timestamp: 1525655757903 type: FILE visibility: PUBLIC
    __spark_libs__/RoaringBitmap-0.5.11.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/RoaringBitmap-0.5.11.jar" } size: 201928 timestamp: 1525655757620 type: FILE visibility: PUBLIC
    __spark_libs__/spark-sql_2.11-2.3.0.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/spark-sql_2.11-2.3.0.jar" } size: 8665004 timestamp: 1525655758898 type: FILE visibility: PUBLIC
    __spark_libs__/jackson-jaxrs-1.9.13.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/jackson-jaxrs-1.9.13.jar" } size: 18336 timestamp: 1525655754845 type: FILE visibility: PUBLIC
    __spark_libs__/jersey-container-servlet-2.22.2.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/jersey-container-servlet-2.22.2.jar" } size: 18098 timestamp: 1525655755366 type: FILE visibility: PUBLIC
    __spark_libs__/py4j-0.10.6.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/py4j-0.10.6.jar" } size: 116859 timestamp: 1525655757577 type: FILE visibility: PUBLIC
    __spark_libs__/core-1.1.2.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/core-1.1.2.jar" } size: 164422 timestamp: 1525655753149 type: FILE visibility: PUBLIC
    __spark_libs__/libthrift-0.9.3.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/libthrift-0.9.3.jar" } size: 234201 timestamp: 1525655756033 type: FILE visibility: PUBLIC
    __spark_libs__/jackson-dataformat-yaml-2.6.7.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/jackson-dataformat-yaml-2.6.7.jar" } size: 320444 timestamp: 1525655754823 type: FILE visibility: PUBLIC
    __spark_libs__/spark-hive-thriftserver_2.11-2.3.0.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/spark-hive-thriftserver_2.11-2.3.0.jar" } size: 1811675 timestamp: 1525655758489 type: FILE visibility: PUBLIC
    __spark_libs__/hadoop-hdfs-2.7.3.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/hadoop-hdfs-2.7.3.jar" } size: 8316190 timestamp: 1525655753853 type: FILE visibility: PUBLIC
    __spark_libs__/htrace-core-3.1.0-incubating.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/htrace-core-3.1.0-incubating.jar" } size: 1475955 timestamp: 1525655754597 type: FILE visibility: PUBLIC
    __spark_libs__/orc-core-1.4.1-nohive.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/orc-core-1.4.1-nohive.jar" } size: 1441994 timestamp: 1525655757229 type: FILE visibility: PUBLIC
    __spark_libs__/guice-3.0.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/guice-3.0.jar" } size: 710492 timestamp: 1525655753595 type: FILE visibility: PUBLIC
    __spark_libs__/commons-pool-1.5.4.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/commons-pool-1.5.4.jar" } size: 96221 timestamp: 1525655753087 type: FILE visibility: PUBLIC
    __spark_libs__/aircompressor-0.8.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/aircompressor-0.8.jar" } size: 130802 timestamp: 1525655751428 type: FILE visibility: PUBLIC
    __spark_libs__/guice-servlet-3.0.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/guice-servlet-3.0.jar" } size: 65012 timestamp: 1525655753621 type: FILE visibility: PUBLIC
    __spark_libs__/hppc-0.7.2.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/hppc-0.7.2.jar" } size: 1671083 timestamp: 1525655754564 type: FILE visibility: PUBLIC
    __spark_libs__/spark-mllib_2.11-2.3.0.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/spark-mllib_2.11-2.3.0.jar" } size: 7738543 timestamp: 1525655758677 type: FILE visibility: PUBLIC
    __spark_conf__ -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/user/hpe/.sparkStaging/application_1525333084924_0016/__spark_conf__.zip" } size: 224134 timestamp: 1525656133231 type: ARCHIVE visibility: PRIVATE
    __spark_libs__/spark-graphx_2.11-2.3.0.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/spark-graphx_2.11-2.3.0.jar" } size: 708678 timestamp: 1525655758420 type: FILE visibility: PUBLIC
    __spark_libs__/jackson-xc-1.9.13.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/jackson-xc-1.9.13.jar" } size: 27084 timestamp: 1525655754972 type: FILE visibility: PUBLIC
    __spark_libs__/JavaEWAH-0.3.2.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/JavaEWAH-0.3.2.jar" } size: 16993 timestamp: 1525655755023 type: FILE visibility: PUBLIC
    __spark_libs__/activation-1.1.1.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/activation-1.1.1.jar" } size: 69409 timestamp: 1525655751380 type: FILE visibility: PUBLIC
    __spark_libs__/objenesis-2.1.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/objenesis-2.1.jar" } size: 41755 timestamp: 1525655757118 type: FILE visibility: PUBLIC
    __spark_libs__/netty-all-4.1.17.Final.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/netty-all-4.1.17.Final.jar" } size: 3780056 timestamp: 1525655757097 type: FILE visibility: PUBLIC
    __spark_libs__/commons-crypto-1.0.0.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/commons-crypto-1.0.0.jar" } size: 134595 timestamp: 1525655752799 type: FILE visibility: PUBLIC
    __spark_libs__/hadoop-annotations-2.7.3.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/hadoop-annotations-2.7.3.jar" } size: 40863 timestamp: 1525655753645 type: FILE visibility: PUBLIC
    __spark_libs__/kubernetes-model-2.0.0.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/kubernetes-model-2.0.0.jar" } size: 7015233 timestamp: 1525655755953 type: FILE visibility: PUBLIC
    __spark_libs__/hadoop-yarn-server-common-2.7.3.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/hadoop-yarn-server-common-2.7.3.jar" } size: 388235 timestamp: 1525655754145 type: FILE visibility: PUBLIC
    __spark_libs__/joda-time-2.9.3.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/joda-time-2.9.3.jar" } size: 627814 timestamp: 1525655755614 type: FILE visibility: PUBLIC
    __spark_libs__/jodd-core-3.5.2.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/jodd-core-3.5.2.jar" } size: 427780 timestamp: 1525655755639 type: FILE visibility: PUBLIC
    __spark_libs__/api-util-1.0.0-M20.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/api-util-1.0.0-M20.jar" } size: 79912 timestamp: 1525655751786 type: FILE visibility: PUBLIC
    __spark_libs__/spark-streaming_2.11-2.3.0.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/spark-streaming_2.11-2.3.0.jar" } size: 2169838 timestamp: 1525655758940 type: FILE visibility: PUBLIC
    __spark_libs__/aopalliance-repackaged-2.4.0-b34.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/aopalliance-repackaged-2.4.0-b34.jar" } size: 14766 timestamp: 1525655751617 type: FILE visibility: PUBLIC
    __spark_libs__/json4s-jackson_2.11-3.2.11.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/json4s-jackson_2.11-3.2.11.jar" } size: 40341 timestamp: 1525655755724 type: FILE visibility: PUBLIC
    __spark_libs__/apacheds-kerberos-codec-2.0.0-M15.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/apacheds-kerberos-codec-2.0.0-M15.jar" } size: 691479 timestamp: 1525655751688 type: FILE visibility: PUBLIC
    __spark_libs__/parquet-format-2.3.1.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/parquet-format-2.3.1.jar" } size: 390733 timestamp: 1525655757427 type: FILE visibility: PUBLIC
    __spark_libs__/jdo-api-3.0.1.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/jdo-api-3.0.1.jar" } size: 201124 timestamp: 1525655755282 type: FILE visibility: PUBLIC
    __spark_libs__/jersey-media-jaxb-2.22.2.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/jersey-media-jaxb-2.22.2.jar" } size: 72733 timestamp: 1525655755439 type: FILE visibility: PUBLIC
    __spark_libs__/hadoop-yarn-server-web-proxy-2.7.3.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/hadoop-yarn-server-web-proxy-2.7.3.jar" } size: 58407 timestamp: 1525655754169 type: FILE visibility: PUBLIC
    __spark_libs__/spire_2.11-0.13.0.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/spire_2.11-0.13.0.jar" } size: 10121868 timestamp: 1525655759128 type: FILE visibility: PUBLIC
    __spark_libs__/snappy-java-1.1.2.6.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/snappy-java-1.1.2.6.jar" } size: 1056168 timestamp: 1525655758153 type: FILE visibility: PUBLIC
    __spark_libs__/machinist_2.11-0.6.1.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/machinist_2.11-0.6.1.jar" } size: 34603 timestamp: 1525655756119 type: FILE visibility: PUBLIC
    __spark_libs__/datanucleus-core-3.2.10.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/datanucleus-core-3.2.10.jar" } size: 1890075 timestamp: 1525655753315 type: FILE visibility: PUBLIC
    __spark_libs__/httpcore-4.4.8.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/httpcore-4.4.8.jar" } size: 324565 timestamp: 1525655754654 type: FILE visibility: PUBLIC
    __spark_libs__/scalap-2.11.8.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/scalap-2.11.8.jar" } size: 802818 timestamp: 1525655757880 type: FILE visibility: PUBLIC
    __spark_libs__/zstd-jni-1.3.2-2.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/zstd-jni-1.3.2-2.jar" } size: 2333186 timestamp: 1525655759491 type: FILE visibility: PUBLIC
    __spark_libs__/spark-kubernetes_2.11-2.3.0.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/spark-kubernetes_2.11-2.3.0.jar" } size: 381117 timestamp: 1525655758515 type: FILE visibility: PUBLIC
    __spark_libs__/bcprov-jdk15on-1.58.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/bcprov-jdk15on-1.58.jar" } size: 3955990 timestamp: 1525655752161 type: FILE visibility: PUBLIC
    __spark_libs__/calcite-avatica-1.2.0-incubating.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/calcite-avatica-1.2.0-incubating.jar" } size: 258370 timestamp: 1525655752408 type: FILE visibility: PUBLIC
    __spark_libs__/javax.inject-2.4.0-b34.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/javax.inject-2.4.0-b34.jar" } size: 5950 timestamp: 1525655755119 type: FILE visibility: PUBLIC
    __spark_libs__/spire-macros_2.11-0.13.0.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/spire-macros_2.11-0.13.0.jar" } size: 87192 timestamp: 1525655759153 type: FILE visibility: PUBLIC
    __spark_libs__/datanucleus-api-jdo-3.2.6.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/datanucleus-api-jdo-3.2.6.jar" } size: 339666 timestamp: 1525655753272 type: FILE visibility: PUBLIC
    __spark_libs__/commons-compiler-3.0.8.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/commons-compiler-3.0.8.jar" } size: 37887 timestamp: 1525655752717 type: FILE visibility: PUBLIC
    __spark_libs__/spark-repl_2.11-2.3.0.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/spark-repl_2.11-2.3.0.jar" } size: 120336 timestamp: 1525655758785 type: FILE visibility: PUBLIC
    __spark_libs__/commons-digester-1.8.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/commons-digester-1.8.jar" } size: 143602 timestamp: 1525655752847 type: FILE visibility: PUBLIC
    __spark_libs__/spark-sketch_2.11-2.3.0.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/spark-sketch_2.11-2.3.0.jar" } size: 30057 timestamp: 1525655758805 type: FILE visibility: PUBLIC
    tfspark.zip -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/user/hpe/.sparkStaging/application_1525333084924_0016/tfspark.zip" } size: 33111 timestamp: 1525656133005 type: FILE visibility: PRIVATE
    __spark_libs__/log4j-1.2.17.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/log4j-1.2.17.jar" } size: 489884 timestamp: 1525655756057 type: FILE visibility: PUBLIC
    __spark_libs__/curator-client-2.7.1.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/curator-client-2.7.1.jar" } size: 69500 timestamp: 1525655753181 type: FILE visibility: PUBLIC
    __spark_libs__/arpack_combined_all-0.1.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/arpack_combined_all-0.1.jar" } size: 1194003 timestamp: 1525655751826 type: FILE visibility: PUBLIC
    __spark_libs__/commons-collections-3.2.2.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/commons-collections-3.2.2.jar" } size: 588337 timestamp: 1525655752691 type: FILE visibility: PUBLIC
    __spark_libs__/netty-3.9.9.Final.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/netty-3.9.9.Final.jar" } size: 1330219 timestamp: 1525655756359 type: FILE visibility: PUBLIC
    __spark_libs__/commons-configuration-1.6.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/commons-configuration-1.6.jar" } size: 298829 timestamp: 1525655752773 type: FILE visibility: PUBLIC
    __spark_libs__/commons-codec-1.10.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/commons-codec-1.10.jar" } size: 284184 timestamp: 1525655752661 type: FILE visibility: PUBLIC
    __spark_libs__/datanucleus-rdbms-3.2.9.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/datanucleus-rdbms-3.2.9.jar" } size: 1809447 timestamp: 1525655753355 type: FILE visibility: PUBLIC
    __spark_libs__/protobuf-java-2.5.0.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/protobuf-java-2.5.0.jar" } size: 533455 timestamp: 1525655757554 type: FILE visibility: PUBLIC
    __spark_libs__/hive-beeline-1.2.1.spark2.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/hive-beeline-1.2.1.spark2.jar" } size: 138464 timestamp: 1525655754193 type: FILE visibility: PUBLIC
    __spark_libs__/metrics-jvm-3.1.5.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/metrics-jvm-3.1.5.jar" } size: 39283 timestamp: 1525655756302 type: FILE visibility: PUBLIC
    __spark_libs__/slf4j-log4j12-1.7.16.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/slf4j-log4j12-1.7.16.jar" } size: 9939 timestamp: 1525655758080 type: FILE visibility: PUBLIC
    __spark_libs__/jersey-guava-2.22.2.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/jersey-guava-2.22.2.jar" } size: 971310 timestamp: 1525655755415 type: FILE visibility: PUBLIC
    __spark_libs__/hadoop-client-2.7.3.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/hadoop-client-2.7.3.jar" } size: 26012 timestamp: 1525655753696 type: FILE visibility: PUBLIC
    __spark_libs__/api-asn1-api-1.0.0-M20.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/api-asn1-api-1.0.0-M20.jar" } size: 16560 timestamp: 1525655751754 type: FILE visibility: PUBLIC
    __spark_libs__/javassist-3.18.1-GA.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/javassist-3.18.1-GA.jar" } size: 714194 timestamp: 1525655755051 type: FILE visibility: PUBLIC
    __spark_libs__/lz4-java-1.4.0.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/lz4-java-1.4.0.jar" } size: 370119 timestamp: 1525655756100 type: FILE visibility: PUBLIC
    __spark_libs__/calcite-core-1.2.0-incubating.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/calcite-core-1.2.0-incubating.jar" } size: 3519262 timestamp: 1525655752466 type: FILE visibility: PUBLIC
    pyspark.zip -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/user/hpe/.sparkStaging/application_1525333084924_0016/pyspark.zip" } size: 538841 timestamp: 1525656132952 type: FILE visibility: PRIVATE
    __spark_libs__/jcl-over-slf4j-1.7.16.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/jcl-over-slf4j-1.7.16.jar" } size: 16430 timestamp: 1525655755261 type: FILE visibility: PUBLIC
    __spark_libs__/arrow-vector-0.8.0.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/arrow-vector-0.8.0.jar" } size: 1270377 timestamp: 1525655751947 type: FILE visibility: PUBLIC
    __spark_libs__/jersey-container-servlet-core-2.22.2.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/jersey-container-servlet-core-2.22.2.jar" } size: 66270 timestamp: 1525655755387 type: FILE visibility: PUBLIC
    __spark_libs__/commons-dbcp-1.4.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/commons-dbcp-1.4.jar" } size: 160519 timestamp: 1525655752821 type: FILE visibility: PUBLIC
    __spark_libs__/generex-1.0.1.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/generex-1.0.1.jar" } size: 13911 timestamp: 1525655753488 type: FILE visibility: PUBLIC
    __spark_libs__/snappy-0.2.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/snappy-0.2.jar" } size: 48720 timestamp: 1525655758123 type: FILE visibility: PUBLIC
    __spark_libs__/apacheds-i18n-2.0.0-M15.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/apacheds-i18n-2.0.0-M15.jar" } size: 44925 timestamp: 1525655751651 type: FILE visibility: PUBLIC
    __spark_libs__/metrics-graphite-3.1.5.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/metrics-graphite-3.1.5.jar" } size: 21247 timestamp: 1525655756263 type: FILE visibility: PUBLIC
    __spark_libs__/zjsonpatch-0.3.0.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/zjsonpatch-0.3.0.jar" } size: 35518 timestamp: 1525655759419 type: FILE visibility: PUBLIC
    __spark_libs__/jsr305-1.3.9.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/jsr305-1.3.9.jar" } size: 33015 timestamp: 1525655755763 type: FILE visibility: PUBLIC
    __spark_libs__/janino-3.0.8.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/janino-3.0.8.jar" } size: 796326 timestamp: 1525655755001 type: FILE visibility: PUBLIC
    __spark_libs__/jetty-6.1.26.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/jetty-6.1.26.jar" } size: 539912 timestamp: 1525655755535 type: FILE visibility: PUBLIC
    __spark_libs__/snakeyaml-1.15.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/snakeyaml-1.15.jar" } size: 269295 timestamp: 1525655758101 type: FILE visibility: PUBLIC
    __spark_libs__/jersey-common-2.22.2.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/jersey-common-2.22.2.jar" } size: 698375 timestamp: 1525655755338 type: FILE visibility: PUBLIC
    __spark_libs__/spark-launcher_2.11-2.3.0.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/spark-launcher_2.11-2.3.0.jar" } size: 75835 timestamp: 1525655758561 type: FILE visibility: PUBLIC
    __spark_libs__/zookeeper-3.4.6.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/zookeeper-3.4.6.jar" } size: 792964 timestamp: 1525655759444 type: FILE visibility: PUBLIC
    __spark_libs__/opencsv-2.3.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/opencsv-2.3.jar" } size: 19827 timestamp: 1525655757191 type: FILE visibility: PUBLIC
    __spark_libs__/flatbuffers-1.2.0-3f79e055.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/flatbuffers-1.2.0-3f79e055.jar" } size: 10166 timestamp: 1525655753461 type: FILE visibility: PUBLIC
    __spark_libs__/hadoop-mapreduce-client-shuffle-2.7.3.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/hadoop-mapreduce-client-shuffle-2.7.3.jar" } size: 71737 timestamp: 1525655754011 type: FILE visibility: PUBLIC
    __spark_libs__/jsp-api-2.1.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/jsp-api-2.1.jar" } size: 100636 timestamp: 1525655755744 type: FILE visibility: PUBLIC
    __spark_libs__/xz-1.0.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/xz-1.0.jar" } size: 94672 timestamp: 1525655759399 type: FILE visibility: PUBLIC
    __spark_libs__/avro-1.7.7.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/avro-1.7.7.jar" } size: 436303 timestamp: 1525655752010 type: FILE visibility: PUBLIC
    __pyfiles__/mnist_dist.py -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/user/hpe/.sparkStaging/application_1525333084924_0016/mnist_dist.py" } size: 5883 timestamp: 1525656133027 type: FILE visibility: PRIVATE
    __spark_libs__/spark-yarn_2.11-2.3.0.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/spark-yarn_2.11-2.3.0.jar" } size: 650250 timestamp: 1525655759017 type: FILE visibility: PUBLIC
    __spark_libs__/paranamer-2.8.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/paranamer-2.8.jar" } size: 34654 timestamp: 1525655757330 type: FILE visibility: PUBLIC
    __spark_libs__/spark-core_2.11-2.3.0.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/spark-core_2.11-2.3.0.jar" } size: 12982728 timestamp: 1525655758390 type: FILE visibility: PUBLIC
    __spark_libs__/ivy-2.4.0.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/ivy-2.4.0.jar" } size: 1282424 timestamp: 1525655754691 type: FILE visibility: PUBLIC
    __spark_libs__/osgi-resource-locator-1.0.1.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/osgi-resource-locator-1.0.1.jar" } size: 20235 timestamp: 1525655757307 type: FILE visibility: PUBLIC
    __spark_libs__/mesos-1.4.0-shaded-protobuf.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/mesos-1.4.0-shaded-protobuf.jar" } size: 7343426 timestamp: 1525655756221 type: FILE visibility: PUBLIC
    __spark_libs__/antlr-runtime-3.4.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/antlr-runtime-3.4.jar" } size: 164368 timestamp: 1525655751553 type: FILE visibility: PUBLIC
    __spark_libs__/hive-exec-1.2.1.spark2.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/hive-exec-1.2.1.spark2.jar" } size: 11498852 timestamp: 1525655754341 type: FILE visibility: PUBLIC
    __spark_libs__/ST4-4.0.4.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/ST4-4.0.4.jar" } size: 236660 timestamp: 1525655759173 type: FILE visibility: PUBLIC
    __spark_libs__/calcite-linq4j-1.2.0-incubating.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/calcite-linq4j-1.2.0-incubating.jar" } size: 442406 timestamp: 1525655752497 type: FILE visibility: PUBLIC
    __spark_libs__/spark-hive_2.11-2.3.0.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/spark-hive_2.11-2.3.0.jar" } size: 1304415 timestamp: 1525655758452 type: FILE visibility: PUBLIC
    __spark_libs__/spark-kvstore_2.11-2.3.0.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/spark-kvstore_2.11-2.3.0.jar" } size: 53007 timestamp: 1525655758538 type: FILE visibility: PUBLIC
    __spark_libs__/jackson-core-2.6.7.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/jackson-core-2.6.7.jar" } size: 258919 timestamp: 1525655754740 type: FILE visibility: PUBLIC
    __spark_libs__/jersey-server-2.22.2.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/jersey-server-2.22.2.jar" } size: 951701 timestamp: 1525655755469 type: FILE visibility: PUBLIC
    __spark_libs__/parquet-hadoop-1.8.2.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/parquet-hadoop-1.8.2.jar" } size: 250377 timestamp: 1525655757450 type: FILE visibility: PUBLIC
    __spark_libs__/stax-api-1.0-2.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/stax-api-1.0-2.jar" } size: 23346 timestamp: 1525655759209 type: FILE visibility: PUBLIC
    __spark_libs__/super-csv-2.2.0.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/super-csv-2.2.0.jar" } size: 93210 timestamp: 1525655759270 type: FILE visibility: PUBLIC
    __spark_libs__/automaton-1.11-8.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/automaton-1.11-8.jar" } size: 176285 timestamp: 1525655751977 type: FILE visibility: PUBLIC
    __spark_libs__/libfb303-0.9.3.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/libfb303-0.9.3.jar" } size: 313702 timestamp: 1525655756008 type: FILE visibility: PUBLIC
    __spark_libs__/slf4j-api-1.7.16.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/slf4j-api-1.7.16.jar" } size: 40509 timestamp: 1525655758060 type: FILE visibility: PUBLIC
    __spark_libs__/hadoop-yarn-api-2.7.3.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/hadoop-yarn-api-2.7.3.jar" } size: 2039143 timestamp: 1525655754053 type: FILE visibility: PUBLIC
    __spark_libs__/javax.inject-1.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/javax.inject-1.jar" } size: 2497 timestamp: 1525655755097 type: FILE visibility: PUBLIC
    __spark_libs__/kubernetes-client-3.0.0.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/kubernetes-client-3.0.0.jar" } size: 416731 timestamp: 1525655755872 type: FILE visibility: PUBLIC
    __spark_libs__/spark-mesos_2.11-2.3.0.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/spark-mesos_2.11-2.3.0.jar" } size: 663338 timestamp: 1525655758587 type: FILE visibility: PUBLIC
    __spark_libs__/hadoop-mapreduce-client-app-2.7.3.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/hadoop-mapreduce-client-app-2.7.3.jar" } size: 542869 timestamp: 1525655753883 type: FILE visibility: PUBLIC
    __spark_libs__/xmlenc-0.52.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/xmlenc-0.52.jar" } size: 15010 timestamp: 1525655759378 type: FILE visibility: PUBLIC
    __spark_libs__/leveldbjni-all-1.8.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/leveldbjni-all-1.8.jar" } size: 1045744 timestamp: 1525655755985 type: FILE visibility: PUBLIC
    __spark_libs__/hadoop-mapreduce-client-core-2.7.3.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/hadoop-mapreduce-client-core-2.7.3.jar" } size: 1556539 timestamp: 1525655753958 type: FILE visibility: PUBLIC
    __spark_libs__/stax-api-1.0.1.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/stax-api-1.0.1.jar" } size: 26514 timestamp: 1525655759189 type: FILE visibility: PUBLIC
    __spark_libs__/univocity-parsers-2.5.9.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/univocity-parsers-2.5.9.jar" } size: 384150 timestamp: 1525655759292 type: FILE visibility: PUBLIC
    __spark_libs__/arrow-format-0.8.0.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/arrow-format-0.8.0.jar" } size: 52076 timestamp: 1525655751863 type: FILE visibility: PUBLIC
    __spark_libs__/commons-httpclient-3.1.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/commons-httpclient-3.1.jar" } size: 305001 timestamp: 1525655752875 type: FILE visibility: PUBLIC
    __spark_libs__/pyrolite-4.13.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/pyrolite-4.13.jar" } size: 94796 timestamp: 1525655757598 type: FILE visibility: PUBLIC
    __spark_libs__/jul-to-slf4j-1.7.16.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/jul-to-slf4j-1.7.16.jar" } size: 4596 timestamp: 1525655755830 type: FILE visibility: PUBLIC
    __spark_libs__/hive-jdbc-1.2.1.spark2.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/hive-jdbc-1.2.1.spark2.jar" } size: 100680 timestamp: 1525655754369 type: FILE visibility: PUBLIC
    __spark_libs__/hadoop-auth-2.7.3.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/hadoop-auth-2.7.3.jar" } size: 94150 timestamp: 1525655753670 type: FILE visibility: PUBLIC
    __spark_libs__/metrics-core-3.1.5.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/metrics-core-3.1.5.jar" } size: 120465 timestamp: 1525655756244 type: FILE visibility: PUBLIC
    __spark_libs__/hadoop-yarn-client-2.7.3.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/hadoop-yarn-client-2.7.3.jar" } size: 165867 timestamp: 1525655754079 type: FILE visibility: PUBLIC
    __spark_libs__/okio-1.13.0.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/okio-1.13.0.jar" } size: 81811 timestamp: 1525655757169 type: FILE visibility: PUBLIC
    __spark_libs__/parquet-column-1.8.2.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/parquet-column-1.8.2.jar" } size: 956984 timestamp: 1525655757360 type: FILE visibility: PUBLIC
    __spark_libs__/stringtemplate-3.2.1.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/stringtemplate-3.2.1.jar" } size: 148627 timestamp: 1525655759247 type: FILE visibility: PUBLIC
    __spark_libs__/commons-io-2.4.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/commons-io-2.4.jar" } size: 185140 timestamp: 1525655752904 type: FILE visibility: PUBLIC
    __spark_libs__/commons-lang-2.6.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/commons-lang-2.6.jar" } size: 284220 timestamp: 1525655752933 type: FILE visibility: PUBLIC
    __spark_libs__/jackson-annotations-2.6.7.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/jackson-annotations-2.6.7.jar" } size: 46986 timestamp: 1525655754715 type: FILE visibility: PUBLIC
    __spark_libs__/spark-catalyst_2.11-2.3.0.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/spark-catalyst_2.11-2.3.0.jar" } size: 8984438 timestamp: 1525655758253 type: FILE visibility: PUBLIC
    __spark_libs__/orc-mapreduce-1.4.1-nohive.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/orc-mapreduce-1.4.1-nohive.jar" } size: 757423 timestamp: 1525655757259 type: FILE visibility: PUBLIC
    __spark_libs__/aopalliance-1.0.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/aopalliance-1.0.jar" } size: 4467 timestamp: 1525655751584 type: FILE visibility: PUBLIC
    __spark_libs__/jackson-module-scala_2.11-2.6.7.1.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/jackson-module-scala_2.11-2.6.7.1.jar" } size: 515645 timestamp: 1525655754949 type: FILE visibility: PUBLIC
    __spark_libs__/jackson-databind-2.6.7.1.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/jackson-databind-2.6.7.1.jar" } size: 1165323 timestamp: 1525655754798 type: FILE visibility: PUBLIC
    __spark_libs__/spark-network-shuffle_2.11-2.3.0.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/spark-network-shuffle_2.11-2.3.0.jar" } size: 64847 timestamp: 1525655758763 type: FILE visibility: PUBLIC
    __spark_libs__/spark-network-common_2.11-2.3.0.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/spark-network-common_2.11-2.3.0.jar" } size: 2382165 timestamp: 1525655758743 type: FILE visibility: PUBLIC
    __spark_libs__/arrow-memory-0.8.0.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/arrow-memory-0.8.0.jar" } size: 79150 timestamp: 1525655751897 type: FILE visibility: PUBLIC
    __spark_libs__/jtransforms-2.4.0.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/jtransforms-2.4.0.jar" } size: 764569 timestamp: 1525655755811 type: FILE visibility: PUBLIC
    __spark_libs__/hadoop-common-2.7.3.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/hadoop-common-2.7.3.jar" } size: 3479293 timestamp: 1525655753753 type: FILE visibility: PUBLIC
    __spark_libs__/shapeless_2.11-2.3.2.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/shapeless_2.11-2.3.2.jar" } size: 3522616 timestamp: 1525655758038 type: FILE visibility: PUBLIC
    __spark_libs__/spark-mllib-local_2.11-2.3.0.jar -> resource { scheme: "hdfs" host: "hpe01" port: 9000 file: "/spark/jars/spark-mllib-local_2.11-2.3.0.jar" } size: 183832 timestamp: 1525655758701 type: FILE visibility: PUBLIC

===============================================================================
2018-05-07 09:22:22 INFO  RMProxy:98 - Connecting to ResourceManager at hpe01/192.168.136.158:8030
2018-05-07 09:22:22 INFO  YarnRMClient:54 - Registering the ApplicationMaster
2018-05-07 09:22:23 INFO  YarnAllocator:54 - Will request 3 executor container(s), each with 8 core(s) and 22528 MB memory (including 2048 MB of overhead)
2018-05-07 09:22:23 INFO  YarnSchedulerBackend$YarnSchedulerEndpoint:54 - ApplicationMaster registered as NettyRpcEndpointRef(spark://YarnAM@hpe01:41862)
2018-05-07 09:22:23 INFO  YarnAllocator:54 - Submitted 3 unlocalized container requests.
2018-05-07 09:22:23 INFO  ApplicationMaster:54 - Started progress reporter thread with (heartbeat : 3000, initial allocation : 200) intervals
2018-05-07 09:22:23 INFO  AMRMClientImpl:360 - Received new token for : hpe02:36742
2018-05-07 09:22:23 INFO  YarnAllocator:54 - Launching container container_1525333084924_0016_01_000002 on host hpe02 for executor with ID 1
2018-05-07 09:22:23 INFO  YarnAllocator:54 - Received 1 containers from YARN, launching executors on 1 of them.
2018-05-07 09:22:23 INFO  ContainerManagementProtocolProxy:81 - yarn.client.max-cached-nodemanagers-proxies : 0
2018-05-07 09:22:23 INFO  ContainerManagementProtocolProxy:260 - Opening proxy : hpe02:36742
2018-05-07 09:22:24 INFO  AMRMClientImpl:360 - Received new token for : hpe01:33415
2018-05-07 09:22:24 INFO  AMRMClientImpl:360 - Received new token for : hpe03:35000
2018-05-07 09:22:24 INFO  YarnAllocator:54 - Launching container container_1525333084924_0016_01_000003 on host hpe01 for executor with ID 2
2018-05-07 09:22:24 INFO  YarnAllocator:54 - Launching container container_1525333084924_0016_01_000004 on host hpe03 for executor with ID 3
2018-05-07 09:22:24 INFO  YarnAllocator:54 - Received 2 containers from YARN, launching executors on 2 of them.
2018-05-07 09:22:24 INFO  ContainerManagementProtocolProxy:81 - yarn.client.max-cached-nodemanagers-proxies : 0
2018-05-07 09:22:24 INFO  ContainerManagementProtocolProxy:81 - yarn.client.max-cached-nodemanagers-proxies : 0
2018-05-07 09:22:24 INFO  ContainerManagementProtocolProxy:260 - Opening proxy : hpe01:33415
2018-05-07 09:22:24 INFO  ContainerManagementProtocolProxy:260 - Opening proxy : hpe03:35000
2018-05-07 09:22:28 INFO  YarnSchedulerBackend$YarnDriverEndpoint:54 - Registered executor NettyRpcEndpointRef(spark-client://Executor) (192.168.136.159:34684) with ID 1
2018-05-07 09:22:28 INFO  YarnSchedulerBackend$YarnDriverEndpoint:54 - Registered executor NettyRpcEndpointRef(spark-client://Executor) (192.168.136.158:59736) with ID 2
2018-05-07 09:22:28 INFO  BlockManagerMasterEndpoint:54 - Registering block manager hpe02:40311 with 10.5 GB RAM, BlockManagerId(1, hpe02, 40311, None)
2018-05-07 09:22:28 INFO  BlockManagerMasterEndpoint:54 - Registering block manager hpe01:33489 with 10.5 GB RAM, BlockManagerId(2, hpe01, 33489, None)
2018-05-07 09:22:29 INFO  YarnSchedulerBackend$YarnDriverEndpoint:54 - Registered executor NettyRpcEndpointRef(spark-client://Executor) (192.168.136.160:38020) with ID 3
2018-05-07 09:22:29 INFO  YarnClusterSchedulerBackend:54 - SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.8
2018-05-07 09:22:29 INFO  YarnClusterScheduler:54 - YarnClusterScheduler.postStartHook done
2018-05-07 09:22:29 INFO  BlockManagerMasterEndpoint:54 - Registering block manager hpe03:38332 with 10.5 GB RAM, BlockManagerId(3, hpe03, 38332, None)
args: Namespace(batch_size=100, cluster_size=3, epochs=1, format='csv', images='hdfs:///data/mnist/csv/train/images', labels='hdfs:///data/mnist/csv/train/labels', mode='train', model='mnist_model', output='predictions', rdma=False, readers=1, steps=1000, tensorboard=False)
2018-05-07T09:22:29.415272 ===== Start
2018-05-07 09:22:29 INFO  MemoryStore:54 - Block broadcast_0 stored as values in memory (estimated size 241.0 KB, free 366.1 MB)
2018-05-07 09:22:29 INFO  MemoryStore:54 - Block broadcast_0_piece0 stored as bytes in memory (estimated size 23.2 KB, free 366.0 MB)
2018-05-07 09:22:29 INFO  BlockManagerInfo:54 - Added broadcast_0_piece0 in memory on hpe01:38930 (size: 23.2 KB, free: 366.3 MB)
2018-05-07 09:22:29 INFO  SparkContext:54 - Created broadcast 0 from textFile at NativeMethodAccessorImpl.java:0
2018-05-07 09:22:30 INFO  MemoryStore:54 - Block broadcast_1 stored as values in memory (estimated size 241.1 KB, free 365.8 MB)
2018-05-07 09:22:30 INFO  MemoryStore:54 - Block broadcast_1_piece0 stored as bytes in memory (estimated size 23.2 KB, free 365.8 MB)
2018-05-07 09:22:30 INFO  BlockManagerInfo:54 - Added broadcast_1_piece0 in memory on hpe01:38930 (size: 23.2 KB, free: 366.3 MB)
2018-05-07 09:22:30 INFO  SparkContext:54 - Created broadcast 1 from textFile at NativeMethodAccessorImpl.java:0
zipping images and labels
2018-05-07 09:22:30 INFO  FileInputFormat:249 - Total input paths to process : 10
2018-05-07 09:22:30 INFO  FileInputFormat:249 - Total input paths to process : 10
2018-05-07 09:22:30,240 INFO (MainThread-27191) Reserving TFSparkNodes 
2018-05-07 09:22:30,240 INFO (MainThread-27191) cluster_template: {'ps': range(0, 1), 'worker': range(1, 3)}
2018-05-07 09:22:30,242 INFO (MainThread-27191) listening for reservations at ('192.168.136.158', 42715)
2018-05-07 09:22:30,243 INFO (MainThread-27191) Starting TensorFlow on executors
2018-05-07 09:22:30,254 INFO (MainThread-27191) Waiting for TFSparkNodes to start
2018-05-07 09:22:30,254 INFO (MainThread-27191) waiting for 3 reservations
2018-05-07 09:22:30 INFO  SparkContext:54 - Starting job: foreachPartition at /tmp/hadoop-hpe/nm-local-dir/usercache/hpe/appcache/application_1525333084924_0016/container_1525333084924_0016_01_000001/tfspark.zip/tensorflowonspark/TFCluster.py:293
2018-05-07 09:22:30 INFO  DAGScheduler:54 - Got job 0 (foreachPartition at /tmp/hadoop-hpe/nm-local-dir/usercache/hpe/appcache/application_1525333084924_0016/container_1525333084924_0016_01_000001/tfspark.zip/tensorflowonspark/TFCluster.py:293) with 3 output partitions
2018-05-07 09:22:30 INFO  DAGScheduler:54 - Final stage: ResultStage 0 (foreachPartition at /tmp/hadoop-hpe/nm-local-dir/usercache/hpe/appcache/application_1525333084924_0016/container_1525333084924_0016_01_000001/tfspark.zip/tensorflowonspark/TFCluster.py:293)
2018-05-07 09:22:30 INFO  DAGScheduler:54 - Parents of final stage: List()
2018-05-07 09:22:30 INFO  DAGScheduler:54 - Missing parents: List()
2018-05-07 09:22:30 INFO  DAGScheduler:54 - Submitting ResultStage 0 (PythonRDD[8] at foreachPartition at /tmp/hadoop-hpe/nm-local-dir/usercache/hpe/appcache/application_1525333084924_0016/container_1525333084924_0016_01_000001/tfspark.zip/tensorflowonspark/TFCluster.py:293), which has no missing parents
2018-05-07 09:22:30 INFO  MemoryStore:54 - Block broadcast_2 stored as values in memory (estimated size 14.3 KB, free 365.8 MB)
2018-05-07 09:22:30 INFO  MemoryStore:54 - Block broadcast_2_piece0 stored as bytes in memory (estimated size 10.1 KB, free 365.8 MB)
2018-05-07 09:22:30 INFO  BlockManagerInfo:54 - Added broadcast_2_piece0 in memory on hpe01:38930 (size: 10.1 KB, free: 366.2 MB)
2018-05-07 09:22:30 INFO  SparkContext:54 - Created broadcast 2 from broadcast at DAGScheduler.scala:1039
2018-05-07 09:22:30 INFO  DAGScheduler:54 - Submitting 3 missing tasks from ResultStage 0 (PythonRDD[8] at foreachPartition at /tmp/hadoop-hpe/nm-local-dir/usercache/hpe/appcache/application_1525333084924_0016/container_1525333084924_0016_01_000001/tfspark.zip/tensorflowonspark/TFCluster.py:293) (first 15 tasks are for partitions Vector(0, 1, 2))
2018-05-07 09:22:30 INFO  YarnClusterScheduler:54 - Adding task set 0.0 with 3 tasks
2018-05-07 09:22:30 INFO  TaskSetManager:54 - Starting task 0.0 in stage 0.0 (TID 0, hpe01, executor 2, partition 0, PROCESS_LOCAL, 7828 bytes)
2018-05-07 09:22:30 INFO  TaskSetManager:54 - Starting task 1.0 in stage 0.0 (TID 1, hpe02, executor 1, partition 1, PROCESS_LOCAL, 7828 bytes)
2018-05-07 09:22:30 INFO  TaskSetManager:54 - Starting task 2.0 in stage 0.0 (TID 2, hpe03, executor 3, partition 2, PROCESS_LOCAL, 7828 bytes)
2018-05-07 09:22:30 INFO  BlockManagerInfo:54 - Added broadcast_2_piece0 in memory on hpe02:40311 (size: 10.1 KB, free: 10.5 GB)
2018-05-07 09:22:30 INFO  BlockManagerInfo:54 - Added broadcast_2_piece0 in memory on hpe03:38332 (size: 10.1 KB, free: 10.5 GB)
2018-05-07 09:22:30 INFO  BlockManagerInfo:54 - Added broadcast_2_piece0 in memory on hpe01:33489 (size: 10.1 KB, free: 10.5 GB)
2018-05-07 09:22:31,256 INFO (MainThread-27191) waiting for 3 reservations
2018-05-07 09:22:32,400 INFO (MainThread-27191) waiting for 3 reservations
2018-05-07 09:22:33,404 INFO (MainThread-27191) waiting for 3 reservations
2018-05-07 09:22:34,677 INFO (MainThread-27191) waiting for 3 reservations
2018-05-07 09:22:36,075 INFO (MainThread-27191) all reservations completed
2018-05-07 09:22:36,076 INFO (MainThread-27191) All TFSparkNodes started
2018-05-07 09:22:36,076 INFO (MainThread-27191) {'executor_id': 0, 'host': '192.168.136.158', 'job_name': 'ps', 'task_index': 0, 'port': 40395, 'tb_pid': 0, 'tb_port': 0, 'addr': ('192.168.136.158', 42995), 'authkey': b'z\xf6x\xc9\xfe"LM\x87_i3_\xebh\xef'}
2018-05-07 09:22:36,076 INFO (MainThread-27191) {'executor_id': 2, 'host': '192.168.136.160', 'job_name': 'worker', 'task_index': 1, 'port': 45638, 'tb_pid': 0, 'tb_port': 0, 'addr': '/tmp/pymp-tm696uno/listener-kmgfbw71', 'authkey': b'\x04\xc69\xdd&aG\x0e\x86\xaa\xfb\x16z\\\xb9\xc7'}
2018-05-07 09:22:36,076 INFO (MainThread-27191) {'executor_id': 1, 'host': '192.168.136.159', 'job_name': 'worker', 'task_index': 0, 'port': 34589, 'tb_pid': 0, 'tb_port': 0, 'addr': '/tmp/pymp-_nhkb93k/listener-jcb_p0ze', 'authkey': b'\x98JB\x83M\x91G\x8b\xb54Or\xa4\x85=X'}
2018-05-07 09:22:36,077 INFO (MainThread-27191) Feeding training data
2018-05-07 09:22:36 INFO  SparkContext:54 - Starting job: collect at PythonRDD.scala:153
2018-05-07 09:22:36 INFO  DAGScheduler:54 - Got job 1 (collect at PythonRDD.scala:153) with 10 output partitions
2018-05-07 09:22:36 INFO  DAGScheduler:54 - Final stage: ResultStage 1 (collect at PythonRDD.scala:153)
2018-05-07 09:22:36 INFO  DAGScheduler:54 - Parents of final stage: List()
2018-05-07 09:22:36 INFO  DAGScheduler:54 - Missing parents: List()
2018-05-07 09:22:36 INFO  DAGScheduler:54 - Submitting ResultStage 1 (PythonRDD[10] at RDD at PythonRDD.scala:48), which has no missing parents
2018-05-07 09:22:36 INFO  MemoryStore:54 - Block broadcast_3 stored as values in memory (estimated size 14.9 KB, free 365.7 MB)
2018-05-07 09:22:36 INFO  MemoryStore:54 - Block broadcast_3_piece0 stored as bytes in memory (estimated size 8.0 KB, free 365.7 MB)
2018-05-07 09:22:36 INFO  BlockManagerInfo:54 - Added broadcast_3_piece0 in memory on hpe01:38930 (size: 8.0 KB, free: 366.2 MB)
2018-05-07 09:22:36 INFO  SparkContext:54 - Created broadcast 3 from broadcast at DAGScheduler.scala:1039
2018-05-07 09:22:36 INFO  DAGScheduler:54 - Submitting 10 missing tasks from ResultStage 1 (PythonRDD[10] at RDD at PythonRDD.scala:48) (first 15 tasks are for partitions Vector(0, 1, 2, 3, 4, 5, 6, 7, 8, 9))
2018-05-07 09:22:36 INFO  YarnClusterScheduler:54 - Adding task set 1.0 with 10 tasks
2018-05-07 09:22:36 INFO  TaskSetManager:54 - Starting task 0.0 in stage 1.0 (TID 3, hpe02, executor 1, partition 0, NODE_LOCAL, 8475 bytes)
2018-05-07 09:22:36 INFO  TaskSetManager:54 - Starting task 1.0 in stage 1.0 (TID 4, hpe03, executor 3, partition 1, NODE_LOCAL, 8475 bytes)
2018-05-07 09:22:36 INFO  TaskSetManager:54 - Starting task 3.0 in stage 1.0 (TID 5, hpe01, executor 2, partition 3, NODE_LOCAL, 8475 bytes)
2018-05-07 09:22:36 INFO  TaskSetManager:54 - Starting task 2.0 in stage 1.0 (TID 6, hpe02, executor 1, partition 2, NODE_LOCAL, 8475 bytes)
2018-05-07 09:22:36 INFO  TaskSetManager:54 - Starting task 4.0 in stage 1.0 (TID 7, hpe03, executor 3, partition 4, NODE_LOCAL, 8475 bytes)
2018-05-07 09:22:36 INFO  TaskSetManager:54 - Starting task 6.0 in stage 1.0 (TID 8, hpe01, executor 2, partition 6, NODE_LOCAL, 8475 bytes)
2018-05-07 09:22:36 INFO  TaskSetManager:54 - Starting task 5.0 in stage 1.0 (TID 9, hpe02, executor 1, partition 5, NODE_LOCAL, 8475 bytes)
2018-05-07 09:22:36 INFO  TaskSetManager:54 - Starting task 7.0 in stage 1.0 (TID 10, hpe03, executor 3, partition 7, NODE_LOCAL, 8475 bytes)
2018-05-07 09:22:36 INFO  TaskSetManager:54 - Starting task 8.0 in stage 1.0 (TID 11, hpe02, executor 1, partition 8, NODE_LOCAL, 8475 bytes)
2018-05-07 09:22:36 INFO  TaskSetManager:54 - Starting task 9.0 in stage 1.0 (TID 12, hpe03, executor 3, partition 9, NODE_LOCAL, 8475 bytes)
2018-05-07 09:22:36 INFO  BlockManagerInfo:54 - Added broadcast_3_piece0 in memory on hpe02:40311 (size: 8.0 KB, free: 10.5 GB)
2018-05-07 09:22:36 INFO  BlockManagerInfo:54 - Added broadcast_3_piece0 in memory on hpe01:33489 (size: 8.0 KB, free: 10.5 GB)
2018-05-07 09:22:36 INFO  BlockManagerInfo:54 - Added broadcast_0_piece0 in memory on hpe01:33489 (size: 23.2 KB, free: 10.5 GB)
2018-05-07 09:22:36 INFO  BlockManagerInfo:54 - Added broadcast_0_piece0 in memory on hpe02:40311 (size: 23.2 KB, free: 10.5 GB)
2018-05-07 09:22:36 INFO  BlockManagerInfo:54 - Added broadcast_3_piece0 in memory on hpe03:38332 (size: 8.0 KB, free: 10.5 GB)
2018-05-07 09:22:36 INFO  TaskSetManager:54 - Finished task 1.0 in stage 0.0 (TID 1) in 5938 ms on hpe02 (executor 1) (1/3)
2018-05-07 09:22:36 INFO  BlockManagerInfo:54 - Added broadcast_0_piece0 in memory on hpe03:38332 (size: 23.2 KB, free: 10.5 GB)
2018-05-07 09:22:37 INFO  TaskSetManager:54 - Finished task 2.0 in stage 0.0 (TID 2) in 6710 ms on hpe03 (executor 3) (2/3)
2018-05-07 09:22:37 INFO  BlockManagerInfo:54 - Added broadcast_1_piece0 in memory on hpe01:33489 (size: 23.2 KB, free: 10.5 GB)
2018-05-07 09:22:37 INFO  BlockManagerInfo:54 - Added broadcast_1_piece0 in memory on hpe02:40311 (size: 23.2 KB, free: 10.5 GB)
2018-05-07 09:22:37 INFO  BlockManagerInfo:54 - Added broadcast_1_piece0 in memory on hpe03:38332 (size: 23.2 KB, free: 10.5 GB)
2018-05-07 09:22:38 WARN  TaskSetManager:66 - Lost task 6.0 in stage 1.0 (TID 8, hpe01, executor 2): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/tmp/hadoop-hpe/nm-local-dir/usercache/hpe/appcache/application_1525333084924_0016/container_1525333084924_0016_01_000003/pyspark.zip/pyspark/worker.py", line 229, in main
    process()
  File "/tmp/hadoop-hpe/nm-local-dir/usercache/hpe/appcache/application_1525333084924_0016/container_1525333084924_0016_01_000003/pyspark.zip/pyspark/worker.py", line 224, in process
    serializer.dump_stream(func(split_index, iterator), outfile)
  File "/tmp/hadoop-hpe/nm-local-dir/usercache/hpe/appcache/application_1525333084924_0016/container_1525333084924_0016_01_000001/pyspark.zip/pyspark/rdd.py", line 2438, in pipeline_func
  File "/tmp/hadoop-hpe/nm-local-dir/usercache/hpe/appcache/application_1525333084924_0016/container_1525333084924_0016_01_000001/pyspark.zip/pyspark/rdd.py", line 2438, in pipeline_func
  File "/tmp/hadoop-hpe/nm-local-dir/usercache/hpe/appcache/application_1525333084924_0016/container_1525333084924_0016_01_000001/pyspark.zip/pyspark/rdd.py", line 2438, in pipeline_func
  File "/tmp/hadoop-hpe/nm-local-dir/usercache/hpe/appcache/application_1525333084924_0016/container_1525333084924_0016_01_000001/pyspark.zip/pyspark/rdd.py", line 362, in func
  File "/tmp/hadoop-hpe/nm-local-dir/usercache/hpe/appcache/application_1525333084924_0016/container_1525333084924_0016_01_000001/pyspark.zip/pyspark/rdd.py", line 809, in func
  File "/tmp/hadoop-hpe/nm-local-dir/usercache/hpe/appcache/application_1525333084924_0016/container_1525333084924_0016_01_000001/tfspark.zip/tensorflowonspark/TFSparkNode.py", line 394, in _train
AttributeError: 'AutoProxy[get_queue]' object has no attribute 'put'

    at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.handlePythonException(PythonRunner.scala:298)
    at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRunner.scala:438)
    at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRunner.scala:421)
    at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:252)
    at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
    at scala.collection.Iterator$class.foreach(Iterator.scala:893)
    at org.apache.spark.InterruptibleIterator.foreach(InterruptibleIterator.scala:28)
    at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
    at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
    at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
    at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310)
    at org.apache.spark.InterruptibleIterator.to(InterruptibleIterator.scala:28)
    at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302)
    at org.apache.spark.InterruptibleIterator.toBuffer(InterruptibleIterator.scala:28)
    at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289)
    at org.apache.spark.InterruptibleIterator.toArray(InterruptibleIterator.scala:28)
    at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$12.apply(RDD.scala:939)
    at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$12.apply(RDD.scala:939)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2067)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2067)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
    at org.apache.spark.scheduler.Task.run(Task.scala:109)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

2018-05-07 09:22:38 INFO  TaskSetManager:54 - Lost task 3.0 in stage 1.0 (TID 5) on hpe01, executor 2: org.apache.spark.api.python.PythonException (Traceback (most recent call last):
  File "/tmp/hadoop-hpe/nm-local-dir/usercache/hpe/appcache/application_1525333084924_0016/container_1525333084924_0016_01_000003/pyspark.zip/pyspark/worker.py", line 229, in main
    process()
  File "/tmp/hadoop-hpe/nm-local-dir/usercache/hpe/appcache/application_1525333084924_0016/container_1525333084924_0016_01_000003/pyspark.zip/pyspark/worker.py", line 224, in process
    serializer.dump_stream(func(split_index, iterator), outfile)
  File "/tmp/hadoop-hpe/nm-local-dir/usercache/hpe/appcache/application_1525333084924_0016/container_1525333084924_0016_01_000001/pyspark.zip/pyspark/rdd.py", line 2438, in pipeline_func
  File "/tmp/hadoop-hpe/nm-local-dir/usercache/hpe/appcache/application_1525333084924_0016/container_1525333084924_0016_01_000001/pyspark.zip/pyspark/rdd.py", line 2438, in pipeline_func
  File "/tmp/hadoop-hpe/nm-local-dir/usercache/hpe/appcache/application_1525333084924_0016/container_1525333084924_0016_01_000001/pyspark.zip/pyspark/rdd.py", line 2438, in pipeline_func
  File "/tmp/hadoop-hpe/nm-local-dir/usercache/hpe/appcache/application_1525333084924_0016/container_1525333084924_0016_01_000001/pyspark.zip/pyspark/rdd.py", line 362, in func
  File "/tmp/hadoop-hpe/nm-local-dir/usercache/hpe/appcache/application_1525333084924_0016/container_1525333084924_0016_01_000001/pyspark.zip/pyspark/rdd.py", line 809, in func
  File "/tmp/hadoop-hpe/nm-local-dir/usercache/hpe/appcache/application_1525333084924_0016/container_1525333084924_0016_01_000001/tfspark.zip/tensorflowonspark/TFSparkNode.py", line 394, in _train
AttributeError: 'AutoProxy[get_queue]' object has no attribute 'put'
) [duplicate 1]
2018-05-07 09:22:38 INFO  TaskSetManager:54 - Starting task 3.1 in stage 1.0 (TID 13, hpe03, executor 3, partition 3, NODE_LOCAL, 8475 bytes)
2018-05-07 09:22:38 INFO  TaskSetManager:54 - Starting task 6.1 in stage 1.0 (TID 14, hpe02, executor 1, partition 6, NODE_LOCAL, 8475 bytes)
End of LogType:stdout.This log file belongs to a running container (container_1525333084924_0016_01_000001) and so may not be complete.
***********************************************************************

Container: container_1525333084924_0016_01_000001 on hpe01:33415
LogAggregationType: LOCAL
================================================================
LogType:stderr
LogLastModifiedTime:Monday May 07 09:22:16 +0800 2018
LogLength:493
LogContents:
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/tmp/hadoop-hpe/nm-local-dir/filecache/87/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hpe/hadoop-2.9.0/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
End of LogType:stderr.This log file belongs to a running container (container_1525333084924_0016_01_000001) and so may not be complete.
***********************************************************************

Container: container_1525333084924_0016_01_000001 on hpe01:33415
LogAggregationType: LOCAL
================================================================
LogType:prelaunch.out
LogLastModifiedTime:Monday May 07 09:22:15 +0800 2018
LogLength:70
LogContents:
Setting up env variables
Setting up job resources
Launching container
End of LogType:prelaunch.out.This log file belongs to a running container (container_1525333084924_0016_01_000001) and so may not be complete.
******************************************************************************
markfengyunzhou commented 6 years ago

@leewyang I changed Hadoop from v2.9 to v2.7.6 and Spark from v2.3 to v2.2, with Python still at v3.6 as before, and the same problem occurs.

leewyang commented 6 years ago

@markfengyunzhou Missed this earlier... Please use --executor-cores 1. When you use 8 cores x 3 executors, you're telling Spark that you have 24 "slots" available to do work. In your case, you really just want 3 (one for each of the TF nodes). See the sketch below.
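As a rough illustration only (not taken from this thread), the resource-related flags for the 3-node case would look roughly like the following; the script path, --py-files path, HDFS locations, and memory setting are placeholders to fill in for your own cluster, and --cluster_size follows the MNIST example's convention of matching --num-executors:

spark-submit \
--master yarn \
--deploy-mode cluster \
--num-executors 3 \
--executor-cores 1 \
--executor-memory 24G \
--conf spark.dynamicAllocation.enabled=false \
--conf spark.yarn.maxAppAttempts=1 \
--py-files /path/to/mnist_dist.py \
/path/to/mnist_spark.py \
--cluster_size 3 \
--images <hdfs_path>/mnist/csv/train/images \
--labels <hdfs_path>/mnist/csv/train/labels \
--mode train \
--model mnist_model

With one core per executor, Spark schedules a single task per executor, so each of the 3 executors hosts exactly one TF node (PS or worker) rather than competing with extra data-feeding tasks on the same executor.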

markfengyunzhou commented 6 years ago

@leewyang I see, thanks a lot.

vamsinimmalaML commented 5 years ago

Can you tell me this: I have 10 num-executors, I assumed 4 executor cores, and gave cluster_size 10. What should my arguments be? I am getting the above exception.

leewyang commented 5 years ago

Please use --executor-cores 1, per this FAQ; a sketch for the 10-executor case follows.
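Applied to the setup described above, and again only as an illustrative sketch with placeholder paths (the remaining flags would stay as in your existing submit command):

spark-submit \
--master yarn \
--deploy-mode cluster \
--num-executors 10 \
--executor-cores 1 \
/path/to/mnist_spark.py \
--cluster_size 10 \
--mode train \
--model mnist_model

This keeps cluster_size equal to num-executors (10 TF nodes) while giving each executor a single task slot, so no executor is asked to run more than one TF node at a time.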