intel-analytics / ipex-llm

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Mixtral, Gemma, Phi, MiniCPM, Qwen-VL, MiniCPM-V, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, vLLM, GraphRAG, DeepSpeed, Axolotl, etc
Apache License 2.0

ERROR executor.Executor: Exception in task 2.0 in stage 3.0 (TID 6) java.lang.NullPointerException #2619

Open wzxwf opened 6 years ago

wzxwf commented 6 years ago

VGG Model on CIFAR-10

Why did the following error occur?

18/08/22 16:22:59 INFO executor.Executor: Finished task 2.0 in stage 1.0 (TID 5). 2024 bytes result sent to driver
18/08/22 16:23:02 INFO executor.CoarseGrainedExecutorBackend: Got assigned task 6
18/08/22 16:23:02 INFO executor.Executor: Running task 2.0 in stage 3.0 (TID 6)
18/08/22 16:23:02 INFO broadcast.TorrentBroadcast: Started reading broadcast variable 12
18/08/22 16:23:02 INFO memory.MemoryStore: Block broadcast_12_piece0 stored as bytes in memory (estimated size 3.9 KB, free 5.1 GB)
18/08/22 16:23:02 INFO broadcast.TorrentBroadcast: Reading broadcast variable 12 took 11 ms
18/08/22 16:23:02 INFO memory.MemoryStore: Block broadcast_12 stored as values in memory (estimated size 7.1 KB, free 5.1 GB)
18/08/22 16:23:03 INFO storage.BlockManager: Found block rdd_5_2 locally
18/08/22 16:23:03 INFO broadcast.TorrentBroadcast: Started reading broadcast variable 8
18/08/22 16:23:03 INFO client.TransportClientFactory: Successfully created connection to bdpnode4.lakala.com/192.168.12.103:28809 after 2 ms (0 ms spent in bootstraps)
18/08/22 16:23:03 INFO memory.MemoryStore: Block broadcast_8_piece0 stored as bytes in memory (estimated size 634.0 B, free 5.1 GB)
18/08/22 16:23:03 INFO broadcast.TorrentBroadcast: Reading broadcast variable 8 took 168 ms
18/08/22 16:23:03 INFO memory.MemoryStore: Block broadcast_8 stored as values in memory (estimated size 2.1 KB, free 5.1 GB)
18/08/22 16:23:03 INFO utils.ThreadPool$: Set mkl threads to 1 on thread 152
18/08/22 16:23:03 INFO broadcast.TorrentBroadcast: Started reading broadcast variable 10
18/08/22 16:23:03 INFO memory.MemoryStore: Block broadcast_10_piece0 stored as bytes in memory (estimated size 219.0 B, free 5.1 GB)
18/08/22 16:23:03 INFO broadcast.TorrentBroadcast: Reading broadcast variable 10 took 11 ms
18/08/22 16:23:03 INFO memory.MemoryStore: Block broadcast_10 stored as values in memory (estimated size 248.0 B, free 5.1 GB)
18/08/22 16:23:03 WARN storage.BlockManager: Putting block rdd_20_2 failed due to an exception
18/08/22 16:23:03 WARN storage.BlockManager: Block rdd_20_2 could not be removed as it was not found on disk or in memory
18/08/22 16:23:03 ERROR executor.Executor: Exception in task 2.0 in stage 3.0 (TID 6)
java.lang.NullPointerException
    at com.intel.analytics.bigdl.models.utils.ModelBroadcast.value(ModelBroadcast.scala:120)
    at com.intel.analytics.bigdl.optim.DistriOptimizer$$anonfun$13$$anonfun$14.apply(DistriOptimizer.scala:643)
    at com.intel.analytics.bigdl.optim.DistriOptimizer$$anonfun$13$$anonfun$14.apply(DistriOptimizer.scala:642)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    at scala.collection.immutable.Range.foreach(Range.scala:160)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
    at scala.collection.AbstractTraversable.map(Traversable.scala:104)
    at com.intel.analytics.bigdl.optim.DistriOptimizer$$anonfun$13.apply(DistriOptimizer.scala:642)
    at com.intel.analytics.bigdl.optim.DistriOptimizer$$anonfun$13.apply(DistriOptimizer.scala:625)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:797)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:797)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
    at org.apache.spark.rdd.RDD$$anonfun$8.apply(RDD.scala:336)
    at org.apache.spark.rdd.RDD$$anonfun$8.apply(RDD.scala:334)
    at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1038)
    at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1029)
    at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:969)
    at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1029)
    at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:760)
    at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:334)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:285)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
    at org.apache.spark.scheduler.Task.run(Task.scala:108)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
18/08/22 16:23:03 INFO executor.CoarseGrainedExecutorBackend: Got assigned task 10
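
For anyone triaging an executor log like the one above, here is a minimal Python sketch that isolates the first ERROR entry and its stack frames, whether or not the line breaks survived the copy-paste. The helper names (`split_entries`, `first_error`) and the `sample` string are hypothetical and only for illustration; the regular expressions assume the `yy/MM/dd HH:mm:ss LEVEL` prefix seen in this log.

```python
import re

# Assumed entry prefix, as seen in the executor log above: "18/08/22 16:23:03 ERROR ..."
ENTRY_START = re.compile(r"(?=\b\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2} (?:INFO|WARN|ERROR) )")
# A stack frame starts with "at " followed by a qualified method name and "(".
FRAME_START = re.compile(r"\s+at (?=[\w$.]+\()")


def split_entries(log_text):
    """Split log text into one string per log entry."""
    return [entry.strip() for entry in ENTRY_START.split(log_text) if entry.strip()]


def first_error(log_text):
    """Return (message, frames) for the first ERROR entry, or None if there is none."""
    for entry in split_entries(log_text):
        if " ERROR " in entry:
            message, *frames = FRAME_START.split(entry)
            return message, frames
    return None


# Shortened excerpt of the log above, joined into a single line.
sample = (
    "18/08/22 16:23:03 WARN storage.BlockManager: Block rdd_20_2 could not be removed "
    "as it was not found on disk or in memory "
    "18/08/22 16:23:03 ERROR executor.Executor: Exception in task 2.0 in stage 3.0 (TID 6) "
    "java.lang.NullPointerException "
    "at com.intel.analytics.bigdl.models.utils.ModelBroadcast.value(ModelBroadcast.scala:120) "
    "at com.intel.analytics.bigdl.optim.DistriOptimizer$$anonfun$13$$anonfun$14.apply(DistriOptimizer.scala:643) "
    "18/08/22 16:23:03 INFO executor.CoarseGrainedExecutorBackend: Got assigned task 10"
)

message, frames = first_error(sample)
print(message)
print(frames[0])  # the innermost frame, where the exception was thrown
```

The innermost frame is the useful one here: it points at `ModelBroadcast.value`, which is where the NullPointerException is thrown.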

yiheng commented 6 years ago

Looking into it.

yiheng commented 6 years ago

We cannot reproduce the error. Could you share more details about your command, configuration, and environment?

wzxwf commented 6 years ago

system: CentOS 6.8, 32 cores, 256G, hadoop-CDH-5.11.1, SPARK2-2.2.0.cloudera1

command:

spark2-submit --master yarn --deploy-mode client \
  --executor-cores 6 \
  --num-executors 4 \
  --driver-memory 6G \
  --executor-memory 10G \
  --conf spark.dynamicAllocation.enabled=false \
  --driver-class-path /opt/bigdl/lib/bigdl-SPARK_2.2-0.6.0-jar-with-dependencies.jar \
  --class com.intel.analytics.bigdl.models.vgg.Train \
  /opt/bigdl/lib/bigdl-SPARK_2.2-0.6.0-jar-with-dependencies.jar \
  -f /opt/data/cifar-10-batches-bin \
  -b 24 \
  --summary /opt/log \
  --checkpoint /opt/model
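
One thing that can be ruled out from the command alone is a batch size mismatch. The sketch below assumes, as I understand BigDL's guidance, that the total batch size (`-b`) should be a multiple of the total core count (executors × cores per executor). The function names are hypothetical, and this is a sanity check rather than a diagnosis of the NullPointerException.

```python
def total_cores(num_executors, executor_cores):
    """Total cores used by the job: --num-executors times --executor-cores."""
    return num_executors * executor_cores


def batch_size_is_aligned(batch_size, num_executors, executor_cores):
    """True if the batch size divides evenly across all cores (assumed BigDL expectation)."""
    return batch_size % total_cores(num_executors, executor_cores) == 0


# Values taken from the spark2-submit command above: -b 24, 4 executors, 6 cores each.
print(total_cores(4, 6))
print(batch_size_is_aligned(24, 4, 6))
```

With 4 executors of 6 cores each, the job uses 24 cores, so `-b 24` satisfies that expectation and is unlikely to be the cause.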

wzxwf commented 6 years ago

Hello, have you been able to find the cause?

Thanks

yiheng commented 6 years ago

We still cannot reproduce the error.

wzxwf commented 6 years ago

2018-09-05 12:52:57 INFO Hive:233 - Registering function lcx_explode_funnels cn.leancloud.stats.hive.udf.ExplodeFunnels
2018-09-05 12:52:57 INFO Hive:233 - Registering function lcx_explode_funnels_plain cn.leancloud.stats.hive.udf.ExplodeFunnelsPlain
2018-09-05 12:52:57 INFO HiveCredentialProvider:54 - Get Token from hive metastore: Kind: HIVE_DELEGATION_TOKEN, Service: , Ident: 00 11 68 61 64 6f 6f 70 40 4c 41 4b 41 4c 41 2e 43 4f 4d 04 68 69 76 65 00 8a 01 65 a8 13 88 b2 8a 01 65 cc 20 0c b2 1e 0e
2018-09-05 12:52:57 INFO metastore:547 - Closed a connection to metastore, current connections: 0
2018-09-05 12:52:57 INFO Client:54 - Uploading resource file:/tmp/spark-fedd8d75-5a1c-4761-a90c-bab88509f608/spark_conf3645790103808452330.zip -> hdfs://ns2/user/hadoop/.sparkStaging/application_1534319660860_6126/spark_conf.zip
2018-09-05 12:52:58 INFO SecurityManager:54 - Changing view acls to: root,hadoop
2018-09-05 12:52:58 INFO SecurityManager:54 - Changing modify acls to: root,hadoop
2018-09-05 12:52:58 INFO SecurityManager:54 - Changing view acls groups to:
2018-09-05 12:52:58 INFO SecurityManager:54 - Changing modify acls groups to:
2018-09-05 12:52:58 INFO SecurityManager:54 - SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root, hadoop); groups with view permissions: Set(); users with modify permissions: Set(root, hadoop); groups with modify permissions: Set()
2018-09-05 12:52:58 INFO Client:54 - Submitting application application_1534319660860_6126 to ResourceManager
2018-09-05 12:52:58 INFO YarnClientImpl:260 - Submitted application application_1534319660860_6126
2018-09-05 12:52:58 INFO SchedulerExtensionServices:54 - Starting Yarn extension services with app application_1534319660860_6126 and attemptId None
2018-09-05 12:52:59 INFO Client:54 - Application report for application_1534319660860_6126 (state: ACCEPTED)
2018-09-05 12:52:59 INFO Client:54 -
    client token: Token { kind: YARN_CLIENT_TOKEN, service: }
    diagnostics: N/A
    ApplicationMaster host: N/A
    ApplicationMaster RPC port: -1
    queue: root.users.hadoop
    start time: 1536123178344
    final status: UNDEFINED
    tracking URL: http://bdpnode4.com:8088/proxy/application_1534319660860_6126/
    user: hadoop
2018-09-05 12:53:00 INFO Client:54 - Application report for application_1534319660860_6126 (state: ACCEPTED)
2018-09-05 12:53:01 INFO Client:54 - Application report for application_1534319660860_6126 (state: ACCEPTED)
2018-09-05 12:53:02 INFO Client:54 - Application report for application_1534319660860_6126 (state: ACCEPTED)
2018-09-05 12:53:03 INFO Client:54 - Application report for application_1534319660860_6126 (state: ACCEPTED)
2018-09-05 12:53:04 INFO Client:54 - Application report for application_1534319660860_6126 (state: ACCEPTED)
2018-09-05 12:53:05 INFO Client:54 - Application report for application_1534319660860_6126 (state: ACCEPTED)
2018-09-05 12:53:06 INFO YarnSchedulerBackend$YarnSchedulerEndpoint:54 - ApplicationMaster registered as NettyRpcEndpointRef(spark-client://YarnAM)
2018-09-05 12:53:06 INFO YarnClientSchedulerBackend:54 - Add WebUI Filter. org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter, Map(PROXY_HOSTS -> bdpnode1.com,bdpnode4.com, PROXY_URI_BASES -> http://bdpnode1.com:8088/proxy/application_1534319660860_6126,http://bdpnode4.com:8088/proxy/application_1534319660860_6126), /proxy/application_1534319660860_6126
2018-09-05 12:53:06 INFO JettyUtils:54 - Adding filter: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
2018-09-05 12:53:06 INFO Client:54 - Application report for application_1534319660860_6126 (state: ACCEPTED)
2018-09-05 12:53:07 INFO Client:54 - Application report for application_1534319660860_6126 (state: RUNNING)
2018-09-05 12:53:07 INFO Client:54 -
    client token: Token { kind: YARN_CLIENT_TOKEN, service: }
    diagnostics: N/A
    ApplicationMaster host: 192.168.131.101
    ApplicationMaster RPC port: 0
    queue: root.users.hadoop
    start time: 1536123178344
    final status: UNDEFINED
    tracking URL: http://bdpnode4.com:8088/proxy/application_1534319660860_6126/
    user: hadoop
2018-09-05 12:53:07 INFO YarnClientSchedulerBackend:54 - Application application_1534319660860_6126 has started running.
2018-09-05 12:53:07 INFO Utils:54 - Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 44057.
2018-09-05 12:53:07 INFO NettyBlockTransferService:54 - Server created on 192.168.131.102:44057
2018-09-05 12:53:07 INFO BlockManager:54 - Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
2018-09-05 12:53:07 INFO BlockManagerMaster:54 - Registering BlockManager BlockManagerId(driver, 192.168.131.102, 44057, None)
2018-09-05 12:53:07 INFO BlockManagerMasterEndpoint:54 - Registering block manager 192.168.131.102:44057 with 3.0 GB RAM, BlockManagerId(driver, 192.168.131.102, 44057, None)
2018-09-05 12:53:07 INFO BlockManagerMaster:54 - Registered BlockManager BlockManagerId(driver, 192.168.131.102, 44057, None)
2018-09-05 12:53:07 INFO BlockManager:54 - external shuffle service port = 7337
2018-09-05 12:53:07 INFO BlockManager:54 - Initialized BlockManager: BlockManagerId(driver, 192.168.131.102, 44057, None)
2018-09-05 12:53:07 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@76db9048{/metrics/json,null,AVAILABLE,@Spark}
2018-09-05 12:53:08 INFO EventLoggingListener:54 - Logging events to hdfs://ns2/user/spark/spark2ApplicationHistory/application_1534319660860_6126
2018-09-05 12:53:20 INFO YarnSchedulerBackend$YarnDriverEndpoint:54 - Registered executor NettyRpcEndpointRef(spark-client://Executor) (192.168.131.102:45028) with ID 1
2018-09-05 12:53:20 INFO YarnSchedulerBackend$YarnDriverEndpoint:54 - Registered executor NettyRpcEndpointRef(spark-client://Executor) (192.168.131.102:45026) with ID 3
2018-09-05 12:53:20 INFO BlockManagerMasterEndpoint:54 - Registering block manager bdpnode3.com:36665 with 5.2 GB RAM, BlockManagerId(1, bdpnode3.com, 36665, None)
2018-09-05 12:53:20 INFO BlockManagerMasterEndpoint:54 - Registering block manager bdpnode3.com:34953 with 5.2 GB RAM, BlockManagerId(3, bdpnode3.com, 34953, None)
2018-09-05 12:53:20 INFO YarnSchedulerBackend$YarnDriverEndpoint:54 - Registered executor NettyRpcEndpointRef(spark-client://Executor) (192.168.131.102:45027) with ID 2
2018-09-05 12:53:20 INFO YarnSchedulerBackend$YarnDriverEndpoint:54 - Registered executor NettyRpcEndpointRef(spark-client://Executor) (192.168.131.102:45030) with ID 4
2018-09-05 12:53:20 INFO BlockManagerMasterEndpoint:54 - Registering block manager bdpnode3.com:42569 with 5.2 GB RAM, BlockManagerId(2, bdpnode3.com, 42569, None)
2018-09-05 12:53:20 INFO BlockManagerMasterEndpoint:54 - Registering block manager bdpnode3.com:51578 with 5.2 GB RAM, BlockManagerId(4, bdpnode3.com, 51578, None)
2018-09-05 12:53:20 INFO YarnClientSchedulerBackend:54 - SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 1.0
2018-09-05 12:53:20 INFO Engine$:103 - Auto detect executor number and executor cores number
2018-09-05 12:53:20 INFO Engine$:105 - Executor number is 4 and executor cores number is 6
2018-09-05 12:53:20 WARN SparkContext:66 - Using an existing SparkContext; some configuration may not take effect.
2018-09-05 12:53:20 INFO Engine$:361 - Find existing spark context. Checking the spark conf...
2018-09-05 12:53:21 INFO MemoryStore:54 - Block broadcast_0 stored as values in memory (estimated size 104.0 B, free 3.0 GB)
2018-09-05 12:53:22 INFO MemoryStore:54 - Block broadcast_0_piece0 stored as bytes in memory (estimated size 132.0 B, free 3.0 GB)
2018-09-05 12:53:22 INFO BlockManagerInfo:54 - Added broadcast_0_piece0 in memory on 192.168.131.102:44057 (size: 132.0 B, free: 3.0 GB)
2018-09-05 12:53:22 INFO MemoryStore:54 - Block broadcast_1 stored as values in memory (estimated size 88.0 B, free 3.0 GB)
2018-09-05 12:53:22 INFO MemoryStore:54 - Block broadcast_1_piece0 stored as bytes in memory (estimated size 149.0 B, free 3.0 GB)
2018-09-05 12:53:22 INFO BlockManagerInfo:54 - Added broadcast_1_piece0 in memory on 192.168.131.102:44057 (size: 149.0 B, free: 3.0 GB)
2018-09-05 12:53:22 INFO MemoryStore:54 - Block broadcast_2 stored as values in memory (estimated size 48.0 B, free 3.0 GB)
2018-09-05 12:53:22 INFO MemoryStore:54 - Block broadcast_2_piece0 stored as bytes in memory (estimated size 100.0 B, free 3.0 GB)
2018-09-05 12:53:22 INFO BlockManagerInfo:54 - Added broadcast_2_piece0 in memory on 192.168.131.102:44057 (size: 100.0 B, free: 3.0 GB)
2018-09-05 12:53:23 INFO MemoryStore:54 - Block broadcast_3 stored as values in memory (estimated size 104.0 B, free 3.0 GB)
2018-09-05 12:53:23 INFO MemoryStore:54 - Block broadcast_3_piece0 stored as bytes in memory (estimated size 132.0 B, free 3.0 GB)
2018-09-05 12:53:23 INFO BlockManagerInfo:54 - Added broadcast_3_piece0 in memory on 192.168.131.102:44057 (size: 132.0 B, free: 3.0 GB)
2018-09-05 12:53:23 INFO MemoryStore:54 - Block broadcast_4 stored as values in memory (estimated size 88.0 B, free 3.0 GB)
2018-09-05 12:53:23 INFO MemoryStore:54 - Block broadcast_4_piece0 stored as bytes in memory (estimated size 149.0 B, free 3.0 GB)
2018-09-05 12:53:23 INFO BlockManagerInfo:54 - Added broadcast_4_piece0 in memory on 192.168.131.102:44057 (size: 149.0 B, free: 3.0 GB)
2018-09-05 12:53:23 INFO MemoryStore:54 - Block broadcast_5 stored as values in memory (estimated size 48.0 B, free 3.0 GB)
2018-09-05 12:53:23 INFO MemoryStore:54 - Block broadcast_5_piece0 stored as bytes in memory (estimated size 100.0 B, free 3.0 GB)
2018-09-05 12:53:23 INFO BlockManagerInfo:54 - Added broadcast_5_piece0 in memory on 192.168.131.102:44057 (size: 100.0 B, free: 3.0 GB)
2018-09-05 12:53:23 INFO DistriOptimizer$:895 - caching training rdd ...
2018-09-05 12:53:23 INFO DAGScheduler:54 - Registering RDD 1 (coalesce at DataSet.scala:344)
2018-09-05 12:53:23 INFO DAGScheduler:54 - Got job 0 (count at DataSet.scala:191) with 4 output partitions
2018-09-05 12:53:23 INFO DAGScheduler:54 - Final stage: ResultStage 1 (count at DataSet.scala:191)
2018-09-05 12:53:23 INFO DAGScheduler:54 - Parents of final stage: List(ShuffleMapStage 0)
2018-09-05 12:53:23 INFO DAGScheduler:54 - Missing parents: List(ShuffleMapStage 0)
2018-09-05 12:53:23 INFO DAGScheduler:54 - Submitting ShuffleMapStage 0 (MapPartitionsRDD[1] at coalesce at DataSet.scala:344), which has no missing parents
2018-09-05 12:53:23 INFO MemoryStore:54 - Block broadcast_6 stored as values in memory (estimated size 3.1 KB, free 3.0 GB)
2018-09-05 12:53:23 INFO MemoryStore:54 - Block broadcast_6_piece0 stored as bytes in memory (estimated size 1955.0 B, free 3.0 GB)
2018-09-05 12:53:23 INFO BlockManagerInfo:54 - Added broadcast_6_piece0 in memory on 192.168.131.102:44057 (size: 1955.0 B, free: 3.0 GB)
2018-09-05 12:53:23 INFO DAGScheduler:54 - Submitting 4 missing tasks from ShuffleMapStage 0 (MapPartitionsRDD[1] at coalesce at DataSet.scala:344) (first 15 tasks are for partitions Vector(0, 1, 2, 3))
2018-09-05 12:53:23 INFO YarnScheduler:54 - Adding task set 0.0 with 4 tasks
2018-09-05 12:53:24 WARN TaskSetManager:66 - Stage 0 contains a task of very large size (37908 KB). The maximum recommended task size is 100 KB.
2018-09-05 12:53:24 INFO TaskSetManager:54 - Starting task 0.0 in stage 0.0 (TID 0, bdpnode3.com, executor 2, partition 0, PROCESS_LOCAL, 38818415 bytes)
2018-09-05 12:53:24 INFO TaskSetManager:54 - Starting task 1.0 in stage 0.0 (TID 1, bdpnode3.com, executor 1, partition 1, PROCESS_LOCAL, 38818415 bytes)
2018-09-05 12:53:24 INFO TaskSetManager:54 - Starting task 2.0 in stage 0.0 (TID 2, bdpnode3.com, executor 4, partition 2, PROCESS_LOCAL, 38818415 bytes)
2018-09-05 12:53:24 INFO TaskSetManager:54 - Starting task 3.0 in stage 0.0 (TID 3, bdpnode3.com, executor 3, partition 3, PROCESS_LOCAL, 38818415 bytes)
2018-09-05 12:53:26 INFO BlockManagerInfo:54 - Added broadcast_6_piece0 in memory on bdpnode3.com:51578 (size: 1955.0 B, free: 5.2 GB)
2018-09-05 12:53:26 INFO BlockManagerInfo:54 - Added broadcast_6_piece0 in memory on bdpnode3.com:42569 (size: 1955.0 B, free: 5.2 GB)
2018-09-05 12:53:26 INFO BlockManagerInfo:54 - Added broadcast_6_piece0 in memory on bdpnode3.com:36665 (size: 1955.0 B, free: 5.2 GB)
2018-09-05 12:53:26 INFO BlockManagerInfo:54 - Added broadcast_6_piece0 in memory on bdpnode3.com:34953 (size: 1955.0 B, free: 5.2 GB)
2018-09-05 12:53:27 INFO TaskSetManager:54 - Finished task 2.0 in stage 0.0 (TID 2) in 3541 ms on bdpnode3.com (executor 4) (1/4)
2018-09-05 12:53:27 INFO TaskSetManager:54 - Finished task 0.0 in stage 0.0 (TID 0) in 3860 ms on bdpnode3.com (executor 2) (2/4)
2018-09-05 12:53:28 INFO TaskSetManager:54 - Finished task 1.0 in stage 0.0 (TID 1) in 3896 ms on bdpnode3.com (executor 1) (3/4)
2018-09-05 12:53:28 INFO TaskSetManager:54 - Finished task 3.0 in stage 0.0 (TID 3) in 3714 ms on bdpnode3.com (executor 3) (4/4)
2018-09-05 12:53:28 INFO YarnScheduler:54 - Removed TaskSet 0.0, whose tasks have all completed, from pool
2018-09-05 12:53:28 INFO DAGScheduler:54 - ShuffleMapStage 0 (coalesce at DataSet.scala:344) finished in 4.082 s
2018-09-05 12:53:28 INFO DAGScheduler:54 - looking for newly runnable stages
2018-09-05 12:53:28 INFO DAGScheduler:54 - running: Set()
2018-09-05 12:53:28 INFO DAGScheduler:54 - waiting: Set(ResultStage 1)
2018-09-05 12:53:28 INFO DAGScheduler:54 - failed: Set()
2018-09-05 12:53:28 INFO DAGScheduler:54 - Submitting ResultStage 1 (Cached Transformer MapPartitionsRDD[9] at mapPartitions at DataSet.scala:175), which has no missing parents
2018-09-05 12:53:28 INFO MemoryStore:54 - Block broadcast_7 stored as values in memory (estimated size 4.9 KB, free 3.0 GB)
2018-09-05 12:53:28 INFO MemoryStore:54 - Block broadcast_7_piece0 stored as bytes in memory (estimated size 2.7 KB, free 3.0 GB)
2018-09-05 12:53:28 INFO BlockManagerInfo:54 - Added broadcast_7_piece0 in memory on 192.168.131.102:44057 (size: 2.7 KB, free: 3.0 GB)
2018-09-05 12:53:28 INFO DAGScheduler:54 - Submitting 4 missing tasks from ResultStage 1 (Cached Transformer MapPartitionsRDD[9] at mapPartitions at DataSet.scala:175) (first 15 tasks are for partitions Vector(0, 1, 2, 3))
2018-09-05 12:53:28 INFO YarnScheduler:54 - Adding task set 1.0 with 4 tasks
2018-09-05 12:53:28 INFO TaskSetManager:54 - Starting task 0.0 in stage 1.0 (TID 4, bdpnode3.com, executor 4, partition 0, PROCESS_LOCAL, 4908 bytes)
2018-09-05 12:53:28 INFO TaskSetManager:54 - Starting task 1.0 in stage 1.0 (TID 5, bdpnode3.com, executor 2, partition 1, PROCESS_LOCAL, 4908 bytes)
2018-09-05 12:53:28 INFO TaskSetManager:54 - Starting task 2.0 in stage 1.0 (TID 6, bdpnode3.com, executor 1, partition 2, PROCESS_LOCAL, 4908 bytes)
2018-09-05 12:53:28 INFO TaskSetManager:54 - Starting task 3.0 in stage 1.0 (TID 7, bdpnode3.com, executor 3, partition 3, PROCESS_LOCAL, 4908 bytes)
2018-09-05 12:53:28 INFO BlockManagerInfo:54 - Added broadcast_7_piece0 in memory on bdpnode3.com:51578 (size: 2.7 KB, free: 5.2 GB)
2018-09-05 12:53:28 INFO BlockManagerInfo:54 - Added broadcast_7_piece0 in memory on bdpnode3.com:42569 (size: 2.7 KB, free: 5.2 GB)
2018-09-05 12:53:28 INFO BlockManagerInfo:54 - Added broadcast_7_piece0 in memory on bdpnode3.com:34953 (size: 2.7 KB, free: 5.2 GB)
2018-09-05 12:53:28 INFO BlockManagerInfo:54 - Added broadcast_7_piece0 in memory on bdpnode3.com:36665 (size: 2.7 KB, free: 5.2 GB)
2018-09-05 12:53:28 INFO MapOutputTrackerMasterEndpoint:54 - Asked to send map output locations for shuffle 0 to 192.168.131.102:45030
2018-09-05 12:53:28 INFO MapOutputTrackerMasterEndpoint:54 - Asked to send map output locations for shuffle 0 to 192.168.131.102:45026
2018-09-05 12:53:28 INFO MapOutputTrackerMaster:54 - Size of output statuses for shuffle 0 is 169 bytes
2018-09-05 12:53:28 INFO MapOutputTrackerMasterEndpoint:54 - Asked to send map output locations for shuffle 0 to 192.168.131.102:45027
2018-09-05 12:53:28 INFO MapOutputTrackerMasterEndpoint:54 - Asked to send map output locations for shuffle 0 to 192.168.131.102:45028
2018-09-05 12:53:28 INFO BlockManagerInfo:54 - Added rdd_5_3 in memory on bdpnode3.com:34953 (size: 37.2 MB, free: 5.1 GB)
2018-09-05 12:53:28 INFO BlockManagerInfo:54 - Added rdd_5_2 in memory on bdpnode3.com:36665 (size: 37.2 MB, free: 5.1 GB)
2018-09-05 12:53:29 INFO BlockManagerInfo:54 - Added broadcast_2_piece0 in memory on bdpnode3.com:34953 (size: 100.0 B, free: 5.1 GB)
2018-09-05 12:53:29 INFO BlockManagerInfo:54 - Added broadcast_2_piece0 in memory on bdpnode3.com:36665 (size: 100.0 B, free: 5.1 GB)
2018-09-05 12:53:29 INFO BlockManagerInfo:54 - Added rdd_9_3 in memory on bdpnode3.com:34953 (size: 48.0 B, free: 5.1 GB)
2018-09-05 12:53:29 INFO BlockManagerInfo:54 - Added rdd_5_0 in memory on bdpnode3.com:51578 (size: 37.2 MB, free: 5.1 GB)
2018-09-05 12:53:29 INFO BlockManagerInfo:54 - Added rdd_9_2 in memory on bdpnode3.com:36665 (size: 48.0 B, free: 5.1 GB)
2018-09-05 12:53:29 INFO TaskSetManager:54 - Finished task 3.0 in stage 1.0 (TID 7) in 1073 ms on bdpnode3.com (executor 3) (1/4)
2018-09-05 12:53:29 INFO BlockManagerInfo:54 - Added rdd_5_1 in memory on bdpnode3.com:42569 (size: 37.2 MB, free: 5.1 GB)
2018-09-05 12:53:29 INFO TaskSetManager:54 - Finished task 2.0 in stage 1.0 (TID 6) in 1482 ms on bdpnode3.com (executor 1) (2/4)
2018-09-05 12:53:29 INFO BlockManagerInfo:54 - Added broadcast_2_piece0 in memory on bdpnode3.com:51578 (size: 100.0 B, free: 5.1 GB)
2018-09-05 12:53:30 INFO BlockManagerInfo:54 - Added broadcast_2_piece0 in memory on bdpnode3.com:42569 (size: 100.0 B, free: 5.1 GB)
2018-09-05 12:53:30 INFO BlockManagerInfo:54 - Added rdd_9_0 in memory on bdpnode3.com:51578 (size: 48.0 B, free: 5.1 GB)
2018-09-05 12:53:30 INFO BlockManagerInfo:54 - Added rdd_9_1 in memory on bdpnode3.com:42569 (size: 48.0 B, free: 5.1 GB)
2018-09-05 12:53:30 INFO TaskSetManager:54 - Finished task 0.0 in stage 1.0 (TID 4) in 2061 ms on bdpnode3.com (executor 4) (3/4)
2018-09-05 12:53:30 INFO TaskSetManager:54 - Finished task 1.0 in stage 1.0 (TID 5) in 2059 ms on bdpnode3.com (executor 2) (4/4)
2018-09-05 12:53:30 INFO YarnScheduler:54 - Removed TaskSet 1.0, whose tasks have all completed, from pool
2018-09-05 12:53:30 INFO DAGScheduler:54 - ResultStage 1 (count at DataSet.scala:191) finished in 2.072 s
2018-09-05 12:53:30 INFO DAGScheduler:54 - Job 0 finished: count at DataSet.scala:191, took 6.384437 s
2018-09-05 12:53:30 INFO MemoryStore:54 - Block broadcast_8 stored as values in memory (estimated size 2.1 KB, free 3.0 GB)
2018-09-05 12:53:30 INFO MemoryStore:54 - Block broadcast_8_piece0 stored as bytes in memory (estimated size 635.0 B, free 3.0 GB)
2018-09-05 12:53:30 INFO BlockManagerInfo:54 - Added broadcast_8_piece0 in memory on 192.168.131.102:44057 (size: 635.0 B, free: 3.0 GB)
2018-09-05 12:53:30 INFO BlockManagerInfo:54 - Removed broadcast_7_piece0 on 192.168.131.102:44057 in memory (size: 2.7 KB, free: 3.0 GB)
2018-09-05 12:53:30 INFO BlockManagerInfo:54 - Removed broadcast_7_piece0 on bdpnode3.com:34953 in memory (size: 2.7 KB, free: 5.1 GB)
2018-09-05 12:53:30 INFO BlockManagerInfo:54 - Removed broadcast_7_piece0 on bdpnode3.com:36665 in memory (size: 2.7 KB, free: 5.1 GB)
2018-09-05 12:53:30 INFO BlockManagerInfo:54 - Removed broadcast_7_piece0 on bdpnode3.com:42569 in memory (size: 2.7 KB, free: 5.1 GB)
2018-09-05 12:53:30 INFO BlockManagerInfo:54 - Removed broadcast_7_piece0 on bdpnode3.com:51578 in memory (size: 2.7 KB, free: 5.1 GB)
2018-09-05 12:53:30 INFO BlockManagerInfo:54 - Removed broadcast_6_piece0 on 192.168.131.102:44057 in memory (size: 1955.0 B, free: 3.0 GB)
2018-09-05 12:53:30 INFO BlockManagerInfo:54 - Removed broadcast_6_piece0 on bdpnode3.com:51578 in memory (size: 1955.0 B, free: 5.1 GB)
2018-09-05 12:53:30 INFO BlockManagerInfo:54 - Removed broadcast_6_piece0 on bdpnode3.com:36665 in memory (size: 1955.0 B, free: 5.1 GB)
2018-09-05 12:53:30 INFO BlockManagerInfo:54 - Removed broadcast_6_piece0 on bdpnode3.com:34953 in memory (size: 1955.0 B, free: 5.1 GB)
2018-09-05 12:53:30 INFO BlockManagerInfo:54 - Removed broadcast_6_piece0 on bdpnode3.com:42569 in memory (size: 1955.0 B, free: 5.1 GB)
2018-09-05 12:53:31 INFO MemoryStore:54 - Block broadcast_9 stored as values in memory (estimated size 40.0 B, free 3.0 GB)
2018-09-05 12:53:31 INFO MemoryStore:54 - Block broadcast_9_piece0 stored as bytes in memory (estimated size 46.0 B, free 3.0 GB)
2018-09-05 12:53:31 INFO BlockManagerInfo:54 - Added broadcast_9_piece0 in memory on 192.168.131.102:44057 (size: 46.0 B, free: 3.0 GB)
2018-09-05 12:53:31 INFO MemoryStore:54 - Block broadcast_10 stored as values in memory (estimated size 127.3 KB, free 3.0 GB)
2018-09-05 12:53:31 INFO MemoryStore:54 - Block broadcast_10_piece0 stored as bytes in memory (estimated size 219.0 B, free 3.0 GB)
2018-09-05 12:53:31 INFO BlockManagerInfo:54 - Added broadcast_10_piece0 in memory on 192.168.131.102:44057 (size: 219.0 B, free: 3.0 GB)
2018-09-05 12:53:31 INFO MemoryStore:54 - Block broadcast_11 stored as values in memory (estimated size 57.2 MB, free 3.0 GB)
2018-09-05 12:53:31 INFO MemoryStore:54 - Block broadcast_11_piece0 stored as bytes in memory (estimated size 4.0 MB, free 3.0 GB)
2018-09-05 12:53:31 INFO BlockManagerInfo:54 - Added broadcast_11_piece0 in memory on 192.168.131.102:44057 (size: 4.0 MB, free: 3.0 GB)
2018-09-05 12:53:31 INFO MemoryStore:54 - Block broadcast_11_piece1 stored as bytes in memory (estimated size 4.0 MB, free 3.0 GB)
2018-09-05 12:53:31 INFO BlockManagerInfo:54 - Added broadcast_11_piece1 in memory on 192.168.131.102:44057 (size: 4.0 MB, free: 3.0 GB)
2018-09-05 12:53:31 INFO MemoryStore:54 - Block broadcast_11_piece2 stored as bytes in memory (estimated size 4.0 MB, free 3.0 GB)
2018-09-05 12:53:31 INFO BlockManagerInfo:54 - Added broadcast_11_piece2 in memory on 192.168.131.102:44057 (size: 4.0 MB, free: 3.0 GB)
2018-09-05 12:53:31 INFO MemoryStore:54 - Block broadcast_11_piece3 stored as bytes in memory (estimated size 4.0 MB, free 3.0 GB)
2018-09-05 12:53:31 INFO BlockManagerInfo:54 - Added broadcast_11_piece3 in memory on 192.168.131.102:44057 (size: 4.0 MB, free: 3.0 GB)
2018-09-05 12:53:31 INFO MemoryStore:54 - Block broadcast_11_piece4 stored as bytes in memory (estimated size 4.0 MB, free 2.9 GB)
2018-09-05 12:53:31 INFO BlockManagerInfo:54 - Added broadcast_11_piece4 in memory on 192.168.131.102:44057 (size: 4.0 MB, free: 3.0 GB)
2018-09-05 12:53:31 INFO MemoryStore:54 - Block broadcast_11_piece5 stored as bytes in memory (estimated size 4.0 MB, free 2.9 GB)
2018-09-05 12:53:31 INFO BlockManagerInfo:54 - Added broadcast_11_piece5 in memory on 192.168.131.102:44057 (size: 4.0 MB, free: 3.0 GB)
2018-09-05 12:53:31 INFO MemoryStore:54 - Block broadcast_11_piece6 stored as bytes in memory (estimated size 4.0 MB, free 2.9 GB)
2018-09-05 12:53:31 INFO BlockManagerInfo:54 - Added broadcast_11_piece6 in memory on 192.168.131.102:44057 (size: 4.0 MB, free: 3.0 GB)
2018-09-05 12:53:31 INFO MemoryStore:54 - Block broadcast_11_piece7 stored as bytes in memory (estimated size 4.0 MB, free 2.9 GB)
2018-09-05 12:53:31 INFO BlockManagerInfo:54 - Added broadcast_11_piece7 in memory on 192.168.131.102:44057 (size: 4.0 MB, free: 3.0 GB)
2018-09-05 12:53:31 INFO MemoryStore:54 - Block broadcast_11_piece8 stored as bytes in memory (estimated size 4.0 MB, free 2.9 GB)
2018-09-05 12:53:31 INFO BlockManagerInfo:54 - Added broadcast_11_piece8 in memory on 192.168.131.102:44057 (size: 4.0 MB, free: 3.0 GB)
2018-09-05 12:53:31 INFO MemoryStore:54 - Block broadcast_11_piece9 stored as bytes in memory (estimated size 4.0 MB, free 2.9 GB)
2018-09-05 12:53:31 INFO BlockManagerInfo:54 - Added broadcast_11_piece9 in memory on 192.168.131.102:44057 (size: 4.0 MB, free: 3.0 GB)
2018-09-05 12:53:31 INFO MemoryStore:54 - Block broadcast_11_piece10 stored as bytes in memory (estimated size 4.0 MB, free 2.9 GB)
2018-09-05 12:53:31 INFO BlockManagerInfo:54 - Added broadcast_11_piece10 in memory on 192.168.131.102:44057 (size: 4.0 MB, free: 3.0 GB)
2018-09-05 12:53:31 INFO MemoryStore:54 - Block broadcast_11_piece11 stored as bytes in memory (estimated size 4.0 MB, free 2.9 GB)
2018-09-05 12:53:31 INFO BlockManagerInfo:54 - Added broadcast_11_piece11 in memory on 192.168.131.102:44057 (size: 4.0 MB, free: 3.0 GB)
2018-09-05 12:53:31 INFO MemoryStore:54 - Block broadcast_11_piece12 stored as bytes in memory (estimated size 4.0 MB, free 2.9 GB)
2018-09-05 12:53:31 INFO BlockManagerInfo:54 - Added broadcast_11_piece12 in memory on 192.168.131.102:44057 (size: 4.0 MB, free: 3.0 GB)
2018-09-05 12:53:31 INFO MemoryStore:54 - Block broadcast_11_piece13 stored as bytes in memory (estimated size 4.0 MB, free 2.9 GB)
2018-09-05 12:53:31 INFO BlockManagerInfo:54 - Added broadcast_11_piece13 in memory on 192.168.131.102:44057 (size: 4.0 MB, free: 3.0 GB)
2018-09-05 12:53:31 INFO MemoryStore:54 - Block broadcast_11_piece14 stored as bytes in memory (estimated size 1240.2 KB, free 2.9 GB)
2018-09-05 12:53:31 INFO BlockManagerInfo:54 - Added broadcast_11_piece14 in memory on 192.168.131.102:44057 (size: 1240.2 KB, free: 3.0 GB)
2018-09-05 12:53:31 INFO DistriOptimizer$:672 - Cache thread models...
2018-09-05 12:53:31 INFO DAGScheduler:54 - Got job 1 (count at DistriOptimizer.scala:673) with 4 output partitions
2018-09-05 12:53:31 INFO DAGScheduler:54 - Final stage: ResultStage 3 (count at DistriOptimizer.scala:673)
2018-09-05 12:53:31 INFO DAGScheduler:54 - Parents of final stage: List(ShuffleMapStage 2)
2018-09-05 12:53:31 INFO DAGScheduler:54 - Missing parents: List()
2018-09-05 12:53:31 INFO DAGScheduler:54 - Submitting ResultStage 3 (Thread Model RDD MapPartitionsRDD[20] at mapPartitions at DistriOptimizer.scala:625), which has no missing parents
2018-09-05 12:53:31 INFO MemoryStore:54 - Block broadcast_12 stored as values in memory (estimated size 7.1 KB, free 2.9 GB)
2018-09-05 12:53:31 INFO MemoryStore:54 - Block broadcast_12_piece0 stored as bytes in memory (estimated size 3.9 KB, free 2.9 GB)
2018-09-05 12:53:31 INFO BlockManagerInfo:54 - Added broadcast_12_piece0 in memory on 192.168.131.102:44057 (size: 3.9 KB, free: 3.0 GB)
2018-09-05 12:53:31 INFO DAGScheduler:54 - Submitting 4 missing tasks from ResultStage 3 (Thread Model RDD MapPartitionsRDD[20] at mapPartitions at DistriOptimizer.scala:625) (first 15 tasks are for partitions Vector(0, 1, 2, 3))
2018-09-05 12:53:31 INFO YarnScheduler:54 - Adding task set 3.0 with 4 tasks
2018-09-05 12:53:31 INFO TaskSetManager:54 - Starting task 3.0 in stage 3.0 (TID 8, bdpnode3.com, executor 3, partition 3, PROCESS_LOCAL, 4908 bytes)
2018-09-05 12:53:31 INFO TaskSetManager:54 - Starting task 2.0 in stage 3.0 (TID 9, bdpnode3.com, executor 1, partition 2, PROCESS_LOCAL, 4908 bytes)
2018-09-05 12:53:31 INFO TaskSetManager:54 - Starting task 0.0 in stage 3.0 (TID 10, bdpnode3.com, executor 4, partition 0, PROCESS_LOCAL, 4908 bytes)
2018-09-05 12:53:31 INFO TaskSetManager:54 - Starting task 1.0 in stage 3.0 (TID 11, bdpnode3.com, executor 2, partition 1, PROCESS_LOCAL, 4908 bytes)
2018-09-05 12:53:32 INFO BlockManagerInfo:54 - Added broadcast_12_piece0 in memory on bdpnode3.com:51578 (size: 3.9 KB, free: 5.1 GB)
2018-09-05 12:53:32 INFO BlockManagerInfo:54 - Added broadcast_12_piece0 in memory on bdpnode3.com:42569 (size: 3.9 KB, free: 5.1 GB)
2018-09-05 12:53:32 INFO BlockManagerInfo:54 - Added broadcast_12_piece0 in memory on bdpnode3.com:36665 (size: 3.9 KB, free: 5.1 GB)
2018-09-05 12:53:32 INFO BlockManagerInfo:54 - Added broadcast_12_piece0 in memory on bdpnode3.com:34953 (size: 3.9 KB, free: 5.1 GB)
2018-09-05 12:53:32 INFO BlockManagerInfo:54 - Added broadcast_8_piece0 in memory on bdpnode3.com:51578 (size: 635.0 B, free: 5.1 GB)
2018-09-05 12:53:32 INFO BlockManagerInfo:54 - Added broadcast_8_piece0 in memory on bdpnode3.com:34953 (size: 635.0 B, free: 5.1 GB)
2018-09-05 12:53:32 INFO BlockManagerInfo:54 - Added broadcast_8_piece0 in memory on bdpnode3.com:42569 (size: 635.0 B, free: 5.1 GB)
2018-09-05 12:53:32 INFO BlockManagerInfo:54 - Added broadcast_8_piece0 in memory on bdpnode3.com:36665 (size: 635.0 B, free: 5.1 GB)
2018-09-05 12:53:32 INFO BlockManagerInfo:54 - Added broadcast_10_piece0 in memory on bdpnode3.com:51578 (size: 219.0 B, free: 5.1 GB)
2018-09-05 12:53:32 INFO BlockManagerInfo:54 - Added broadcast_10_piece0 in memory on bdpnode3.com:42569 (size: 219.0 B, free: 5.1 GB)
2018-09-05 12:53:32 WARN TaskSetManager:66 - Lost task 0.0 in stage 3.0 (TID 10, bdpnode3.com, executor 4): java.lang.NullPointerException
    at com.intel.analytics.bigdl.models.utils.ModelBroadcast.value(ModelBroadcast.scala:120)
    at com.intel.analytics.bigdl.optim.DistriOptimizer$$anonfun$13$$anonfun$14.apply(DistriOptimizer.scala:643)
    at com.intel.analytics.bigdl.optim.DistriOptimizer$$anonfun$13$$anonfun$14.apply(DistriOptimizer.scala:642)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    at scala.collection.immutable.Range.foreach(Range.scala:160)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
    at scala.collection.AbstractTraversable.map(Traversable.scala:104)
    at com.intel.analytics.bigdl.optim.DistriOptimizer$$anonfun$13.apply(DistriOptimizer.scala:642)
    at com.intel.analytics.bigdl.optim.DistriOptimizer$$anonfun$13.apply(DistriOptimizer.scala:625)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:797)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:797)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
    at org.apache.spark.rdd.RDD$$anonfun$8.apply(RDD.scala:336)
    at org.apache.spark.rdd.RDD$$anonfun$8.apply(RDD.scala:334)
    at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1038)
    at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1029)
    at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:969)
    at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1029)
    at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:760)
    at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:334)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:285)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
    at org.apache.spark.scheduler.Task.run(Task.scala:108)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

2018-09-05 12:53:32 INFO TaskSetManager:54 - Starting task 0.1 in stage 3.0 (TID 12, bdpnode3.com, executor 4, partition 0, PROCESS_LOCAL, 4908 bytes) 2018-09-05 12:53:32 INFO BlockManagerInfo:54 - Added broadcast_10_piece0 in memory on bdpnode3.com:36665 (size: 219.0 B, free: 5.1 GB) 2018-09-05 12:53:32 INFO TaskSetManager:54 - Lost task 1.0 in stage 3.0 (TID 11) on bdpnode3.com, executor 2: java.lang.NullPointerException (null) [duplicate 1] 2018-09-05 12:53:32 INFO TaskSetManager:54 - Starting task 1.1 in stage 3.0 (TID 13, bdpnode3.com, executor 2, partition 1, PROCESS_LOCAL, 4908 bytes) 2018-09-05 12:53:32 INFO TaskSetManager:54 - Lost task 0.1 in stage 3.0 (TID 12) on bdpnode3.com, executor 4: java.lang.NullPointerException (null) [duplicate 2] 2018-09-05 12:53:32 INFO TaskSetManager:54 - Starting task 0.2 in stage 3.0 (TID 14, bdpnode3.com, executor 4, partition 0, PROCESS_LOCAL, 4908 bytes) 2018-09-05 12:53:32 INFO BlockManagerInfo:54 - Added broadcast_10_piece0 in memory on bdpnode3.com:34953 (size: 219.0 B, free: 5.1 GB) 2018-09-05 12:53:32 INFO TaskSetManager:54 - Lost task 1.1 in stage 3.0 (TID 13) on bdpnode3.com, executor 2: java.lang.NullPointerException (null) [duplicate 3] 2018-09-05 12:53:32 INFO TaskSetManager:54 - Starting task 1.2 in stage 3.0 (TID 15, bdpnode3.com, executor 2, partition 1, PROCESS_LOCAL, 4908 bytes) 2018-09-05 12:53:32 INFO TaskSetManager:54 - Lost task 0.2 in stage 3.0 (TID 14) on bdpnode3.com, executor 4: java.lang.NullPointerException (null) [duplicate 4] 2018-09-05 12:53:32 INFO TaskSetManager:54 - Starting task 0.3 in stage 3.0 (TID 16, bdpnode3.com, executor 4, partition 0, PROCESS_LOCAL, 4908 bytes) 2018-09-05 12:53:32 INFO TaskSetManager:54 - Lost task 0.3 in stage 3.0 (TID 16) on bdpnode3.com, executor 4: java.lang.NullPointerException (null) [duplicate 5] 2018-09-05 12:53:32 ERROR TaskSetManager:70 - Task 0 in stage 3.0 failed 4 times; aborting job 2018-09-05 12:53:32 INFO YarnScheduler:54 - Cancelling stage 3 
2018-09-05 12:53:32 INFO YarnScheduler:54 - Stage 3 was cancelled 2018-09-05 12:53:32 INFO DAGScheduler:54 - ResultStage 3 (count at DistriOptimizer.scala:673) failed in 0.735 s due to Job aborted due to stage failure: Task 0 in stage 3 .0 failed 4 times, most recent failure: Lost task 0.3 in stage 3.0 (TID 16, bdpnode3.com, executor 4): java.lang.NullPointerException at com.intel.analytics.bigdl.models.utils.ModelBroadcast.value(ModelBroadcast.scala:120) at com.intel.analytics.bigdl.optim.DistriOptimizer$$anonfun$13$$anonfun$14.apply(DistriOptimizer.scala:643) at com.intel.analytics.bigdl.optim.DistriOptimizer$$anonfun$13$$anonfun$14.apply(DistriOptimizer.scala:642) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) at scala.collection.immutable.Range.foreach(Range.scala:160) at scala.collection.TraversableLike$class.map(TraversableLike.scala:234) at scala.collection.AbstractTraversable.map(Traversable.scala:104) at com.intel.analytics.bigdl.optim.DistriOptimizer$$anonfun$13.apply(DistriOptimizer.scala:642) at com.intel.analytics.bigdl.optim.DistriOptimizer$$anonfun$13.apply(DistriOptimizer.scala:625) at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:797) at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:797) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) at org.apache.spark.rdd.RDD$$anonfun$8.apply(RDD.scala:336) at org.apache.spark.rdd.RDD$$anonfun$8.apply(RDD.scala:334) at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1038) at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1029) at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:969) at 
org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1029) at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:760) at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:334) at org.apache.spark.rdd.RDD.iterator(RDD.scala:285) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87) at org.apache.spark.scheduler.Task.run(Task.scala:108) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745)

Driver stacktrace: 2018-09-05 12:53:32 INFO DAGScheduler:54 - Job 1 failed: count at DistriOptimizer.scala:673, took 0.773681 s 2018-09-05 12:53:32 INFO TaskSetManager:54 - Lost task 3.0 in stage 3.0 (TID 8) on bdpnode3.com, executor 3: java.lang.NullPointerException (null) [duplicate 6] 2018-09-05 12:53:32 INFO TaskSetManager:54 - Lost task 2.0 in stage 3.0 (TID 9) on bdpnode3.com, executor 1: java.lang.NullPointerException (null) [duplicate 7] 2018-09-05 12:53:32 INFO YarnScheduler:54 - Removed TaskSet 3.0, whose tasks have all completed, from pool 2018-09-05 12:53:32 INFO TaskSetManager:54 - Lost task 1.2 in stage 3.0 (TID 15) on bdpnode3.com, executor 2: java.lang.NullPointerException (null) [duplicate 8] 2018-09-05 12:53:32 INFO YarnScheduler:54 - Removed TaskSet 3.0, whose tasks have all completed, from pool 2018-09-05 12:53:32 INFO AbstractConnector:310 - Stopped Spark@5403e739{HTTP/1.1,[http/1.1]}{0.0.0.0:4040} 2018-09-05 12:53:32 INFO BlockManagerInfo:54 - Removed broadcast_8_piece0 on 192.168.131.102:44057 in memory (size: 635.0 B, free: 3.0 GB) 2018-09-05 12:53:32 INFO BlockManagerInfo:54 - Removed broadcast_8_piece0 on bdpnode3.com:36665 in memory (size: 635.0 B, free: 5.1 GB) 2018-09-05 12:53:32 INFO BlockManagerInfo:54 - Removed broadcast_8_piece0 on bdpnode3.com:42569 in memory (size: 635.0 B, free: 5.1 GB) 2018-09-05 12:53:32 INFO BlockManagerInfo:54 - Removed broadcast_8_piece0 on bdpnode3.com:51578 in memory (size: 635.0 B, free: 5.1 GB) 2018-09-05 12:53:32 INFO BlockManagerInfo:54 - Removed broadcast_8_piece0 on bdpnode3.com:34953 in memory (size: 635.0 B, free: 5.1 GB) 2018-09-05 12:53:32 INFO SparkUI:54 - Stopped Spark web UI at http://192.168.131.102:4040 2018-09-05 12:53:32 INFO YarnClientSchedulerBackend:54 - Interrupting monitor thread 2018-09-05 12:53:32 INFO YarnClientSchedulerBackend:54 - Shutting down all executors 2018-09-05 12:53:32 INFO YarnSchedulerBackend$YarnDriverEndpoint:54 - Asking each executor to shut down 2018-09-05 12:53:32 
INFO SchedulerExtensionServices:54 - Stopping SchedulerExtensionServices (serviceOption=None, services=List(), started=false) 2018-09-05 12:53:32 INFO YarnClientSchedulerBackend:54 - Stopped 2018-09-05 12:53:32 INFO MapOutputTrackerMasterEndpoint:54 - MapOutputTrackerMasterEndpoint stopped! 2018-09-05 12:53:32 INFO MemoryStore:54 - MemoryStore cleared 2018-09-05 12:53:32 INFO BlockManager:54 - BlockManager stopped 2018-09-05 12:53:32 INFO BlockManagerMaster:54 - BlockManagerMaster stopped 2018-09-05 12:53:32 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint:54 - OutputCommitCoordinator stopped! 2018-09-05 12:53:33 INFO ShutdownHookManager:54 - Shutdown hook called 2018-09-05 12:53:33 INFO ShutdownHookManager:54 - Deleting directory /tmp/spark-fedd8d75-5a1c-4761-a90c-bab88509f608

yiheng commented 6 years ago

There's a workaround in https://github.com/intel-analytics/BigDL/issues/2628.

Hi, this happens because the ModelBroadcast fix is not compatible with KryoSerializer. A workaround is to add --conf "spark.serializer=org.apache.spark.serializer.JavaSerializer" to your command.
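As a rough illustration of where that flag goes, here is a minimal sketch that adds the serializer setting to a spark-submit argument list. The helper name `with_java_serializer`, the `--master yarn` option, and `app.jar` are placeholders for illustration only; substitute the arguments from your own launch command.

```python
import shlex

# Serializer class named in the workaround above.
JAVA_SERIALIZER = "org.apache.spark.serializer.JavaSerializer"


def with_java_serializer(submit_args):
    """Return a copy of submit_args with spark.serializer set to JavaSerializer.

    Hypothetical helper: the --conf pair is placed right after the launcher
    name so that it comes before the application jar and its arguments.
    """
    args = list(submit_args)
    conf = ["--conf", f"spark.serializer={JAVA_SERIALIZER}"]
    return args[:1] + conf + args[1:]


# Placeholder command; replace the master and jar with your own values.
cmd = with_java_serializer(["spark-submit", "--master", "yarn", "app.jar"])
print(shlex.join(cmd))
```

The same setting can also be passed directly on the command line, as in the comment above; the sketch only shows that the option must appear before the application jar.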

wzxwf commented 6 years ago

Thanks

dinkleva commented 2 years ago


ERROR Executor: Exception in task 0.0 in stage 1.0 (TID 8) java.io.IOException: Cannot run program "python3": CreateProcess error=2, The system cannot find the file specified....

Can anybody help me solve this issue?

hkvision commented 2 years ago

Hi @dinkleva

I wonder whether this is due to your PySpark installation (probably on Windows?). Since it is a pure Spark problem and not related to BigDL, you could raise this issue in the Spark community.
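One thing that may be worth checking first: the error says the "python3" executable could not be found. The sketch below is an assumption about the cause, not a confirmed fix. It looks up the interpreter on PATH and, if it is missing, points PySpark's `PYSPARK_PYTHON` environment variable at the interpreter that is currently running. The helper name `resolve_pyspark_python` is hypothetical.

```python
import os
import shutil
import sys


def resolve_pyspark_python(requested="python3"):
    """Return a path to an interpreter that Spark can launch for Python workers.

    Falls back to the currently running interpreter when `requested`
    is not found on PATH.
    """
    found = shutil.which(requested)
    return found if found else sys.executable


# Set this before creating a SparkSession/SparkContext so it takes effect.
interpreter = resolve_pyspark_python()
os.environ["PYSPARK_PYTHON"] = interpreter
print("PYSPARK_PYTHON =", interpreter)
```

If the error persists after this, the Spark community is the better place to follow up.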

preet2206 commented 2 years ago

I am facing the same issue. Is there any other workaround?

hkvision commented 2 years ago

Hi @preet2206

If you are also facing this PySpark-related issue, I highly recommend raising it in the Spark community; I believe they can support you better.