Open tfiasco opened 7 years ago
I reconsidered this problem, and I'm not sure whether the failed status is actually caused by the failure to delete the temp directory.
This is the console output:
test1@Master:/usr/lib/sparkclr$ sparkclr-submit.sh --master yarn --deploy-mode cluster --exe SparkClrPi.exe /usr/lib/sparkclr/examples/Batch/pi/
[sparkclr-submit.sh] LAUNCH_CLASSPATH=/usr/lib/sparkclr/runtime/lib/spark-clr_2.11-2.0.200.jar:/usr/local/spark-2.0.2-bin-hadoop2.7/jars/*
Zip driver directory /usr/lib/sparkclr/examples/Batch/pi to /tmp/pi_1490072786367.zip
[sparkclr-submit.sh] Command to run --master yarn --deploy-mode cluster --name SparkClrPi --files /tmp/pi_1490072786367.zip --class org.apache.spark.deploy.csharp.CSharpRunner /usr/lib/sparkclr/runtime/lib/spark-clr_2.11-2.0.200.jar pi_1490072786367.zip SparkClrPi.exe
17/03/20 22:06:27 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/03/20 22:06:28 INFO client.RMProxy: Connecting to ResourceManager at Master/192.168.169.141:8032
17/03/20 22:06:28 INFO yarn.Client: Requesting a new application from cluster with 2 NodeManagers
17/03/20 22:06:28 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container)
17/03/20 22:06:28 INFO yarn.Client: Will allocate AM container, with 1408 MB memory including 384 MB overhead
17/03/20 22:06:28 INFO yarn.Client: Setting up container launch context for our AM
17/03/20 22:06:28 INFO yarn.Client: Setting up the launch environment for our AM container
17/03/20 22:06:28 INFO yarn.Client: Preparing resources for our AM container
17/03/20 22:06:28 INFO yarn.Client: Source and destination file systems are the same. Not copying hdfs://Master:9000/spark-jars/JavaEWAH-0.3.2.jar
17/03/20 22:06:28 INFO yarn.Client: Source and destination file systems are the same. Not copying hdfs://Master:9000/spark-jars/RoaringBitmap-0.5.11.jar
17/03/20 22:06:29 INFO yarn.Client: Uploading resource file:/usr/lib/sparkclr/runtime/lib/spark-clr_2.11-2.0.200.jar -> hdfs://Master:9000/user/test1/.sparkStaging/application_1490070587984_0007/spark-clr_2.11-2.0.200.jar
17/03/20 22:06:29 INFO yarn.Client: Uploading resource file:/tmp/pi_1490072786367.zip -> hdfs://Master:9000/user/test1/.sparkStaging/application_1490070587984_0007/pi_1490072786367.zip
17/03/20 22:06:29 INFO yarn.Client: Uploading resource file:/tmp/spark-a244ed7e-5265-4a80-a97e-7cd4a83425c6/__spark_conf__4813657159454156389.zip -> hdfs://Master:9000/user/test1/.sparkStaging/application_1490070587984_0007/__spark_conf__.zip
17/03/20 22:06:29 INFO spark.SecurityManager: Changing view acls to: test1
17/03/20 22:06:29 INFO spark.SecurityManager: Changing modify acls to: test1
17/03/20 22:06:29 INFO spark.SecurityManager: Changing view acls groups to:
17/03/20 22:06:29 INFO spark.SecurityManager: Changing modify acls groups to:
17/03/20 22:06:29 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(test1); groups with view permissions: Set(); users with modify permissions: Set(test1); groups with modify permissions: Set()
17/03/20 22:06:29 INFO yarn.Client: Submitting application application_1490070587984_0007 to ResourceManager
17/03/20 22:06:29 INFO impl.YarnClientImpl: Submitted application application_1490070587984_0007
17/03/20 22:06:30 INFO yarn.Client: Application report for application_1490070587984_0007 (state: ACCEPTED)
17/03/20 22:06:30 INFO yarn.Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1490072789938
final status: UNDEFINED
tracking URL: http://Master:8088/proxy/application_1490070587984_0007/
user: test1
17/03/20 22:06:31 INFO yarn.Client: Application report for application_1490070587984_0007 (state: ACCEPTED)
17/03/20 22:06:32 INFO yarn.Client: Application report for application_1490070587984_0007 (state: ACCEPTED)
17/03/20 22:06:33 INFO yarn.Client: Application report for application_1490070587984_0007 (state: ACCEPTED)
17/03/20 22:06:34 INFO yarn.Client: Application report for application_1490070587984_0007 (state: RUNNING)
17/03/20 22:06:34 INFO yarn.Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: 192.168.169.129
ApplicationMaster RPC port: 0
queue: default
start time: 1490072789938
final status: UNDEFINED
tracking URL: http://Master:8088/proxy/application_1490070587984_0007/
user: test1
17/03/20 22:06:35 INFO yarn.Client: Application report for application_1490070587984_0007 (state: RUNNING)
17/03/20 22:06:36 INFO yarn.Client: Application report for application_1490070587984_0007 (state: RUNNING)
17/03/20 22:06:37 INFO yarn.Client: Application report for application_1490070587984_0007 (state: RUNNING)
17/03/20 22:06:38 INFO yarn.Client: Application report for application_1490070587984_0007 (state: RUNNING)
17/03/20 22:06:39 INFO yarn.Client: Application report for application_1490070587984_0007 (state: RUNNING)
17/03/20 22:06:40 INFO yarn.Client: Application report for application_1490070587984_0007 (state: RUNNING)
17/03/20 22:06:41 INFO yarn.Client: Application report for application_1490070587984_0007 (state: RUNNING)
17/03/20 22:06:42 INFO yarn.Client: Application report for application_1490070587984_0007 (state: RUNNING)
17/03/20 22:06:43 INFO yarn.Client: Application report for application_1490070587984_0007 (state: RUNNING)
17/03/20 22:06:45 INFO yarn.Client: Application report for application_1490070587984_0007 (state: RUNNING)
17/03/20 22:06:46 INFO yarn.Client: Application report for application_1490070587984_0007 (state: RUNNING)
17/03/20 22:06:47 INFO yarn.Client: Application report for application_1490070587984_0007 (state: ACCEPTED)
17/03/20 22:06:47 INFO yarn.Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1490072789938
final status: UNDEFINED
tracking URL: http://Master:8088/proxy/application_1490070587984_0007/
user: test1
17/03/20 22:06:48 INFO yarn.Client: Application report for application_1490070587984_0007 (state: ACCEPTED)
17/03/20 22:06:49 INFO yarn.Client: Application report for application_1490070587984_0007 (state: ACCEPTED)
17/03/20 22:06:50 INFO yarn.Client: Application report for application_1490070587984_0007 (state: ACCEPTED)
17/03/20 22:06:51 INFO yarn.Client: Application report for application_1490070587984_0007 (state: RUNNING)
17/03/20 22:06:51 INFO yarn.Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: 192.168.169.128
ApplicationMaster RPC port: 0
queue: default
start time: 1490072789938
final status: UNDEFINED
tracking URL: http://Master:8088/proxy/application_1490070587984_0007/
user: test1
17/03/20 22:06:52 INFO yarn.Client: Application report for application_1490070587984_0007 (state: RUNNING)
17/03/20 22:06:53 INFO yarn.Client: Application report for application_1490070587984_0007 (state: RUNNING)
17/03/20 22:06:54 INFO yarn.Client: Application report for application_1490070587984_0007 (state: RUNNING)
17/03/20 22:06:55 INFO yarn.Client: Application report for application_1490070587984_0007 (state: RUNNING)
17/03/20 22:06:56 INFO yarn.Client: Application report for application_1490070587984_0007 (state: RUNNING)
17/03/20 22:06:57 INFO yarn.Client: Application report for application_1490070587984_0007 (state: RUNNING)
17/03/20 22:06:58 INFO yarn.Client: Application report for application_1490070587984_0007 (state: RUNNING)
17/03/20 22:06:59 INFO yarn.Client: Application report for application_1490070587984_0007 (state: RUNNING)
17/03/20 22:07:00 INFO yarn.Client: Application report for application_1490070587984_0007 (state: RUNNING)
17/03/20 22:07:01 INFO yarn.Client: Application report for application_1490070587984_0007 (state: RUNNING)
17/03/20 22:07:02 INFO yarn.Client: Application report for application_1490070587984_0007 (state: RUNNING)
17/03/20 22:07:03 INFO yarn.Client: Application report for application_1490070587984_0007 (state: RUNNING)
17/03/20 22:07:04 INFO yarn.Client: Application report for application_1490070587984_0007 (state: RUNNING)
17/03/20 22:07:05 INFO yarn.Client: Application report for application_1490070587984_0007 (state: RUNNING)
17/03/20 22:07:06 INFO yarn.Client: Application report for application_1490070587984_0007 (state: FINISHED)
17/03/20 22:07:06 INFO yarn.Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: 192.168.169.128
ApplicationMaster RPC port: 0
queue: default
start time: 1490072789938
final status: FAILED
tracking URL: http://Master:8088/proxy/application_1490070587984_0007/
user: test1
Exception in thread "main" org.apache.spark.SparkException: Application application_1490070587984_0007 finished with failed status
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1132)
at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1178)
at org.apache.spark.deploy.yarn.Client.main(Client.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:736)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:185)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:210)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
17/03/20 22:07:06 INFO util.ShutdownHookManager: Shutdown hook called
17/03/20 22:07:06 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-a244ed7e-5265-4a80-a97e-7cd4a83425c6
The stdout of the node:
[2017-03-20 22:06:34,619] [1] [INFO ] [Microsoft.Spark.CSharp.Configuration.ConfigurationService] - ConfigurationService runMode is YARN
[2017-03-20 22:06:34,627] [1] [INFO ] [Microsoft.Spark.CSharp.Configuration.ConfigurationService+SparkCLRConfiguration] - CSharpBackend successfully read from environment variable CSHARPBACKEND_PORT
[2017-03-20 22:06:34,629] [1] [INFO ] [Microsoft.Spark.CSharp.Proxy.Ipc.SparkCLRIpcProxy] - CSharpBackend port number to be used in JvMBridge is 43932
[2017-03-20 22:06:40,973] [1] [INFO ] [Microsoft.Spark.CSharp.Core.SparkContext] - Parallelizing 300001 items to form RDD in the cluster with 3 partitions
[2017-03-20 22:06:42,496] [1] [INFO ] [Microsoft.Spark.CSharp.Core.RDD`1[[System.Int32, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]]] - Executing Map operation on RDD (preservesPartitioning=False)
[2017-03-20 22:06:42,498] [1] [INFO ] [Microsoft.Spark.CSharp.Core.RDD`1[[System.Int32, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]]] - Executing Reduce operation on RDD
[2017-03-20 22:06:45,675] [1] [INFO ] [Microsoft.Spark.CSharp.Examples.PiExample] - (anonymous method approach) Pi is roughly 3.12157333333333.
[2017-03-20 22:06:45,676] [1] [INFO ] [Microsoft.Spark.CSharp.Core.RDD`1[[System.Int32, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]]] - Executing Map operation on RDD (preservesPartitioning=False)
[2017-03-20 22:06:45,676] [1] [INFO ] [Microsoft.Spark.CSharp.Core.RDD`1[[System.Int32, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]]] - Executing Reduce operation on RDD
[2017-03-20 22:06:47,161] [1] [INFO ] [Microsoft.Spark.CSharp.Examples.PiExample] - (serialized class approach) Pi is roughly 3.13689333333333.
[2017-03-20 22:06:47,161] [1] [INFO ] [Microsoft.Spark.CSharp.Examples.PiExample] - Completed calculating the value of Pi
[2017-03-20 22:06:47,161] [1] [INFO ] [Microsoft.Spark.CSharp.Core.SparkContext] - Stopping SparkContext
[2017-03-20 22:06:47,161] [1] [INFO ] [Microsoft.Spark.CSharp.Core.SparkContext] - Note that there might be error in Spark logs on the failure to delete userFiles directory under Spark temp directory (spark.local.dir config value in local mode)
[2017-03-20 22:06:47,161] [1] [INFO ] [Microsoft.Spark.CSharp.Core.SparkContext] - This error may be ignored for now. See https://issues.apache.org/jira/browse/SPARK-8333 for details
The last several lines of stderr:
17/03/20 22:06:47 INFO ui.SparkUI: Stopped Spark web UI at http://192.168.169.129:36353
17/03/20 22:06:47 INFO yarn.YarnAllocator: Driver requested a total number of 0 executor(s).
17/03/20 22:06:47 INFO yarn.YarnAllocator: Canceling requests for 0 executor container(s) to have a new desired total 0 executors.
17/03/20 22:06:47 WARN yarn.YarnAllocator: Expected to find pending requests, but found none.
17/03/20 22:06:47 INFO cluster.YarnClusterSchedulerBackend: Shutting down all executors
17/03/20 22:06:47 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Asking each executor to shut down
17/03/20 22:06:47 INFO cluster.SchedulerExtensionServices: Stopping SchedulerExtensionServices
(serviceOption=None,
services=List(),
started=false)
17/03/20 22:06:47 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
17/03/20 22:06:47 INFO memory.MemoryStore: MemoryStore cleared
17/03/20 22:06:47 INFO storage.BlockManager: BlockManager stopped
17/03/20 22:06:47 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
17/03/20 22:06:47 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
17/03/20 22:06:47 INFO spark.SparkContext: Successfully stopped SparkContext
17/03/20 22:06:47 INFO csharp.CSharpBackend: Requesting to close all call back sockets.
17/03/20 22:06:47 INFO csharp.CSharpRunner: Closing CSharpBackend
17/03/20 22:06:47 INFO csharp.CSharpBackend: Requesting to close all call back sockets.
17/03/20 22:06:47 INFO csharp.CSharpRunner: Return CSharpBackend code 0
17/03/20 22:06:47 INFO csharp.Utils: Utils.exit() with status: 0, maxDelayMillis: 1000
17/03/20 22:06:47 INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 16, (reason: Shutdown hook called before final status was reported.)
17/03/20 22:06:47 INFO util.ShutdownHookManager: Shutdown hook called
17/03/20 22:06:47 INFO util.ShutdownHookManager: Deleting directory /usr/local/hadoop-2.7.3/tmp/nm-local-dir/usercache/test1/appcache/application_1490070587984_0007/spark-56003f30-fe1d-454f-ac80-4d372d428beb
In stderr, I see "Driver requested a total number of 0 executor(s)." Try using the --num-executors parameter in sparkclr-submit.
I used the command:
sparkclr-submit.sh --master yarn --deploy-mode cluster --num-executors 4 --executor-cores 2 --executor-memory 1G --conf spark.speculation=true --driver-memory 1G --exe SparkClrPi.exe /usr/lib/sparkclr/examples/Batch/pi/
The status is still FAILED, and stderr still shows "Driver requested a total number of 0 executor(s)."
Are you able to successfully run Scala Pi example in your YARN cluster?
Yes, I tried
spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode cluster ./examples/jars/spark-examples_2.11-2.0.2.jar 10
which returned a SUCCEEDED final status.
Do you see anything in stderr file when running Scala example?
The last lines in the stderr of the Scala example are:
17/03/21 16:01:01 INFO scheduler.DAGScheduler: ResultStage 0 (reduce at SparkPi.scala:38) finished in 3.468 s
17/03/21 16:01:01 INFO cluster.YarnClusterScheduler: Removed TaskSet 0.0, whose tasks have all completed, from pool
17/03/21 16:01:01 INFO scheduler.DAGScheduler: Job 0 finished: reduce at SparkPi.scala:38, took 4.293689 s
17/03/21 16:01:01 INFO server.ServerConnector: Stopped ServerConnector@7a6b222f{HTTP/1.1}{0.0.0.0:0}
17/03/21 16:01:01 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@181edc4f{/stages/stage/kill,null,UNAVAILABLE}
17/03/21 16:01:01 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@7b9d9a6a{/api,null,UNAVAILABLE}
17/03/21 16:01:01 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@204b4ba5{/,null,UNAVAILABLE}
17/03/21 16:01:01 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@cbb79a2{/static,null,UNAVAILABLE}
17/03/21 16:01:01 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@3debd17f{/executors/threadDump/json,null,UNAVAILABLE}
17/03/21 16:01:01 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@36cdb80e{/executors/threadDump,null,UNAVAILABLE}
17/03/21 16:01:01 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@30e397ff{/executors/json,null,UNAVAILABLE}
17/03/21 16:01:01 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@3490ae3e{/executors,null,UNAVAILABLE}
17/03/21 16:01:01 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@5f3a82fa{/environment/json,null,UNAVAILABLE}
17/03/21 16:01:01 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@39a7f1fc{/environment,null,UNAVAILABLE}
17/03/21 16:01:01 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@1c73109b{/storage/rdd/json,null,UNAVAILABLE}
17/03/21 16:01:01 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@75b3de55{/storage/rdd,null,UNAVAILABLE}
17/03/21 16:01:01 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@7142e73d{/storage/json,null,UNAVAILABLE}
17/03/21 16:01:01 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@53eec3fa{/storage,null,UNAVAILABLE}
17/03/21 16:01:01 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@66a1e111{/stages/pool/json,null,UNAVAILABLE}
17/03/21 16:01:01 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@2a65c212{/stages/pool,null,UNAVAILABLE}
17/03/21 16:01:01 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@758af09a{/stages/stage/json,null,UNAVAILABLE}
17/03/21 16:01:01 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@7d190c6b{/stages/stage,null,UNAVAILABLE}
17/03/21 16:01:01 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@11bbb671{/stages/json,null,UNAVAILABLE}
17/03/21 16:01:01 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@7215b4a5{/stages,null,UNAVAILABLE}
17/03/21 16:01:01 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@59da282a{/jobs/job/json,null,UNAVAILABLE}
17/03/21 16:01:01 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@2d382edb{/jobs/job,null,UNAVAILABLE}
17/03/21 16:01:01 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@780369c7{/jobs/json,null,UNAVAILABLE}
17/03/21 16:01:01 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@7d3adbd7{/jobs,null,UNAVAILABLE}
17/03/21 16:01:01 INFO ui.SparkUI: Stopped Spark web UI at http://192.168.169.128:42487
17/03/21 16:01:01 INFO yarn.YarnAllocator: Driver requested a total number of 0 executor(s).
17/03/21 16:01:01 INFO yarn.YarnAllocator: Canceling requests for 0 executor container(s) to have a new desired total 0 executors.
17/03/21 16:01:01 WARN yarn.YarnAllocator: Expected to find pending requests, but found none.
17/03/21 16:01:01 INFO cluster.YarnClusterSchedulerBackend: Shutting down all executors
17/03/21 16:01:01 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Asking each executor to shut down
17/03/21 16:01:01 INFO cluster.SchedulerExtensionServices: Stopping SchedulerExtensionServices
(serviceOption=None,
services=List(),
started=false)
17/03/21 16:01:01 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
17/03/21 16:01:01 INFO memory.MemoryStore: MemoryStore cleared
17/03/21 16:01:01 INFO storage.BlockManager: BlockManager stopped
17/03/21 16:01:01 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
17/03/21 16:01:01 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
17/03/21 16:01:01 INFO spark.SparkContext: Successfully stopped SparkContext
17/03/21 16:01:01 INFO yarn.ApplicationMaster: Final app status: SUCCEEDED, exitCode: 0
17/03/21 16:01:01 INFO yarn.ApplicationMaster: Unregistering ApplicationMaster with SUCCEEDED
17/03/21 16:01:01 INFO impl.AMRMClientImpl: Waiting for application to be successfully unregistered.
17/03/21 16:01:01 INFO yarn.ApplicationMaster: Deleting staging directory hdfs://Master:9000/user/test1/.sparkStaging/application_1490137059379_0001
17/03/21 16:01:01 INFO util.ShutdownHookManager: Shutdown hook called
17/03/21 16:01:01 INFO util.ShutdownHookManager: Deleting directory /usr/local/hadoop-2.7.3/tmp/nm-local-dir/usercache/test1/appcache/application_1490137059379_0001/spark-3f36dc68-1ebe-4e3c-a868-c7b7cdf26e44
The last lines in the stderr of the Mobius example are:
17/03/21 16:03:55 INFO scheduler.DAGScheduler: ResultStage 1 (collect at PythonRDD.scala:453) finished in 2.275 s
17/03/21 16:03:55 INFO scheduler.DAGScheduler: Job 1 finished: collect at PythonRDD.scala:453, took 2.279892 s
17/03/21 16:03:55 INFO server.ServerConnector: Stopped ServerConnector@5d3a199f{HTTP/1.1}{0.0.0.0:0}
17/03/21 16:03:55 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@60c3701b{/stages/stage/kill,null,UNAVAILABLE}
17/03/21 16:03:55 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@4c18de8d{/api,null,UNAVAILABLE}
17/03/21 16:03:55 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@9458eb2{/,null,UNAVAILABLE}
17/03/21 16:03:55 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@3bb2acf{/static,null,UNAVAILABLE}
17/03/21 16:03:55 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@273168d5{/executors/threadDump/json,null,UNAVAILABLE}
17/03/21 16:03:55 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@5452294e{/executors/threadDump,null,UNAVAILABLE}
17/03/21 16:03:55 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@1a7200da{/executors/json,null,UNAVAILABLE}
17/03/21 16:03:55 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@56ee4814{/executors,null,UNAVAILABLE}
17/03/21 16:03:55 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@34e1bda{/environment/json,null,UNAVAILABLE}
17/03/21 16:03:55 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@3f4e341e{/environment,null,UNAVAILABLE}
17/03/21 16:03:55 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@68049a81{/storage/rdd/json,null,UNAVAILABLE}
17/03/21 16:03:55 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@5881be91{/storage/rdd,null,UNAVAILABLE}
17/03/21 16:03:55 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@565d4c29{/storage/json,null,UNAVAILABLE}
17/03/21 16:03:55 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@162836a1{/storage,null,UNAVAILABLE}
17/03/21 16:03:55 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@1ae55a0a{/stages/pool/json,null,UNAVAILABLE}
17/03/21 16:03:55 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@d407249{/stages/pool,null,UNAVAILABLE}
17/03/21 16:03:55 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@2edddd10{/stages/stage/json,null,UNAVAILABLE}
17/03/21 16:03:55 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@1373f649{/stages/stage,null,UNAVAILABLE}
17/03/21 16:03:55 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@7bed9bf0{/stages/json,null,UNAVAILABLE}
17/03/21 16:03:55 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@569eb406{/stages,null,UNAVAILABLE}
17/03/21 16:03:55 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@787429d{/jobs/job/json,null,UNAVAILABLE}
17/03/21 16:03:55 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@4cd48440{/jobs/job,null,UNAVAILABLE}
17/03/21 16:03:55 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@2d800bde{/jobs/json,null,UNAVAILABLE}
17/03/21 16:03:55 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@21fe16d5{/jobs,null,UNAVAILABLE}
17/03/21 16:03:55 INFO ui.SparkUI: Stopped Spark web UI at http://192.168.169.129:36122
17/03/21 16:03:56 INFO yarn.YarnAllocator: Driver requested a total number of 0 executor(s).
17/03/21 16:03:56 INFO cluster.YarnClusterSchedulerBackend: Shutting down all executors
17/03/21 16:03:56 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Asking each executor to shut down
17/03/21 16:03:56 INFO yarn.YarnAllocator: Canceling requests for 0 executor container(s) to have a new desired total 0 executors.
17/03/21 16:03:56 WARN yarn.YarnAllocator: Expected to find pending requests, but found none.
17/03/21 16:03:56 INFO cluster.SchedulerExtensionServices: Stopping SchedulerExtensionServices
(serviceOption=None,
services=List(),
started=false)
17/03/21 16:03:56 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
17/03/21 16:03:56 INFO memory.MemoryStore: MemoryStore cleared
17/03/21 16:03:56 INFO storage.BlockManager: BlockManager stopped
17/03/21 16:03:56 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
17/03/21 16:03:56 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
17/03/21 16:03:56 INFO spark.SparkContext: Successfully stopped SparkContext
17/03/21 16:03:56 INFO csharp.CSharpBackend: Requesting to close all call back sockets.
17/03/21 16:03:56 INFO csharp.CSharpRunner: Closing CSharpBackend
17/03/21 16:03:56 INFO csharp.CSharpBackend: Requesting to close all call back sockets.
17/03/21 16:03:56 INFO csharp.CSharpRunner: Return CSharpBackend code 0
17/03/21 16:03:56 INFO csharp.Utils: Utils.exit() with status: 0, maxDelayMillis: 1000
17/03/21 16:03:56 INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 16, (reason: Shutdown hook called before final status was reported.)
17/03/21 16:03:56 INFO yarn.ApplicationMaster: Unregistering ApplicationMaster with FAILED (diag message: Shutdown hook called before final status was reported.)
17/03/21 16:03:56 INFO impl.AMRMClientImpl: Waiting for application to be successfully unregistered.
17/03/21 16:03:56 INFO yarn.ApplicationMaster: Deleting staging directory hdfs://Master:9000/user/test1/.sparkStaging/application_1490137059379_0002
17/03/21 16:03:57 INFO util.ShutdownHookManager: Shutdown hook called
17/03/21 16:03:57 INFO util.ShutdownHookManager: Deleting directory /usr/local/hadoop-2.7.3/tmp/nm-local-dir/usercache/test1/appcache/application_1490137059379_0002/spark-066969d3-fcf9-44b3-8d3e-11801345971b
Looks like some exception is thrown after Pi is successfully calculated. You might be able to find the exception details in the YARN logs. You could also try running the app in client mode to get the exception details in your command prompt.
The SparkClrPi example returns a SUCCEEDED final status in yarn client mode, but a FAILED one in yarn cluster mode.
I suspect the timeout for exiting the application master gracefully is not long enough. If you don't mind compiling the Mobius code, you could increase the timeout from 1 second to something like 5 or 10 seconds and build a Mobius jar to use. That will most likely solve the problem.
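The race being described can be sketched in plain Python (a generic illustration of the mechanism, not Mobius code: `report_final_status`, `exit_with_max_delay`, and all the timings are made up for the example). If the shutdown hook's wait budget expires before the final status is reported, YARN only ever sees FAILED:

```python
import threading
import time

def report_final_status(done_event, status):
    # Simulates the ApplicationMaster reporting SUCCEEDED; in the failing
    # runs this takes longer than the shutdown hook is willing to wait.
    time.sleep(0.2)
    status["final"] = "SUCCEEDED"
    done_event.set()

def exit_with_max_delay(max_delay_seconds):
    # Simulates an exit with a bounded wait (like maxDelayMillis in the
    # "Utils.exit() with status: 0, maxDelayMillis: 1000" log line above):
    # wait up to max_delay_seconds for the status report, then exit anyway.
    done = threading.Event()
    status = {"final": None}
    reporter = threading.Thread(target=report_final_status,
                                args=(done, status))
    reporter.start()
    reported_in_time = done.wait(timeout=max_delay_seconds)
    # What YARN observes is whatever was reported before the hook fired.
    observed = status["final"] if reported_in_time else "FAILED"
    reporter.join()  # let the simulation thread finish cleanly
    return observed

print(exit_with_max_delay(0.05))  # hook fires first
print(exit_with_max_delay(1.0))   # status reported in time
```

With a tight budget the job is reported as FAILED even though it succeeded; with a generous one the SUCCEEDED report wins the race, which is why raising the timeout is expected to help.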
Did this solve the problem, and is the fix in the latest release? I am getting the same error.
You need to explicitly call SparkContext.Stop(). By default the ShutdownHookManager will do the stop, and if that exceeds 10 s an InterruptedException will be thrown even though the job succeeded beforehand.
Hello, I'm running a Mobius application in yarn cluster mode. When the application is nearly finished, an error appears in the Spark logs about the failure to delete the userFiles directory under the Spark temp directory. Mobius says the error may be ignored for now, but it also causes a FAILED status to be reported to the YARN cluster (even though the run was actually successful). As a consequence, YARN starts the job on another node, and that run fails as well. Since I'm running the application in a two-node cluster, YARN ends up forcing the job to run twice. How can I avoid the error in deleting the temp directory?
Thanks very much.