Closed alexnaspo closed 6 years ago
+1
+1
I'm porting the jupyter/docker-stacks from ubuntu+spark+mesos to alpine+spark+consul+nomad, and upstream uses Spark 2.3: https://github.com/jupyter/docker-stacks/blob/master/pyspark-notebook/Dockerfile#L10
It would be great to have end-user code that just works on the two stack implementations.
I spent some time looking into a rebase on upstream Spark 2.3.0 and, while I'm not done making sure everything is fully validated, I got a SparkPi job to complete successfully:
spark-submit \
  --class org.apache.spark.examples.JavaSparkPi \
  --master nomad \
  --deploy-mode cluster \
  --conf spark.executor.instances=4 \
  --conf spark.nomad.cluster.monitorUntil=complete \
  --conf spark.eventLog.enabled=true \
  --conf spark.eventLog.dir=hdfs://hdfs.service.consul/spark-events \
  --conf spark.nomad.sparkDistribution=https://s3.amazonaws.com/nomad-spark-alex/spark-2.3.0-bin-nomad.tgz \
  https://s3.amazonaws.com/nomad-spark-alex/spark-examples_2.11-2.3.0.jar 100
...
18/05/21 18:42:27 INFO JobUtils: stderr: 18/05/21 18:42:26 INFO TaskSetManager: Finished task 99.0 in stage 0.0 (TID 99) in 96 ms on 172.31.26.218 (executor 38fbeeb6-9d23-7528-46d9-7942004c0ec4-1526928127730) (100/100)
18/05/21 18:42:27 INFO JobUtils: stderr: 18/05/21 18:42:26 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
18/05/21 18:42:27 INFO JobUtils: stderr: 18/05/21 18:42:26 INFO DAGScheduler: ResultStage 0 (reduce at JavaSparkPi.java:54) finished in 6.661 s
18/05/21 18:42:27 INFO JobUtils: stderr: 18/05/21 18:42:26 INFO DAGScheduler: Job 0 finished: reduce at JavaSparkPi.java:54, took 6.741448 s
18/05/21 18:42:27 INFO JobUtils: stderr: 18/05/21 18:42:26 INFO SparkUI: Stopped Spark web UI at http://ip-172-31-28-147.node.dc1.consul:22453
18/05/21 18:42:27 INFO JobUtils: stderr: 18/05/21 18:42:27 INFO NomadJobManipulator: Registered Nomad job org.apache.spark.examples.JavaSparkPi-2018-05-21T18:41:24.557Z (job modify index 154 -> 164)
18/05/21 18:42:27 INFO JobUtils: driver Terminated -- Exit status 0
18/05/21 18:42:27 INFO JobUtils: Allocation 0510ad22-8d0c-ccd6-4d20-d61db8a7e1b1 has client status complete
18/05/21 18:42:27 INFO NomadClusterModeLauncher: Driver completed successfully
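For reference, the repeated --conf flags in the command above could equivalently be placed in conf/spark-defaults.conf, which spark-submit reads by default. A sketch using the same values (the property names other than spark.master and spark.submit.deployMode are taken verbatim from the command; paths and URLs are specific to my setup):

```
spark.master                      nomad
spark.submit.deployMode           cluster
spark.executor.instances          4
spark.nomad.cluster.monitorUntil  complete
spark.eventLog.enabled            true
spark.eventLog.dir                hdfs://hdfs.service.consul/spark-events
spark.nomad.sparkDistribution     https://s3.amazonaws.com/nomad-spark-alex/spark-2.3.0-bin-nomad.tgz
```

With that in place, the submission reduces to `spark-submit --class org.apache.spark.examples.JavaSparkPi <jar-url> 100`; flags passed on the command line still override the defaults file.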
Note that I also had to upgrade the Jackson version in order to get past issue #3 (which should probably be done separately, prior to this).
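For anyone reproducing the Jackson bump, the change would presumably look something like overriding the Jackson version property in Spark's parent pom.xml. This is a sketch, not the exact diff: the property name is an assumption, and the target version is deliberately left as a placeholder since the right value depends on issue #3.

```xml
<!-- sketch only: assumed property name in Spark's parent pom.xml;
     replace 2.x.y with the version that resolves issue #3 -->
<properties>
  <fasterxml.jackson.version>2.x.y</fasterxml.jackson.version>
</properties>
```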
A few notes:
My branch is the nomad-spark commit applied on top of the v2.3.0 tag, with conflicts resolved. As the README says, it would be much better to avoid maintaining the long-term fork entirely ("The ultimate goal is to integrate Nomad into Spark directly, either natively or via a backend/scheduler plugin interface"). Spark already has interfaces for backends, but there's still a requirement for all resource managers to be included in core. It might be worthwhile to contribute back to Spark and make it easier to develop resource managers kept external to the project.
There's still some validation to be done to make sure the 2.3.0 package is correct, but if anyone is brave enough, here are the unofficial (and potentially non-final) packages built from my forked branch:
Lastly, here's a link to a successful build on Travis: https://travis-ci.org/alexandre-normand/nomad-spark/builds/381945123. It might be a good idea to enable Travis on https://github.com/hashicorp/nomad-spark too. Note that the configuration uses -DskipTests, which isn't great for complete continuous integration, but it would be a good starting point.
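For anyone wanting to enable the same thing on the hashicorp/nomad-spark repo, a minimal Travis configuration along these lines should work. This is a sketch, not the exact file from my fork: the JDK choice is an assumption, and any nomad-specific Maven profile flags are omitted since I'm not certain of their names.

```yaml
# sketch of a minimal .travis.yml; see the -DskipTests caveat above --
# this compiles and packages but does not run the test suite
language: java
jdk: oraclejdk8
script:
  - ./build/mvn -DskipTests clean package
```

Dropping -DskipTests later (or running a targeted subset of suites) would turn this into a more meaningful CI gate.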
There is now a release for v2.3.0: https://github.com/hashicorp/nomad-spark/releases/tag/v2.3.0-nomad-0.7.0-20180522
http://spark.apache.org/releases/spark-release-2-3-0.html
Jet.com would like to run 2.3.0 on Nomad