hashicorp / nomad-spark

DEPRECATED: Apache Spark with native support for Nomad as a scheduler

Executors do not scale down in dynamic allocation #25

Open lukleh opened 5 years ago

lukleh commented 5 years ago

Nomad 0.9.1
Pyspark 2.4.3

An example pyspark-shell command running against a Nomad cluster:

    --conf spark.nomad.sparkDistribution=local:/usr/lib/spark \
    --conf spark.dynamicAllocation.enabled=true \
    --conf spark.shuffle.service.enabled=true \
    --conf spark.dynamicAllocation.minExecutors=1 \
    --conf spark.dynamicAllocation.maxExecutors=30
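
For context, the behavior these settings are supposed to drive can be sketched as a simplified model of Spark's idle-timeout bookkeeping (the function and names below are illustrative, not Spark's actual internals): an executor idle longer than `executorIdleTimeout` becomes a removal candidate, but the pool never shrinks below `minExecutors`.

```python
# Illustrative model (not Spark's actual code) of dynamic allocation's
# scale-down decision: executors idle past the timeout are candidates
# for removal, capped so at least min_executors stay alive.
def executors_to_remove(last_task_finish, now, idle_timeout, min_executors):
    """last_task_finish: dict of executor_id -> timestamp of its last task."""
    idle = [eid for eid, ts in last_task_finish.items()
            if now - ts >= idle_timeout]
    # Never shrink the pool below min_executors.
    max_removable = max(0, len(last_task_finish) - min_executors)
    return sorted(idle)[:max_removable]

# Executors "2" and "3" have been idle for more than 60 seconds.
executors = {"1": 990.0, "2": 900.0, "3": 880.0}
print(executors_to_remove(executors, now=1000.0,
                          idle_timeout=60.0, min_executors=1))
# -> ['2', '3']
```

This is the scale-down the issue reports as never happening: Spark identifies the idle executors, but the Nomad backend refuses to kill them.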

In the pyspark-shell, load the following:

After spark.dynamicAllocation.executorIdleTimeout elapses, executors are not killed; instead, the following logs appear:

19/07/28 13:01:05 WARN NomadClusterSchedulerBackend: Ignoring request to kill 15 executor(s): {{EXECUTORS IDS SKIPPED}} (not yet implemented for Nomad)  
19/07/28 13:01:05 WARN ExecutorAllocationManager: Unable to reach the cluster manager to kill executor/s {{EXECUTORS IDS SKIPPED}} or no executor eligible to kill!

which points to the missing implementation at https://github.com/hashicorp/nomad-spark/blob/nomad-spark-2.4.3/resource-managers/nomad/src/main/scala/org/apache/spark/scheduler/cluster/nomad/NomadClusterSchedulerBackend.scala#L188
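
The linked code explains the warning: the Nomad backend receives the kill request but only logs it and reports failure, so the allocation manager's scale-down is a no-op. A rough sketch of that control flow (written in Python for brevity; the real code is Scala):

```python
import logging

logging.basicConfig(format="%(levelname)s %(name)s: %(message)s")
log = logging.getLogger("NomadClusterSchedulerBackend")

# Rough sketch (not the actual Scala source) of the unimplemented kill
# path: the backend refuses the request, so executors are never removed
# even though dynamic allocation marked them idle.
def do_kill_executors(executor_ids):
    log.warning("Ignoring request to kill %d executor(s): %s "
                "(not yet implemented for Nomad)",
                len(executor_ids), ", ".join(executor_ids))
    return False  # tells the allocation manager the kill did not happen

do_kill_executors(["4", "7", "9"])  # logs the warning and returns False
```

Because the call reports failure, ExecutorAllocationManager then emits its own "Unable to reach the cluster manager to kill executor/s" warning, matching the logs above.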

Similar issue: https://github.com/hashicorp/nomad-spark/issues/20

cgbaker commented 5 years ago

Hi @lukleh, the Nomad Spark integration does not implement downscaling when using dynamic executors. Features road-mapped for Nomad 0.10.x will allow us to decouple the shuffle service processes from the executor processes and support proper downscaling. I'll leave this issue open to track that.