intel-analytics / analytics-zoo

Distributed Tensorflow, Keras and PyTorch on Apache Spark/Flink & Ray
https://analytics-zoo.readthedocs.io/
Apache License 2.0

Zoo on Databricks always use 1 executor #89

Closed jack1981 closed 2 years ago

jack1981 commented 2 years ago

We found an issue: when we run Zoo on a Databricks cluster (Standard cluster), the Zoo code always runs on a single executor, even though spark.executor.instances is set to more than 1 (for example, 6).

    sparkConf = init_spark_conf(conf={"spark.executorEnv.TF_DISABLE_MKL": "1"})
    sc = init_nncontext(sparkConf)
    spark = SparkSession \
        .builder \
        .appName(app_name) \
        .getOrCreate()

| Executor ID | Address | Status | RDD Blocks | Storage Memory | Disk Used | Cores | Active Tasks | Failed Tasks | Complete Tasks | Total Tasks | Task Time (GC Time) | Input | Shuffle Read | Shuffle Write |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ****:43651 | Active | 4 | 29.5 GiB / 103.8 GiB | 0.0 B | 16 | 0 | 0 | 417 | 417 | 1.4 h (2.2 min) | 5.5 TiB | 4.3 GiB | 4.3 GiB |
| driver | ****:46647 | Active | 0 | 4.6 MiB / 111.3 GiB | 0.0 B | 0 | 0 | 0 | 0 | 0 | 0.0 ms (0.0 ms) | 0.0 B | 0.0 B | 0.0 B |

jenniew commented 2 years ago

We have verified that a Zoo application can run on multiple executors on Databricks. We have asked Mastercard to verify their configuration.
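
For anyone checking a similar setup, here is a minimal sketch of how the executor count could be verified programmatically rather than by reading the Spark UI. It assumes the executor list is available as dictionaries using the `id` and `isActive` field names that Spark's monitoring REST API returns from `/api/v1/applications/[app-id]/executors`. Whether that endpoint is reachable on a given Databricks cluster is an assumption. The helper names `active_worker_executors` and `check_executor_count` are hypothetical and not part of Analytics Zoo or Spark, and the sample data only mirrors the Executors table reported in this issue.

```python
from typing import Dict, List


def active_worker_executors(executors: List[Dict]) -> List[Dict]:
    """Return the active executors, excluding the driver entry."""
    return [
        e for e in executors
        if e.get("id") != "driver" and e.get("isActive", False)
    ]


def check_executor_count(executors: List[Dict], expected: int) -> str:
    """Compare the number of active executors with the expected count."""
    workers = active_worker_executors(executors)
    ids = sorted(e["id"] for e in workers)
    if len(workers) < expected:
        return (
            f"only {len(workers)} of {expected} expected executors "
            f"are active (ids: {ids})"
        )
    return f"{len(workers)} executors active (ids: {ids})"


# Hypothetical sample shaped like the REST API response, filled in with the
# values from the Executors table in this issue (one executor plus the driver).
sample = [
    {"id": "0", "isActive": True, "totalCores": 16, "totalTasks": 417},
    {"id": "driver", "isActive": True, "totalCores": 0, "totalTasks": 0},
]

message = check_executor_count(sample, expected=6)
print(message)
```

With the values reported above, the check flags that only executor `0` is active although 6 were requested, which matches the symptom described in the issue. It does not explain the cause, which still has to be found in the cluster configuration.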