oap-project / gazelle_plugin

Native SQL Engine plugin for Spark SQL with vectorized SIMD optimizations.

Cannot pass stability test when running TPC-DS for 5 rounds with 1.5TB data scale. #232

Open · haojinIntel opened this issue 3 years ago

haojinIntel commented 3 years ago

We triggered a stability test for native-sql-engine and ran TPC-DS for 5 rounds at a 1.5TB data scale. The cluster contains 3 workers, each with 512GB DRAM. The configuration in spark-defaults.conf is shown below:

spark.sql.join.preferSortMergeJoin false
spark.yarn.historyServer.address vsr215:18080
spark.sql.broadcastTimeout 3600
spark.executor.memoryOverhead 3652
spark.dynamicAllocation.executorIdleTimeout 3600s
spark.master yarn
spark.sql.autoBroadcastJoinThreshold 31457280
spark.kryoserializer.buffer.max 256m
spark.executor.memory 13g
spark.deploy-mode client
spark.eventLog.dir hdfs://vsr215:9000/spark-history-server
spark.executor.cores 4
spark.memory.offHeap.size 30g
spark.sql.adaptive.enabled true
spark.driver.memory 20g
spark.network.timeout 3600s
spark.oap.sql.columnar.sortmergejoin true
spark.memory.offHeap.enabled false
spark.eventLog.enabled true
spark.executor.instances 96
spark.sql.inMemoryColumnarStorage.batchSize 20480
spark.driver.extraClassPath /opt/Beaver/OAP/oap_jar/spark-columnar-core-1.1.0-jar-with-dependencies.jar:/opt/Beaver/OAP/oap_jar/spark-arrow-datasource-standard-1.1.0-jar-with-dependencies.jar
spark.driver.maxResultSize 15g
spark.sql.sources.useV1SourceList avro
spark.history.fs.logDirectory hdfs://vsr215:9000/spark-history-server
spark.sql.extensions com.intel.oap.ColumnarPlugin
spark.executor.extraClassPath /opt/Beaver/OAP/oap_jar/spark-columnar-core-1.1.0-jar-with-dependencies.jar:/opt/Beaver/OAP/oap_jar/spark-arrow-datasource-standard-1.1.0-jar-with-dependencies.jar
spark.history.fs.cleaner.enabled true
spark.sql.columnar.window true
spark.sql.columnar.sort true
spark.sql.execution.arrow.maxRecordsPerBatch 20480
spark.kryoserializer.buffer 64m
spark.sql.shuffle.partitions 288
spark.history.ui.port 18080
spark.sql.parquet.columnarReaderBatchSize 20480
spark.shuffle.manager org.apache.spark.shuffle.sort.ColumnarShuffleManager
spark.serializer org.apache.spark.serializer.KryoSerializer
spark.authenticate false
spark.sql.columnar.codegen.hashAggregate false
spark.sql.warehouse.dir hdfs://vsr215:9000/spark-warehouse
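For reference, here is a minimal Scala sketch showing how a subset of these properties could instead be set programmatically when building a SparkSession. It is only an illustration: the object and app names are made up, the values simply mirror the report above, and the gazelle_plugin jars still have to be on the driver/executor classpath as in the spark-defaults.conf entries.

```scala
import org.apache.spark.sql.SparkSession

object TpcdsStabilityRun {
  def main(args: Array[String]): Unit = {
    // Illustrative subset of the spark-defaults.conf above; keys and values
    // come from the reported configuration, everything else is a placeholder.
    val spark = SparkSession.builder()
      .appName("tpcds-stability-test")
      .config("spark.sql.extensions", "com.intel.oap.ColumnarPlugin")
      .config("spark.shuffle.manager",
        "org.apache.spark.shuffle.sort.ColumnarShuffleManager")
      .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .config("spark.executor.memory", "13g")
      .config("spark.executor.cores", "4")
      .config("spark.executor.instances", "96")
      .config("spark.memory.offHeap.size", "30g")
      .config("spark.sql.shuffle.partitions", "288")
      .config("spark.sql.parquet.columnarReaderBatchSize", "20480")
      .getOrCreate()

    // Run the TPC-DS queries here; the columnar jars must additionally be on
    // spark.driver.extraClassPath / spark.executor.extraClassPath.
    spark.stop()
  }
}
```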

haojinIntel commented 3 years ago

After tuning the Spark parameters, we can successfully run TPC-DS for 10 rounds. The following parameters were modified:

spark.executor.memory 6g
spark.executor.cores 8
spark.executor.instances 36
spark.sql.autoBroadcastJoinThreshold 5M
spark.oap.sql.columnar.numaBinding true
spark.oap.sql.columnar.coreRange 0-23,48-71|24-47,72-95
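A minimal sketch of the same overrides expressed as a SparkConf, assuming they are applied per job rather than through spark-defaults.conf. The object name is hypothetical, and the core ranges map the two NUMA nodes of the 96-thread test machines, so they are specific to that hardware rather than a general recommendation.

```scala
import org.apache.spark.SparkConf

object TunedOverrides {
  // Only the values changed in the comment above; smaller executors plus NUMA
  // binding of the columnar native threads to each socket's core range.
  val conf: SparkConf = new SparkConf()
    .set("spark.executor.memory", "6g")
    .set("spark.executor.cores", "8")
    .set("spark.executor.instances", "36")
    .set("spark.sql.autoBroadcastJoinThreshold", "5M")
    .set("spark.oap.sql.columnar.numaBinding", "true")
    .set("spark.oap.sql.columnar.coreRange", "0-23,48-71|24-47,72-95")
}
```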