oap-project / gazelle_plugin

Native SQL Engine plugin for Spark SQL with vectorized SIMD optimizations.
Apache License 2.0

Cannot get benefit comparing with vanilla spark when running TPC-DS 1.5TB power test. #250

Open haojinIntel opened 3 years ago

haojinIntel commented 3 years ago

We created 1.5TB of non-partitioned data and ran all TPC-DS queries. The results show that native-sql-engine is nearly 10% slower than vanilla Spark, and q72.sql has the largest gap.

zhouyuan commented 3 years ago

@haojinIntel

The issue may be due to preferSortMergeJoin=false being set, which makes Spark choose ShuffledHashJoin. That join is much slower than sort-merge join (SMJ) on Q72. Turning this join into an SMJ may bring better performance on your dataset.

Note: native SQL relies on preferSortMergeJoin=false as of today. We plan to implement full SMJ support soon: https://github.com/oap-project/native-sql-engine/issues/95
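
To check whether the join strategy explains the Q72 gap, one option is to run the query twice, once with each value of `spark.sql.join.preferSortMergeJoin`, and compare timings and physical plans. The sketch below only builds the two `spark-sql` command lines. The helper name `build_spark_sql_command` and the file name `q72.sql` are assumptions for illustration, and since native SQL currently relies on the `false` setting, the `true` run is probably only meaningful on vanilla Spark.

```python
import shlex
from typing import List

# Spark SQL setting discussed in this thread.
PREFER_SMJ_KEY = "spark.sql.join.preferSortMergeJoin"


def build_spark_sql_command(query_file: str, prefer_smj: bool) -> List[str]:
    """Build a spark-sql command that runs one query file with the join preference set.

    Hypothetical helper: it only assembles arguments and does not launch Spark.
    """
    value = "true" if prefer_smj else "false"
    return ["spark-sql", "--conf", f"{PREFER_SMJ_KEY}={value}", "-f", query_file]


if __name__ == "__main__":
    # Print one command per setting so both runs can be launched and timed by hand.
    for prefer_smj in (False, True):
        print(shlex.join(build_spark_sql_command("q72.sql", prefer_smj)))
```

Other settings used in the benchmark (cluster sizing, plugin jars, and so on) would need to be added to both commands so that the join preference is the only difference between the runs.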