oap-project / gazelle_plugin

Native SQL Engine plugin for Spark SQL with vectorized SIMD optimizations.
Apache License 2.0
[NSE-1205] Fix: change the logic of iterator in ColumnarHashAggregateExec #1206

Open jackylee-ch opened 1 year ago

jackylee-ch commented 1 year ago

What changes were proposed in this pull request?

When a ShuffleExchange follows a HashAgg whose aggregate function is Count, and no input is passed to the HashAgg, gazelle returns an empty batch instead of a non-empty batch containing 0. The root cause is that the iterator defined in ColumnarHashAggregateExec is invalid: its hasNext returns a different value when it is called twice without an intervening call to next. ColumnarShuffleWriteExec checks hasNext twice before calling next, so it sees the second, wrong answer and never reads the batch.
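To illustrate the failure mode described above, here is a minimal sketch in plain Java. It is not the plugin's code: `BrokenCountIterator`, `FixedCountIterator`, and `drainCheckingTwice` are hypothetical names, a `Long` stands in for a columnar batch, and the "broken" variant is only one assumed way a `hasNext` with side effects could behave. The point is that `hasNext` must be idempotent and only `next` may advance the state.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

final class EmptyInputCountIterators {

    // Hypothetical broken iterator: hasNext() mutates state, so a second
    // call without next() in between reports false and the row is lost.
    static final class BrokenCountIterator implements Iterator<Long> {
        private boolean reported = false;

        @Override
        public boolean hasNext() {
            if (!reported) {
                reported = true; // side effect inside hasNext(): the bug
                return true;
            }
            return false;
        }

        @Override
        public Long next() {
            return 0L; // COUNT over empty input
        }
    }

    // Hypothetical fixed iterator: hasNext() only reads state; next() is
    // the single place where the iterator advances.
    static final class FixedCountIterator implements Iterator<Long> {
        private boolean emitted = false;

        @Override
        public boolean hasNext() {
            return !emitted;
        }

        @Override
        public Long next() {
            if (emitted) {
                throw new NoSuchElementException();
            }
            emitted = true;
            return 0L; // COUNT over empty input
        }
    }

    // Stand-in for a consumer that checks hasNext() twice before the first
    // next(), as the description says the shuffle writer does.
    static List<Long> drainCheckingTwice(Iterator<Long> it) {
        List<Long> out = new ArrayList<>();
        if (!it.hasNext()) { // first check
            return out;
        }
        while (it.hasNext()) { // second check, still before any next()
            out.add(it.next());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(drainCheckingTwice(new BrokenCountIterator())); // prints []
        System.out.println(drainCheckingTwice(new FixedCountIterator()));  // prints [0]
    }
}
```

With the broken variant the consumer ends up with no rows, which corresponds to the empty batch reported here; with the fixed variant it receives the single row holding 0.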

How was this patch tested?

unit tests.

github-actions[bot] commented 1 year ago

Thanks for opening a pull request!

Could you open an issue for this pull request on Github Issues?

https://github.com/oap-project/native-sql-engine/issues

Then could you also rename commit message and pull request title in the following format?

[NSE-${ISSUES_ID}] ${detailed message}

github-actions[bot] commented 1 year ago

https://github.com/oap-project/native-sql-engine/issues/1205

zhouyuan commented 1 year ago

I ran the unit tests locally and found that the following tests failed:

com.intel.oap.tpc.ds.Orc_TPCDSSuite.smj query 3
com.intel.oap.tpc.ds.Orc_TPCDSSuite.q95 - shj
com.intel.oap.tpc.ds.TPCDSSuite.smj query 3
com.intel.oap.tpc.ds.TPCDSSuite.q95 - shj

-yuan