oap-project / gazelle_plugin

Native SQL Engine plugin for Spark SQL with vectorized SIMD optimizations.
Apache License 2.0

Return empty value when select count(*) from empty table with extra RePartition after it #1205

Open jackylee-ch opened 1 year ago

jackylee-ch commented 1 year ago

Describe the bug

When a ShuffleExchange follows a HashAgg whose aggregate function is Count, and no input is passed to the HashAgg, gazelle returns an empty batch instead of a non-empty batch containing 0.

To Reproduce

spark.sql("select 1 as a").filter("a > 1").groupBy().count().repartition(10).explain(true)

jackylee-ch commented 1 year ago

The root cause is that the iterator defined in ColumnarHashAggregateExec is invalid: its hasNext returns a different value when it is called twice without an intervening call to next. ColumnarShuffleWriteExec checks hasNext twice before calling next, so it sees no rows on the second check.
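The failure mode described above can be reproduced without Spark. The following is a minimal sketch, not the gazelle code: `FlakyCountIterator`, `StableCountIterator`, and `drainWithDoubleCheck` are hypothetical names, and the double-check shape of the consumer is an assumption about how the shuffle writer behaves, based only on the description in this comment.

```scala
object HasNextSketch {

  // Hypothetical stand-in for the aggregate's output iterator.
  // A global count over any input, even an empty one, should yield exactly one row.
  final class FlakyCountIterator(rows: Iterator[Int]) extends Iterator[Long] {
    private var resultClaimed = false

    // BUG: hasNext mutates state, so a second call without next() returns false.
    override def hasNext: Boolean =
      if (resultClaimed) false
      else { resultClaimed = true; true }

    override def next(): Long = rows.size.toLong
  }

  // Same iterator with an idempotent hasNext: state changes only in next().
  final class StableCountIterator(rows: Iterator[Int]) extends Iterator[Long] {
    private var emitted = false

    override def hasNext: Boolean = !emitted

    override def next(): Long = {
      if (emitted) throw new NoSuchElementException("count already emitted")
      emitted = true
      rows.size.toLong
    }
  }

  // Assumed consumer shape: hasNext is checked twice before the first next().
  def drainWithDoubleCheck(it: Iterator[Long]): List[Long] = {
    val out = List.newBuilder[Long]
    if (it.hasNext) {        // first check: is there any work at all?
      while (it.hasNext) {   // second check, still before next()
        out += it.next()
      }
    }
    out.result()
  }
}
```

With an empty input, the flaky iterator loses its single row to the first check and the consumer writes nothing, while the stable iterator delivers the expected count of 0. Keeping all state changes in next() is what makes repeated hasNext calls safe.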