oap-project / gazelle_plugin

Native SQL Engine plugin for Spark SQL with vectorized SIMD optimizations.
Apache License 2.0

Peers' values should be considered in window function for CURRENT ROW in range mode #1166

Closed PHILO-HE closed 1 year ago

PHILO-HE commented 1 year ago

Describe the bug

In RANGE mode, CURRENT ROW refers to any peer row of the current row. Rows are peers if they have the same values for the ORDER BY fields. A frame start of CURRENT ROW refers to the first peer row of the current row, while a frame end of CURRENT ROW refers to the last peer row of the current row. If no ORDER BY is specified, all rows are considered peers of the current row.
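To make the peer rule concrete, here is a minimal Python sketch that computes, for each row, the index of its first and last peer. It assumes the rows of one partition are already sorted by their ORDER BY key; `peer_frame_bounds` is a hypothetical helper written for illustration and is not part of Gazelle or Spark.

```python
from itertools import groupby


def peer_frame_bounds(order_keys):
    """Return (first_peer_index, last_peer_index) for every row.

    `order_keys` holds the ORDER BY key of each row in one partition,
    already sorted. Rows with equal keys are peers of each other.
    """
    bounds = []
    start = 0
    for _, group in groupby(order_keys):
        size = len(list(group))
        end = start + size - 1
        # Every row in a peer group shares the same first and last peer.
        bounds.extend([(start, end)] * size)
        start = end + 1
    return bounds


# Keys 2 and 2 are peers, so both rows share the bounds (1, 2).
print(peer_frame_bounds([1, 2, 2, 3]))

# With no ORDER BY, every row has the same (empty) key, so all rows are peers.
print(peer_frame_bounds([(), (), ()]))
```

With a frame end of CURRENT ROW, the frame of row 1 in the first call therefore extends to index 2, not just to index 1.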

The default frame, RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, is used by our supported sum window function when an ORDER BY is present. However, the implementation does not take peers into account: it should add the values of all peer rows to the running sum, rather than only the value of the current row.

PHILO-HE commented 1 year ago

This issue can be easily reproduced by running SQL such as sum(a) OVER (PARTITION BY b ORDER BY c), where column c contains repeated values within a partition.
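The difference can be illustrated with a small, self-contained Python sketch of sum(a) OVER (PARTITION BY b ORDER BY c). The function `window_sum` and the sample rows are hypothetical; the `peer_aware=False` branch is only an assumed model of the buggy row-by-row behavior described above, not the actual Gazelle code.

```python
from itertools import groupby
from operator import itemgetter


def window_sum(rows, peer_aware=True):
    """Emulate sum(a) OVER (PARTITION BY b ORDER BY c) on (a, b, c) tuples.

    Returns (a, b, c, running_sum) tuples ordered by (b, c).
    peer_aware=True follows RANGE semantics: all peers get the same sum.
    peer_aware=False adds one row at a time (the assumed buggy behavior).
    """
    ordered = sorted(rows, key=lambda r: (r[1], r[2]))
    out = []
    for _, partition in groupby(ordered, key=itemgetter(1)):
        total = 0
        for _, peers in groupby(partition, key=itemgetter(2)):
            peers = list(peers)
            if peer_aware:
                # Add every peer's value before emitting any peer row.
                total += sum(r[0] for r in peers)
                out.extend(r + (total,) for r in peers)
            else:
                for r in peers:
                    total += r[0]
                    out.append(r + (total,))
    return out


# Column c repeats the value 2 inside partition "x".
rows = [(10, "x", 1), (20, "x", 2), (30, "x", 2), (40, "x", 3), (5, "y", 1)]

print([r[3] for r in window_sum(rows, peer_aware=True)])   # [10, 60, 60, 100, 5]
print([r[3] for r in window_sum(rows, peer_aware=False)])  # [10, 30, 60, 100, 5]
```

The two results differ only for the row with a = 20: under RANGE semantics it must already include its peer's value of 30, giving 60 instead of 30.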

PHILO-HE commented 1 year ago

Fixed.