oap-project / gazelle_plugin

Native SQL Engine plugin for Spark SQL with vectorized SIMD optimizations.
Apache License 2.0

Buffer overflow while using hash agg #1127

Open jackylee-ch opened 2 years ago

jackylee-ch commented 2 years ago

Describe the bug: We hit a core dump when running a SQL query that uses hash aggregation. In this query, the agg key is constant and one of the selected columns contains Chinese values. After digging into it, we found a buffer overflow when calling GetOrInsert. Exception:

Capacity error: array cannot contain more than 2147483646 bytes, have 2147483708
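For context, the cap in this message, 2147483646, equals INT32_MAX - 1, which is what a binary/string array with 32-bit offsets can address; the reported size exceeds it by 62 bytes. A minimal standard-library sketch of such a capacity check follows. The names `kLimit32` and `CheckAppend` are hypothetical and only mirror the message format above; this is not Arrow's or gazelle_plugin's actual code.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <optional>
#include <string>

// Largest total payload (in bytes) allowed for 32-bit offsets in this sketch.
constexpr int64_t kLimit32 = std::numeric_limits<int32_t>::max() - 1;

// Hypothetical helper: returns an error message if appending `incoming` bytes
// to a buffer that already holds `current` bytes would exceed the limit,
// and std::nullopt otherwise.
inline std::optional<std::string> CheckAppend(int64_t current, int64_t incoming) {
  assert(current >= 0 && incoming >= 0);
  const int64_t total = current + incoming;  // 64-bit math, so no wraparound here
  if (total > kLimit32) {
    return "Capacity error: array cannot contain more than " +
           std::to_string(kLimit32) + " bytes, have " + std::to_string(total);
  }
  return std::nullopt;
}
```

With many distinct multi-byte string keys, the accumulated value bytes in a single hash map can plausibly cross this limit even when no individual string is large.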

To Reproduce: The values in chinese_col must all be distinct.

select chinese_col, col2, col3, sum(col4) from table group by 1, 2, 3

Expected behavior: No exception or core dump.

jackylee-ch commented 2 years ago

Using LargeStringHashMap instead of StringHashMap works for me for this problem. LargeStringHashMap uses LargeBinaryBuilder to append string values.
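The reason the large variant helps is that Arrow's large binary types store offsets as 64-bit integers, while the regular binary/string types use 32-bit offsets. The toy model below illustrates the difference under stated assumptions: `OffsetOnlyBuilder`, `SmallBuilder`, and `LargeBuilder` are hypothetical names, and the class records only offsets (no value bytes) so the 2 GiB boundary can be exercised without allocating that much memory. It is a sketch of the idea, not the real builders.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// Toy variable-length builder that tracks end offsets only.
template <typename OffsetT>
class OffsetOnlyBuilder {
 public:
  // Appends a value of `length` bytes; returns false (and appends nothing)
  // if the new end offset would exceed what OffsetT can represent here.
  bool Append(int64_t length) {
    assert(length >= 0);
    const int64_t current = static_cast<int64_t>(offsets_.back());
    if (length > kMaxBytes - current) return false;  // overflow-safe check
    offsets_.push_back(static_cast<OffsetT>(current + length));
    return true;
  }

  int64_t total_bytes() const { return static_cast<int64_t>(offsets_.back()); }

 private:
  static constexpr int64_t kMaxBytes = std::numeric_limits<OffsetT>::max() - 1;
  std::vector<OffsetT> offsets_{0};  // first offset is always 0
};

using SmallBuilder = OffsetOnlyBuilder<int32_t>;  // stands in for 32-bit offsets
using LargeBuilder = OffsetOnlyBuilder<int64_t>;  // stands in for 64-bit offsets
```

Appending 2147483000 bytes and then 708 more is rejected by `SmallBuilder` on the second call but accepted by `LargeBuilder`. The trade-off is that 64-bit offsets double the size of the offset buffer.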