oap-project / gazelle_plugin

Native SQL Engine plugin for Spark SQL with vectorized SIMD optimizations.
Apache License 2.0

TPC-DS q16 query has no output results #1133

Open kelvin-qin opened 1 year ago

kelvin-qin commented 1 year ago

Describe the bug

When I ran the TPC-DS benchmark at sf500, only 98 "Fetched" results could be retrieved from the log, whereas vanilla Spark produced all 99. For comparison, at sf1000 both vanilla Spark and Gazelle produce all 99 results.

To Reproduce

  1. Generate sf500 data in Hive
  2. Create a new database and convert the metadata to Arrow format
  3. Recover partitions on the above database
  4. Run all 99 TPC-DS queries against the above database
  5. Check the logs of the running tasks

Expected behavior

All 99 "Fetched" results appear in the log, and every query explicitly returns its output.
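A minimal sketch of how the expected behavior could be checked mechanically, assuming the log contains spark-sql CLI style lines such as `Time taken: 1.5 seconds, Fetched 1 row(s)`. The regex, the function names, and the `expected_queries` parameter are hypothetical illustrations, not part of Gazelle or Spark.

```python
import re
from typing import Iterable, List, Tuple

# Assumed log line format (spark-sql CLI style):
#   "Time taken: 1.5 seconds, Fetched 1 row(s)"
FETCHED_RE = re.compile(r"Fetched (\d+) row\(s\)")


def fetched_counts(lines: Iterable[str]) -> List[int]:
    """Return the row count from every 'Fetched N row(s)' line, in log order."""
    counts = []
    for line in lines:
        match = FETCHED_RE.search(line)
        if match:
            counts.append(int(match.group(1)))
    return counts


def check_run(lines: Iterable[str], expected_queries: int = 99) -> Tuple[bool, List[int]]:
    """Hypothetical check: did the log report a 'Fetched' line for every query?"""
    counts = fetched_counts(lines)
    return len(counts) == expected_queries, counts
```

Usage would be along the lines of `with open(path) as f: ok, counts = check_run(f)`, where `path` points at the benchmark log. A run like the one reported here would return `ok == False` with 98 entries in `counts`.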

Additional context

Gazelle version: v1.4.0-inspur

kelvin-qin commented 1 year ago

q16 is a count query, and in vanilla Spark it does return a result: 0 and NULL.
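Why an empty match still yields output can be sketched under standard SQL semantics. The sketch assumes q16's select list is a global aggregate (a `count(distinct ...)` plus `sum(...)` columns with no `GROUP BY`), as in the standard TPC-DS query text; the function and column names are hypothetical, and this is a model of the semantics, not Gazelle or Spark code. A global aggregate emits exactly one row even when no input rows match: `count` gives 0 and `sum` gives NULL.

```python
from typing import Iterable, List, Optional, Tuple

# Hypothetical input row: (order_no, ship_cost, net_profit); any field may be NULL (None)
Row = Tuple[Optional[int], Optional[float], Optional[float]]


def sql_sum(total: Optional[float], value: Optional[float]) -> Optional[float]:
    # SQL SUM ignores NULLs and stays NULL until it sees a non-NULL value
    if value is None:
        return total
    return value if total is None else total + value


def global_aggregate(rows: Iterable[Row]) -> List[Tuple[int, Optional[float], Optional[float]]]:
    """Model of: SELECT count(DISTINCT order_no), sum(ship_cost), sum(net_profit) ... (no GROUP BY)."""
    orders = set()
    ship_total: Optional[float] = None
    profit_total: Optional[float] = None
    for order_no, ship_cost, net_profit in rows:
        if order_no is not None:  # COUNT(DISTINCT ...) skips NULLs
            orders.add(order_no)
        ship_total = sql_sum(ship_total, ship_cost)
        profit_total = sql_sum(profit_total, net_profit)
    # No GROUP BY: exactly one output row, even for empty input
    return [(len(orders), ship_total, profit_total)]


print(global_aggregate([]))  # [(0, None, None)]
```

An engine that emits zero rows instead of this single row for empty input would leave q16 without a result, which is consistent with (though not proof of) the behavior reported in this issue.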

kelvin-qin commented 1 year ago

All 6 scan stages of q16 did read their input files.