oap-project / gazelle_plugin

Native SQL Engine plugin for Spark SQL with vectorized SIMD optimizations.
Apache License 2.0

[NSE-1171] Support merge parquet schema and read missing schema #1175

Closed: jackylee-ch closed this pull request 1 year ago

jackylee-ch commented 1 year ago

What changes were proposed in this pull request?

This PR adds two things. First, Parquet schema merging in ArrowFileFormat.infer_schema. Second, support for Parquet reads in which a column, or a filter on a column, is missing from some of the files.
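The two behaviours can be made concrete with a minimal conceptual sketch. It is not the gazelle_plugin implementation, which works with Arrow schemas inside Spark. The function names `merge_schemas` and `read_with_schema` are hypothetical, and types are modeled as plain strings. Any type conflict is treated as an error here, whereas Spark's own schema merging is richer (for example, it merges nested structs).

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical model: a schema is an ordered list of (column_name, type_name) pairs.
Schema = List[Tuple[str, str]]


def merge_schemas(schemas: Sequence[Schema]) -> Schema:
    """Union the columns of several file schemas, keeping first-seen column order.

    Raises ValueError if one column name appears with two different types.
    """
    merged: Dict[str, str] = {}  # dicts keep insertion order (Python 3.7+)
    for schema in schemas:
        for name, dtype in schema:
            if name in merged and merged[name] != dtype:
                raise ValueError(
                    f"conflicting types for column {name!r}: {merged[name]} vs {dtype}"
                )
            merged.setdefault(name, dtype)
    return list(merged.items())


def read_with_schema(
    file_schema: Schema, rows: Sequence[tuple], target: Schema
) -> List[Tuple[Optional[object], ...]]:
    """Project one file's rows onto the merged schema.

    Columns the file does not contain are filled with None (SQL NULL)
    instead of raising an error.
    """
    file_columns = [name for name, _ in file_schema]
    projected = []
    for row in rows:
        values = dict(zip(file_columns, row))
        projected.append(tuple(values.get(name) for name, _ in target))
    return projected


# Usage: two files that share column "b" but otherwise differ.
file1 = [("a", "int"), ("b", "string")]
file2 = [("c", "int"), ("b", "string")]
merged = merge_schemas([file1, file2])
print(merged)  # [('a', 'int'), ('b', 'string'), ('c', 'int')]
print(read_with_schema(file1, [(1, "1")], merged))  # [(1, '1', None)]
```

Reading each file through the merged schema, rather than through its own physical schema, is what lets one query span files with different columns. Columns absent from a file surface as NULL.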

How was this patch tested?

Unit tests.

github-actions[bot] commented 1 year ago

https://github.com/oap-project/native-sql-engine/issues/1171

jackylee-ch commented 1 year ago

This PR can be tested with the test "Filter applied on merged Parquet schema with new column should work", together with #1162.
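The scenario named in that test can be illustrated conceptually. Suppose two files are read with a merged schema and only one of them contains the filtered column. Rows from the other file carry NULL in that column, so a predicate such as `c = 1` evaluates to NULL for them. Those rows are filtered out rather than causing an error. The Python sketch below models this with plain dictionaries. The data, column names, and the helpers `sql_equals` and `filter_equals` are hypothetical and are not taken from the gazelle_plugin code or its test suite.

```python
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]


def sql_equals(left: Any, right: Any) -> Optional[bool]:
    """SQL '=' under three-valued logic: comparing with NULL (None) yields NULL."""
    if left is None or right is None:
        return None
    return left == right


def filter_equals(rows: List[Row], column: str, value: Any) -> List[Row]:
    """Keep rows where `column = value` is TRUE; rows where it is FALSE or NULL are dropped."""
    return [row for row in rows if sql_equals(row.get(column), value) is True]


# Hypothetical data: file 1 had columns (a, b), file 2 had (c, b).
# After schema merging every row has a, b and c; absent columns are NULL.
merged_rows: List[Row] = [
    {"a": 1, "b": "1", "c": None},
    {"a": 2, "b": "2", "c": None},
    {"a": None, "b": "1", "c": 1},
    {"a": None, "b": "2", "c": 2},
]

print(filter_equals(merged_rows, "c", 1))
# [{'a': None, 'b': '1', 'c': 1}]
```

The sketch evaluates the filter after rows are assembled. A reader that pushes the filter down into a file lacking `c` would need to reach the same result without failing. That is presumably what handling a missing filter column refers to in the PR description.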

jackylee-ch commented 1 year ago

cc @zhouyuan @PHILO-HE

zhouyuan commented 1 year ago

@jackylee-ch could you please also add a small Scala unit test for this feature?

jackylee-ch commented 1 year ago

Sure