kwai / blaze

Blazing-fast query execution engine that speaks the Apache Spark language and has Arrow-DataFusion at its core.
Apache License 2.0

Support the ORC file format #498

Open: richox opened this issue 1 week ago


Is your feature request related to a problem? Please describe.
Blaze currently supports only the Parquet format. Scans of other formats fall back to Spark and go through row-to-columnar (row2column) conversion operators. ORC is a columnar format widely used in production environments, and the datafusion-orc project provides an OrcExec that is similar to DataFusion's ParquetExec.

Describe the solution you'd like
Add datafusion-orc support: transform FileSourceScanExec nodes that read the ORC format into a native OrcExec.
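The proposed conversion can be modeled with a small, self-contained sketch. Everything in it is hypothetical: `SparkPlan`, `NativePlan`, `convert_scan`, the `orc_enabled` switch, and the example path are simplified stand-ins invented for illustration, not Blaze, Spark, or DataFusion APIs. It only shows the decision the issue asks for: Parquet scans already become native, ORC scans would become a native OrcExec instead of falling back to row-to-columnar conversion, and other formats keep falling back.

```rust
// Hypothetical, simplified model of the planner rewrite proposed here.
// None of these types are real Blaze, Spark, or DataFusion APIs.

#[derive(Debug, Clone, PartialEq)]
enum FileFormat {
    Parquet,
    Orc,
    Csv,
}

/// A Spark-side physical plan node (heavily simplified).
#[derive(Debug, Clone, PartialEq)]
enum SparkPlan {
    FileSourceScanExec { format: FileFormat, paths: Vec<String> },
}

/// The plan node Blaze would run after conversion.
#[derive(Debug, Clone, PartialEq)]
enum NativePlan {
    /// Native Parquet scan (supported today).
    ParquetExec { paths: Vec<String> },
    /// Native ORC scan (the proposed addition, standing in for datafusion-orc's OrcExec).
    OrcExec { paths: Vec<String> },
    /// Fallback: Spark runs the scan and its rows are converted to columnar batches.
    RowToColumnar { input: SparkPlan },
}

/// Converts a scan to a native node when its format is supported.
/// `orc_enabled` is a hypothetical switch for the new ORC path.
fn convert_scan(plan: SparkPlan, orc_enabled: bool) -> NativePlan {
    match plan {
        SparkPlan::FileSourceScanExec { format: FileFormat::Parquet, paths } => {
            NativePlan::ParquetExec { paths }
        }
        SparkPlan::FileSourceScanExec { format: FileFormat::Orc, paths } if orc_enabled => {
            NativePlan::OrcExec { paths }
        }
        // Any other format (or ORC with the switch off) keeps the fallback path.
        other => NativePlan::RowToColumnar { input: other },
    }
}

fn main() {
    let scan = SparkPlan::FileSourceScanExec {
        format: FileFormat::Orc,
        paths: vec!["/warehouse/t/part-0.orc".to_string()], // hypothetical path
    };
    // Current behavior: ORC falls back to row-to-columnar conversion.
    println!("{:?}", convert_scan(scan.clone(), false));
    // Proposed behavior: ORC is converted to a native OrcExec.
    println!("{:?}", convert_scan(scan, true));
}
```

Running `main` prints the fallback plan first and the native OrcExec plan second. Gating ORC behind a switch is an assumption of this sketch, chosen so the new path could be turned off independently of Parquet; the issue does not say whether Blaze would expose such a setting.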