substrait-io / substrait

A cross platform way to express data transformation, relational algebra, standardized record expression and plans.
https://substrait.io
Apache License 2.0

feat: add field-id resolution to parquet reads #532

Closed (westonpace closed this 4 months ago)

westonpace commented 1 year ago

This PR clarifies how field resolution should happen when the base schema does not exactly match the file schema. This is a very common case in environments where there is no catalog, or where the catalog does not include the full Parquet schema.

This should not be a breaking change: as far as I know, all existing implementations already use the default (name-based resolution).
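
To illustrate the difference between the two resolution modes, here is a minimal sketch in plain Python. It is not Substrait's API or any consumer's implementation; `FileField`, `resolve_by_name`, and `resolve_by_field_id` are hypothetical names made up for this example, and the schemas are invented. It only shows how a renamed column is missed by name-based resolution but still found when matching on Parquet field ids.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class FileField:
    """Hypothetical stand-in for one top-level column in a Parquet file schema."""
    name: str
    field_id: Optional[int]  # Parquet field ids are optional and may be absent


def resolve_by_name(base_names: Sequence[str],
                    file_fields: Sequence[FileField]) -> List[Optional[int]]:
    """For each base-schema name, return the file column index, or None if missing."""
    index = {f.name: i for i, f in enumerate(file_fields)}
    return [index.get(name) for name in base_names]


def resolve_by_field_id(base_ids: Sequence[int],
                        file_fields: Sequence[FileField]) -> List[Optional[int]]:
    """For each base-schema field id, return the file column index, or None if missing."""
    index = {f.field_id: i for i, f in enumerate(file_fields)
             if f.field_id is not None}
    return [index.get(fid) for fid in base_ids]


# Invented file schema: columns are stored in a different order than the
# base schema, and the first column has a different name ("customer_id").
file_fields = [
    FileField("customer_id", 1),
    FileField("total", 3),
    FileField("region", 2),
]

# Invented base schema, expressed once by name and once by field id.
base_names = ["id", "region", "total"]
base_ids = [1, 2, 3]

by_name = resolve_by_name(base_names, file_fields)
by_id = resolve_by_field_id(base_ids, file_fields)

print(by_name)  # [None, 2, 1]  ("id" has no match by name)
print(by_id)    # [0, 2, 1]     (all three columns found by field id)
```

What a consumer should do with an unresolved column (a `None` entry here) is a policy question that this sketch deliberately leaves open.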

westonpace commented 1 year ago

I am leaving this as a draft PR until I have prototyped support for it in Acero to confirm that the approach works.