substrait-io / substrait-java

Apache License 2.0

fix: set VirtualTableScan schema explicitly #272

Closed Blizzara closed 2 weeks ago

Blizzara commented 2 weeks ago

VirtualTableScan in Substrait is expected to contain both a NamedSchema (field names in depth-first order, plus types), like any other read rel, and the actual data rows. However, substrait-java was ignoring the types in the NamedSchema and instead taking the type info from the first row of the data. That is not always sufficient, since nullability can vary from row to row: the first row may hold a non-nullable literal in a column where a later row holds a nullable one.
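To illustrate the nullability problem, here is a minimal, self-contained sketch. It does not use the substrait-java classes; `FieldType`, `Literal`, `inferFromFirstRow` and `conformsTo` are hypothetical stand-ins made up for this example, and the real types carry much more information.

```java
import java.util.List;

public class VirtualTableSchemaSketch {
    // Hypothetical stand-ins for illustration only, not the substrait-java API.
    record FieldType(String kind, boolean nullable) {}
    record Literal(Object value, FieldType type) {}

    // The old approach described above: take the column types from the first row.
    static List<FieldType> inferFromFirstRow(List<List<Literal>> rows) {
        return rows.get(0).stream().map(Literal::type).toList();
    }

    // A row conforms if every literal has the declared kind and is not
    // nullable in a column that is declared non-nullable.
    static boolean conformsTo(List<FieldType> schema, List<List<Literal>> rows) {
        for (List<Literal> row : rows) {
            if (row.size() != schema.size()) return false;
            for (int i = 0; i < row.size(); i++) {
                FieldType declared = schema.get(i);
                FieldType actual = row.get(i).type();
                if (!declared.kind().equals(actual.kind())) return false;
                if (actual.nullable() && !declared.nullable()) return false;
            }
        }
        return true;
    }

    // One i32 column: the first row is non-nullable, the second is a nullable null.
    static List<List<Literal>> sampleRows() {
        FieldType required = new FieldType("i32", false);
        FieldType nullable = new FieldType("i32", true);
        return List.of(
            List.of(new Literal(1, required)),
            List.of(new Literal(null, nullable)));
    }

    public static void main(String[] args) {
        List<List<Literal>> rows = sampleRows();
        List<FieldType> inferred = inferFromFirstRow(rows);
        List<FieldType> explicit = List.of(new FieldType("i32", true));
        System.out.println("inferred schema fits all rows: " + conformsTo(inferred, rows));
        System.out.println("explicit schema fits all rows: " + conformsTo(explicit, rows));
    }
}
```

In this sketch the schema inferred from the first row declares the column non-nullable, so the second row does not fit it, while an explicitly provided schema with a nullable column accepts both rows.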

This also simplifies VirtualTableScan and makes it behave more like one would expect from the Substrait spec and from other ReadRels.