Open raunaqmorarka opened 1 year ago
Is this the same as, or related to, @findinpath's https://github.com/trinodb/trino/issues/17156?
I'm assuming that #17156 is Iceberg-specific, where the Iceberg logic needs to be fixed for Parquet (Hive + Parquet works as expected). In this issue I'm referring to an ORC reader problem that affects both the Hive and Iceberg connectors. We could close this one if it's less confusing to track both problems in #17156.
The ORC reader currently relies on lazy loading of blocks to avoid decoding unreferenced struct fields. However, it still uses all fields of the struct when populating the structures used to plan reads from the ORC file. This can lead to over-reading from the file system, because nearby small reads in the file are merged into larger reads. The Parquet reader avoids this by dropping all unreferenced struct fields when planning IO. The ORC reader can be improved to do the same.
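To illustrate the effect, here is a rough Python sketch (not Trino code) of how merging nearby reads interacts with field pruning. Everything in it is an assumption made for illustration: `ByteRange`, `merge_nearby`, `bytes_read`, the 1000-byte merge distance, and the layout of ten contiguous 200-byte streams for a struct of which only `f0` is referenced.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ByteRange:
    """Hypothetical byte range of one stream in the file (not a Trino class)."""
    offset: int
    length: int

    @property
    def end(self) -> int:
        return self.offset + self.length


def merge_nearby(ranges, max_merge_distance):
    """Coalesce ranges whose gap is at most max_merge_distance bytes."""
    merged = []
    for r in sorted(ranges, key=lambda r: r.offset):
        if merged and r.offset - merged[-1].end <= max_merge_distance:
            last = merged[-1]
            new_end = max(last.end, r.end)
            merged[-1] = ByteRange(last.offset, new_end - last.offset)
        else:
            merged.append(r)
    return merged


def bytes_read(ranges, max_merge_distance):
    """Total bytes fetched once nearby ranges have been merged."""
    return sum(r.length for r in merge_nearby(ranges, max_merge_distance))


# Assumed layout: struct fields f0..f9, each a contiguous 200-byte stream.
streams = {f"f{i}": ByteRange(i * 200, 200) for i in range(10)}
referenced = {"f0"}  # the query only touches s.f0

# Assumed merge distance in bytes; the real value is configurable.
MERGE_DISTANCE = 1000

# Planning with every field: all streams merge into a single large read.
all_fields_bytes = bytes_read(streams.values(), MERGE_DISTANCE)

# Planning with only referenced fields: the unreferenced streams are never
# candidates for merging, so only the needed bytes are fetched.
pruned_ranges = [r for name, r in streams.items() if name in referenced]
pruned_bytes = bytes_read(pruned_ranges, MERGE_DISTANCE)

print(all_fields_bytes, pruned_bytes)
```

In this made-up layout, planning with all fields fetches the whole struct in one merged read, while planning with only the referenced field fetches just that stream. Lazy loading alone does not help here, since the merged read is issued as soon as any one of its streams is needed.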
fyi @findepi @findinpath @dain