Currently, the Velox HiveDataSource matches the column names in the file (fileType) against the requested schema column names. These two names can differ. For example, the Presto Iceberg writer changes a space to "_x20". To handle this, the Presto Parquet reader has a session property, "parquet_use_column_names", which defaults to false. When it is set to false, the hiveColumnIndex in HiveColumnHandle is used to map the schema column name to the actual column in the file. However, this field is not sent to Velox. To fix the problem, we need to send this field to Velox.
The same needs to be done for IcebergColumnHandle's columnIdentity.id.
Expected Behavior or Use Case
Presto Component, Service, or Connector
Hive and Iceberg connectors
Possible Implementation
Change presto_cpp/main/types/PrestoToVeloxConnector.cpp to add these fields.
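To illustrate the intended lookup, here is a minimal, self-contained sketch of resolving a requested column either by position or by name. It is not the actual Velox or presto_cpp code: `ColumnHandleSketch`, `resolveFileColumn`, and the `useColumnNames` flag are hypothetical names chosen for this example, and the real change would carry the index through the existing column handle types instead.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a column handle. The real
// HiveColumnHandle types in Presto and Velox have more fields.
struct ColumnHandleSketch {
  std::string name;     // column name from the table schema
  int hiveColumnIndex;  // position of the column in the file, -1 if unknown
};

// Returns the position of the requested column within the file's columns,
// or std::nullopt if it cannot be resolved.
// useColumnNames mirrors the idea behind "parquet_use_column_names":
// when false, the column is resolved by index instead of by name.
std::optional<std::size_t> resolveFileColumn(
    const ColumnHandleSketch& handle,
    const std::vector<std::string>& fileColumnNames,
    bool useColumnNames) {
  if (!useColumnNames) {
    // Index-based mapping: the name stored in the file is ignored.
    if (handle.hiveColumnIndex >= 0 &&
        static_cast<std::size_t>(handle.hiveColumnIndex) <
            fileColumnNames.size()) {
      return static_cast<std::size_t>(handle.hiveColumnIndex);
    }
    return std::nullopt;
  }
  // Name-based mapping: fails when the writer has rewritten the name.
  for (std::size_t i = 0; i < fileColumnNames.size(); ++i) {
    if (fileColumnNames[i] == handle.name) {
      return i;
    }
  }
  return std::nullopt;
}
```

With a file whose columns are `{"id", "first_x20name"}` and a handle `{"first name", 1}`, the index-based path resolves to position 1, while the name-based path finds no match. That mismatch is the failure described above.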