prestodb / presto

The official home of the Presto distributed SQL query engine for big data
http://prestodb.io
Apache License 2.0

Include HiveColumnIndex in HiveColumnHandle to Velox #23130

Closed yingsu00 closed 4 months ago

yingsu00 commented 4 months ago

Currently, the Velox HiveDataSource matches the column names in the file (fileType) against the requested schema column names. These two names can differ. For example, the Presto Iceberg writer replaces a space with "_x20". To handle this, the Presto Parquet reader has a session property, "parquet_use_column_names", which defaults to false. When it is set to false, the hiveColumnIndex in HiveColumnHandle is used to map the schema column name to the actual column name in the file. However, this field is not sent to Velox. To fix the problem, we need to send this field to Velox.

The same needs to be done for IcebergColumnHandle's columnIdentity.id.
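
To illustrate the difference between name-based and index-based matching described above, here is a minimal C++ sketch. It is not Presto or Velox code: `ColumnHandleSketch` and `resolveFileColumn` are hypothetical names made up for this example, and the `useColumnNames` flag only loosely mirrors the "parquet_use_column_names" session property.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for HiveColumnHandle.
// This is NOT the real Presto or Velox type.
struct ColumnHandleSketch {
  std::string name;     // column name as it appears in the table schema
  int hiveColumnIndex;  // ordinal position of the column in the file
};

// Returns the file column that the handle refers to, or std::nullopt
// if no column matches.
//  - useColumnNames == true : match by name; fails if the writer renamed
//    the column in the file.
//  - useColumnNames == false: match by ordinal position; the name stored
//    in the file does not matter.
std::optional<std::string> resolveFileColumn(
    const std::vector<std::string>& fileColumnNames,
    const ColumnHandleSketch& handle,
    bool useColumnNames) {
  if (useColumnNames) {
    for (const auto& fileName : fileColumnNames) {
      if (fileName == handle.name) {
        return fileName;
      }
    }
    return std::nullopt;
  }
  if (handle.hiveColumnIndex >= 0 &&
      static_cast<std::size_t>(handle.hiveColumnIndex) <
          fileColumnNames.size()) {
    return fileColumnNames[handle.hiveColumnIndex];
  }
  return std::nullopt;
}
```

With a file whose columns are `id` and `first_x20name`, and a schema column `first name` at index 1, the name-based lookup finds nothing, while the index-based lookup resolves to `first_x20name`. For Iceberg, the analogous key would presumably be columnIdentity.id rather than an ordinal position; this sketch does not model that case.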

Expected Behavior or Use Case

Presto Component, Service, or Connector

Hive and Iceberg connector

Possible Implementation

Change presto_cpp/main/types/PrestoToVeloxConnector.cpp so that it passes these fields to Velox.

yingsu00 commented 4 months ago

Closing this, as it has been shown to be unnecessary: https://github.com/facebookincubator/velox/issues/10085