gshenoy123 opened 2 years ago
@gshenoy123 Just wanted to confirm: do you have your partition directory structure in this form on S3? /delta-path/{partition-columnname}={partition-value} (e.g. s3a://data-poc/data/sample/transactions/table/trx_data={val})
If not, could you please update it and try? Also, column names are stored in lowercase in the metastore, so make sure the partition name is lowercase in S3 too.
@agrawalreetika thanks for the reply. Yes, the format is as you mentioned. The S3 table path and partition listing are as follows -
aws s3 ls s3://data-poc/data/sample/transactions/table

Output:

PRE TRX_DATA=2022-10-01/
PRE TRX_DATA=2022-10-02/
PRE TRX_DATA=2022-10-03/
...
Regarding table creation in the Hive metastore, we created it with a dummy column as per the connector documentation (see reference below) and expected the Delta connector (DSR) to infer the table structure. We did that, and desc delta.default.trx_poc_1 does list all table columns along with the partition column TRX_DATA.
We created the table in Presto using DSR: https://prestodb.io/docs/current/connector/deltalake.html
As mentioned in the documentation above, we have two catalogs in the Hive metastore -
hive - we create the schema using the catalog and schema prefix hive.default.trx_poc_1
delta - we query using the catalog and schema prefix delta.default.trx_poc_1 (see the sketch below)
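For reference, a rough sketch of this two-catalog setup, following the dummy-column registration pattern from the documentation linked above; the exact DDL and the external_location property are illustrative here, not copied from our actual statements:

```sql
-- Register the Delta table location in the Hive metastore through the hive catalog.
-- A single dummy column is enough; the Delta connector reads the real schema and
-- partitioning from the Delta transaction log.
CREATE TABLE hive.default.trx_poc_1 (
    dummy bigint
)
WITH (
    external_location = 's3a://data-poc/data/sample/transactions/table'
);

-- Query the same table through the delta catalog.
DESC delta.default.trx_poc_1;
SELECT * FROM delta.default.trx_poc_1 LIMIT 10;
```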
Please let me know if you require any other info. We can try with lowercase, but we are not sure it will help, for the reasons stated above. Please note that only the partition column value is NULL; the other column values are read correctly. Also note that if this SQL is executed in Spark, we get the column (date) value correctly.
@gshenoy123 You are right, metadata is read via the Delta transaction logs when the presto-delta connector handles the query. But I think you might be getting all the column names in lowercase when querying from Presto, so the partition column name mapping to the data is not happening. I have a partitioned Delta table which I am able to query from the presto-delta connector without issue. If this doesn't help, you can ping me on Slack and I'm happy to help debug this further.
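One quick way to check whether this is a name-casing mismatch is to compare what Presto reports against the S3 directory names; illustrative queries only, using the table name from above:

```sql
-- Presto exposes column names in lowercase (e.g. trx_data), while the S3
-- partition directories use TRX_DATA=...; if the name mapping is case-sensitive,
-- the partition column ends up unmapped and is returned as NULL.
SHOW COLUMNS FROM delta.default.trx_poc_1;

SELECT trx_data, count(*)
FROM delta.default.trx_poc_1
GROUP BY 1;
```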
Hi,
We are encountering an issue where a Delta table partition field value is returned as NULL by a Presto query. This results in Presto queries that depend on the partitioned fields returning inconsistent results. We executed the same query using Spark SQL and can see the value of the partition field "TRX_DATE" displayed properly.
Below are the Spark ingestion code and query (Spark SQL and Presto) snippets; we would highly appreciate any pointers.
Spark Ingestion
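(The original ingestion snippet is not reproduced here; the following is an illustrative PySpark sketch of the kind of partitioned Delta write being described. The source path, app name, and DataFrame contents are assumptions.)

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("trx-ingest").getOrCreate()

# Illustrative source read; the real ingestion input is not shown in this issue.
df = spark.read.parquet("s3a://data-poc/data/sample/transactions/raw")

# Write a Delta table partitioned by the uppercase TRX_DATA column,
# which matches the TRX_DATA=... prefixes listed on S3 above.
(df.write
   .format("delta")
   .mode("append")
   .partitionBy("TRX_DATA")
   .save("s3a://data-poc/data/sample/transactions/table"))
```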
Spark SQL
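(Again illustrative only, since the original snippet is not shown; this assumes the path-based Delta table syntax in Spark SQL.)

```sql
-- Spark SQL against the same Delta location: the partition value is returned correctly here.
SELECT TRX_DATA, COUNT(*)
FROM delta.`s3a://data-poc/data/sample/transactions/table`
GROUP BY TRX_DATA;
```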
Presto
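(Illustrative Presto query against the table registered as described below; this is the query where the partition column comes back as NULL.)

```sql
-- Same aggregation through the Presto Delta connector: trx_data is returned as NULL.
SELECT trx_data, COUNT(*)
FROM delta.default.trx_poc_1
GROUP BY trx_data;
```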
We created the table in Presto using DSR - https://prestodb.io/docs/current/connector/deltalake.html
As mentioned in the documentation above, we have two catalogs in the Hive metastore -