Open win-shy opened 10 months ago
@win-shy @SumanMaharana @harshach
Per my observation. The Hive Ingestion Dialect for metastore (assuming that your metastore is in PostgreSQL) the existing method "_get_table_columns" is not querying partitions.
Link to above method just FYI: https://github.com/open-metadata/OpenMetadata/blob/1.4.5-release/ingestion/src/metadata/ingestion/source/database/hive/metastore_dialects/postgres/dialect.py
And the query is existing in latest releases along with the older one 1.3.1 (which I'm using)
If the query in the method is modified to something like following, the method can retrieve the partitions as well as the table fields in single shot:
select db_name, table_name, column_name from ( SELECT d."NAME" as db_name, t."TBL_NAME" as table_name, c."COLUMN_NAME" as column_name, c."INTEGER_IDX" as idx FROM "DBS" d JOIN "TBLS" t on t."DB_ID" = d."DB_ID" JOIN "SDS" s on t."SD_ID" = s."SD_ID" JOIN "COLUMNS_V2" c on c."CD_ID" = s."CD_ID" WHERE d."NAME" = '{schema}' and t."TBL_NAME" = '{table}' union SELECT d."NAME" as db_name, t."TBL_NAME" as table_name, k."PKEY_NAME" as column_name, 50000 + k."INTEGER_IDX" FROM "DBS" d JOIN "TBLS" t on t."DB_ID" = d."DB_ID" join "PARTITION_KEYS" k on t."TBL_ID" = k."TBL_ID" where d."NAME" = '{schema}' and t."TBL_NAME" = '{table}' ) x order by db_name, table_name, idx
Current Problem:
The reason for missing partition fields in the tables metadata is they are not existing as table columns in hive metastore.
Because it is the design and working nature of Hive which can identify partition fields and need not be defined as table fields.
As the partition is nothing but the directory location for Hive in the distributed file system to quickly lookup the specific data from a specific range instead of loading the whole data there by saving a lot of time and computation.
So when Openmetadata ingests tables from hive-metastore, the query it launches does not take into account the partition fields but only the table fields
The Actual Schema of the table:
Solution:
Query in get table column method may be modified to yield all fields with partitions such as below:
When I add a Hive Service pull the schemas of hive table. But The table Schema has no partition field of hive. for example:
The fields of year 、month and day are all not being displayed on the Schema Page.
I wonder if this is a bug or is it just designed so. Thanks