open-metadata / OpenMetadata

OpenMetadata is a unified metadata platform for data discovery, data observability, and data governance, powered by a central metadata repository, in-depth column-level lineage, and seamless team collaboration.
https://open-metadata.org
Apache License 2.0

Metadata ingestion fails for Iceberg tables with nested partition column #18491

Status: Open · opened by tomasko-labuda 2 weeks ago


Affected module: Ingestion Framework

Describe the bug: Metadata ingestion fails for Iceberg tables that are partitioned by a nested (struct field) column.

To Reproduce:

Metadata ingestion succeeds for this table, which is partitioned by the top-level column b:

    CREATE TABLE catalog1.db1.table1 (a STRUCT<b: STRING>, b STRING) PARTITIONED BY (b)

Metadata ingestion fails for this table, which is partitioned by the nested field a.b:

    CREATE TABLE catalog1.db1.table1 (a STRUCT<b: STRING>, b STRING) PARTITIONED BY (a.b)
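The error message suggests that the server validates each partition column name against the table's column list. Below is a minimal Python model of that kind of check, not OpenMetadata's actual server code. It assumes (unconfirmed) that only top-level column names are compared and that the nested partition column arrives as the dotted name `a.b`. `Column` and `invalid_partition_columns_flat` are hypothetical names used only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Column:
    """Hypothetical, simplified column model: STRUCT columns carry their fields as children."""
    name: str
    children: list[Column] = field(default_factory=list)


def invalid_partition_columns_flat(columns: list[Column], partition_columns: list[str]) -> list[str]:
    """Hypothetical check: compares partition names with TOP-LEVEL column names only."""
    top_level = {c.name for c in columns}
    return [p for p in partition_columns if p not in top_level]


# Both repro tables share this schema: a STRUCT<b: STRING>, b STRING
schema = [Column("a", [Column("b")]), Column("b")]

print(invalid_partition_columns_flat(schema, ["b"]))    # PARTITIONED BY (b)
print(invalid_partition_columns_flat(schema, ["a.b"]))  # PARTITIONED BY (a.b)
```

Under these assumptions the first table yields no invalid names, while the second reports `a.b`, which would be consistent with the observed "Invalid column name found in table partition" response.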

Error:

[2024-10-31T13:58:37.779+0000] {status.py:91} WARNING - Failed to ingest CreateTableRequest [table1] due to api request failure: Invalid column name found in table partition
[2024-10-31T13:58:37.779+0000] {status.py:92} DEBUG - Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.10/site-packages/metadata/ingestion/ometa/client.py", line 243, in _one_request
    resp.raise_for_status()
  File "/home/airflow/.local/lib/python3.10/site-packages/requests/models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 400 Client Error: Bad Request for url: http://openmetadata-server:8585/api/v1/tables

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.10/site-packages/metadata/ingestion/sink/metadata_rest.py", line 146, in _run
    return self._run_dispatch(record)
  File "/usr/local/lib/python3.10/functools.py", line 926, in _method
    return method.__get__(obj, cls)(*args, **kwargs)
  File "/home/airflow/.local/lib/python3.10/site-packages/metadata/ingestion/sink/metadata_rest.py", line 137, in _run_dispatch
    return self.write_create_request(record)
  File "/home/airflow/.local/lib/python3.10/site-packages/metadata/ingestion/sink/metadata_rest.py", line 167, in write_create_request
    created = self.metadata.create_or_update(entity_request)
  File "/home/airflow/.local/lib/python3.10/site-packages/metadata/ingestion/ometa/ometa_api.py", line 280, in create_or_update
    return self._create(data=data, method="put")
  File "/home/airflow/.local/lib/python3.10/site-packages/metadata/ingestion/ometa/ometa_api.py", line 271, in _create
    resp = fn(self.get_suffix(entity), data=data.model_dump_json())
  File "/home/airflow/.local/lib/python3.10/site-packages/metadata/utils/execution_time_tracker.py", line 195, in inner
    result = func(*args, **kwargs)
  File "/home/airflow/.local/lib/python3.10/site-packages/metadata/ingestion/ometa/client.py", line 324, in put
    return self._request("PUT", path, data)
  File "/home/airflow/.local/lib/python3.10/site-packages/metadata/ingestion/ometa/client.py", line 212, in _request
    return self._one_request(method, url, opts, retry)
  File "/home/airflow/.local/lib/python3.10/site-packages/metadata/ingestion/ometa/client.py", line 263, in _one_request
    raise APIError(error, http_error) from http_error
metadata.ingestion.ometa.client.APIError: Invalid column name found in table partition

Expected behavior: Metadata ingestion succeeds for tables partitioned by a nested column.
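One way the expected behavior could be achieved is to resolve a dotted partition column name through the children of struct columns, instead of comparing it against top-level names only. The sketch below is hypothetical: `Column`, `resolve_column_path`, and `invalid_partition_columns_nested` are illustrative names, and it assumes nested partition columns are identified by dotted paths, which has not been confirmed for OpenMetadata.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Column:
    """Hypothetical, simplified column model: STRUCT columns carry their fields as children."""
    name: str
    children: list[Column] = field(default_factory=list)


def resolve_column_path(columns: list[Column], path: str) -> Column | None:
    """Walk a dotted path such as 'a.b' through nested children; return None if any segment is missing."""
    current = columns
    found = None
    for part in path.split("."):
        found = next((c for c in current if c.name == part), None)
        if found is None:
            return None
        current = found.children  # descend into the struct's fields for the next segment
    return found


def invalid_partition_columns_nested(columns: list[Column], partition_columns: list[str]) -> list[str]:
    """Hypothetical nested-aware check: a partition column is valid if its dotted path resolves."""
    return [p for p in partition_columns if resolve_column_path(columns, p) is None]


# Schema from the repro: a STRUCT<b: STRING>, b STRING
schema = [Column("a", [Column("b")]), Column("b")]

print(invalid_partition_columns_nested(schema, ["b"]))
print(invalid_partition_columns_nested(schema, ["a.b"]))
print(invalid_partition_columns_nested(schema, ["a.c"]))
```

Here both `b` and `a.b` resolve, and only a genuinely missing field such as `a.c` is rejected. Splitting on `.` is ambiguous if a column name itself contains a dot, so a real fix might need an unambiguous path representation.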

Version: