open-metadata / OpenMetadata

OpenMetadata is a unified metadata platform for data discovery, data observability, and data governance powered by a central metadata repository, in-depth column level lineage, and seamless team collaboration.
https://open-metadata.org
Apache License 2.0

Iceberg Metadata Ingestion failing due to S3 FileSystem Initialisation #18512

Closed Prajwal214 closed 1 week ago

Prajwal214 commented 1 week ago

Affected module: Ingestion Framework

Describe the bug: When ingesting Iceberg table metadata with OpenMetadata 1.5.10, the ingestion fails with a TypeError raised while pyarrow's S3FileSystem is being initialised. The error message is "expected bytes, pydantic_core._pydantic_core.Url found". The traceback suggests that one of the keyword arguments passed to S3FileSystem is a pydantic Url object rather than a plain string.

To Reproduce

Screenshots or steps to reproduce

Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.10/site-packages/metadata/ingestion/source/database/iceberg/metadata.py", line 183, in get_tables_name_and_type 
    table = self.iceberg.load_table(table_identifier)
  File "/home/airflow/.local/lib/python3.10/site-packages/pyiceberg/catalog/hive.py", line 358, in load_table 
    return self._convert_hive_into_iceberg(hive_table, io)
  File "/home/airflow/.local/lib/python3.10/site-packages/pyiceberg/catalog/hive.py", line 239, in _convert_hive_into_iceberg 
    file = io.new_input(metadata_location)
  File "/home/airflow/.local/lib/python3.10/site-packages/pyiceberg/io/pyarrow.py", line 369, in new_input 
    fs=self.fs_by_scheme(scheme),
  File "/home/airflow/.local/lib/python3.10/site-packages/pyiceberg/io/pyarrow.py", line 319, in _initialize_fs 
    return S3FileSystem(**client_kwargs)
  File "pyarrow/_s3fs.pyx", line 356, in pyarrow._s3fs.S3FileSystem.__init__ 
  File "<stringsource>", line 15, in string.from_py.__pyx_convert_string_from_py_6libcpp_6string_std__in_string 
TypeError: expected bytes, pydantic_core._pydantic_core.Url found
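
The traceback suggests that a pydantic Url object reaches S3FileSystem where a plain string is expected. A minimal sketch of one possible workaround, assuming the catalog properties can be normalised before they are handed to pyiceberg: convert every non-primitive value with str(). FakeUrl and stringify_properties are hypothetical names used only for illustration, and the property keys are example values; this is not the fix applied in OpenMetadata.

```python
from typing import Any, Dict


class FakeUrl:
    """Hypothetical stand-in for pydantic_core.Url: not a str, but str() yields the URL."""

    def __init__(self, value: str) -> None:
        self._value = value

    def __str__(self) -> str:
        return self._value


def stringify_properties(props: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy in which every value that is not a plain primitive is converted with str()."""
    plain = (str, bool, int, float, type(None))
    return {
        key: value if isinstance(value, plain) else str(value)
        for key, value in props.items()
    }


# Illustrative properties: the endpoint arrives as a URL object, not a string.
raw = {"s3.endpoint": FakeUrl("http://localhost:9000"), "s3.region": "us-east-1"}
clean = stringify_properties(raw)
```

With the values normalised this way, the dictionary passed on to the catalog would contain only strings and other primitives, which is what the string conversion in the last traceback frame appears to require.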

Expected behavior: The ingestion process should complete successfully, loading metadata for Iceberg tables without raising errors related to S3FileSystem.

Version: OpenMetadata 1.5.10 (Python 3.10, per the paths in the traceback)
