open-metadata / OpenMetadata

OpenMetadata is a unified metadata platform for data discovery, data observability, and data governance powered by a central metadata repository, in-depth column level lineage, and seamless team collaboration.
https://open-metadata.org
Apache License 2.0

Unable to get lineage between Postgres and Snowflake using Fivetran connector #7635

Closed. fredriv closed this issue 1 year ago.

fredriv commented 1 year ago

Affected module: Backend and ingestion framework

Describe the bug

I've ingested tables from Postgres and Snowflake into OpenMetadata. The tables are synced from Postgres to Snowflake using Fivetran.

I'm trying to use the new Fivetran connector to set up lineage between the Postgres and Snowflake tables. The connector finds the Fivetran pipelines, but it fails to locate the matching Snowflake table in OpenMetadata: the Snowflake tables use UPPER CASE names, whereas Fivetran (or Postgres?) seems to use lower case. As a result, the connector looks up the Snowflake table by the FQN Warehouse_Dev.raw_dev.fisk_cloudsql_public.ost, which fails because the stored FQN is Warehouse_Dev.RAW_DEV.FISK_CLOUDSQL_PUBLIC.OST.
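A minimal sketch of the mismatch, assuming the FQN has the form service.database.schema.table, that no name contains a dot, and that the Snowflake objects were created with unquoted identifiers (Snowflake stores unquoted identifiers in upper case). The helper to_snowflake_fqn is hypothetical and not part of the Fivetran connector; it only shows which parts of the FQN differ.

```python
def to_snowflake_fqn(fqn: str) -> str:
    """Upper-case the database, schema and table parts of a table FQN.

    Hypothetical helper. Assumes the FQN is service.database.schema.table,
    that no part contains a dot, and that the Snowflake objects use unquoted
    identifiers, which Snowflake stores in upper case. The service name
    (e.g. Warehouse_Dev) is an OpenMetadata name, so it is left untouched.
    """
    service, *rest = fqn.split(".")
    if len(rest) != 3:
        raise ValueError(f"expected service.database.schema.table, got {fqn!r}")
    return ".".join([service, *(part.upper() for part in rest)])


requested = "Warehouse_Dev.raw_dev.fisk_cloudsql_public.ost"
print(to_snowflake_fqn(requested))  # Warehouse_Dev.RAW_DEV.FISK_CLOUDSQL_PUBLIC.OST
```

Blindly upper-casing would break Snowflake tables created with quoted, mixed-case identifiers, so this illustrates the mismatch rather than a general fix.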

To Reproduce

Expected behavior

I expected the Fivetran connector to find both the Postgres and Snowflake tables and set up lineage between them.

Version:

Additional context

Excerpt from the Fivetran connector debug log:

[2022-09-21, 13:35:05 UTC] {client.py:177} DEBUG - URL http://openmetadata-server:8585/api/v1/tables/name/fisk_cloudsql.fisk.public.ost, method GET
[2022-09-21, 13:35:05 UTC] {client.py:178} DEBUG - Data {'headers': {'Content-type': 'application/json', 'Authorization': '***'}, 'allow_redirects': False, 'params': None}
[2022-09-21, 13:35:05 UTC] {client.py:177} DEBUG - URL http://openmetadata-server:8585/api/v1/tables/name/Warehouse_Dev.raw_dev.fisk_cloudsql_public.ost, method GET
[2022-09-21, 13:35:05 UTC] {client.py:178} DEBUG - Data {'headers': {'Content-type': 'application/json', 'Authorization': '***'}, 'allow_redirects': False, 'params': None}
[2022-09-21, 13:35:05 UTC] {ometa_api.py:546} DEBUG - Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/metadata/ingestion/ometa/client.py", line 201, in _one_request
    resp.raise_for_status()
  File "/usr/local/lib/python3.9/site-packages/requests/models.py", line 1022, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: http://openmetadata-server:8585/api/v1/tables/name/Warehouse_Dev.raw_dev.fisk_cloudsql_public.ost

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/metadata/ingestion/ometa/ometa_api.py", line 539, in _get
    resp = self.client.get(f"{self.get_suffix(entity)}/{path}{fields_str}")
  File "/usr/local/lib/python3.9/site-packages/metadata/ingestion/ometa/client.py", line 232, in get
    return self._request("GET", path, data)
  File "/usr/local/lib/python3.9/site-packages/metadata/ingestion/ometa/client.py", line 179, in _request
    return self._one_request(method, url, opts, retry)
  File "/usr/local/lib/python3.9/site-packages/metadata/ingestion/ometa/client.py", line 209, in _one_request
    raise APIError(error, http_error) from http_error
metadata.ingestion.ometa.client.APIError: table instance for Warehouse_Dev.raw_dev.fisk_cloudsql_public.ost not found

[2022-09-21, 13:35:05 UTC] {ometa_api.py:547} WARNING - GET Table for name/Warehouse_Dev.raw_dev.fisk_cloudsql_public.ost.Error 404 - table instance for Warehouse_Dev.raw_dev.fisk_cloudsql_public.ost not found
[2022-09-21, 13:35:05 UTC] {fivetran.py:171} INFO - Lineage Skipped for fisk_cloudsql.fisk.public.ost - Warehouse_Dev.raw_dev.fisk_cloudsql_public.ost
harshach commented 1 year ago

@fredriv @ulixius9 we are applying a lowercase normalizer in ES (Elasticsearch) to avoid this issue:


        "type": "keyword",
        "normalizer": "lowercase_normalizer"
      },```
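A toy model of what a keyword field with a lowercase normalizer does. The INDEX_SETTINGS dict follows the standard Elasticsearch way of defining a custom normalizer with the lowercase token filter; that OpenMetadata defines lowercase_normalizer exactly like this is an assumption, since the thread only shows the field mapping. Elasticsearch applies a keyword field's normalizer both at index time and to term-level queries, so a lookup matches regardless of case. KeywordField below is a hypothetical in-memory stand-in for that behaviour.

```python
from __future__ import annotations

# Assumed index settings: a custom normalizer that only applies the "lowercase" filter.
INDEX_SETTINGS = {
    "analysis": {
        "normalizer": {
            "lowercase_normalizer": {
                "type": "custom",
                "char_filter": [],
                "filter": ["lowercase"],
            }
        }
    }
}


class KeywordField:
    """Hypothetical in-memory keyword field that normalizes values at index and query time."""

    def __init__(self) -> None:
        self._terms: dict[str, list[str]] = {}

    @staticmethod
    def normalize(value: str) -> str:
        # Stand-in for the Elasticsearch "lowercase" token filter.
        return value.lower()

    def index(self, doc_id: str, value: str) -> None:
        self._terms.setdefault(self.normalize(value), []).append(doc_id)

    def term_query(self, value: str) -> list[str]:
        # The query term is normalized too, so case differences do not matter.
        return self._terms.get(self.normalize(value), [])


fqn_field = KeywordField()
fqn_field.index("ost", "Warehouse_Dev.RAW_DEV.FISK_CLOUDSQL_PUBLIC.OST")
print(fqn_field.term_query("Warehouse_Dev.raw_dev.fisk_cloudsql_public.ost"))  # ['ost']
```

This only helps lookups that go through the search index; a request to /api/v1/tables/name/{fqn} is served by the backend and is not affected by the ES mapping.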
ulixius9 commented 1 year ago

@harshach I believe we are calling the backend API directly here instead of querying ES first.

fredriv commented 1 year ago

I get a 404 when trying to look up the table with the lowercase ID:

curl -i http://localhost:8585/api/v1/tables/name/Warehouse_Dev.raw_dev.fisk_cloudsql_public.ost
HTTP/1.1 404 Not Found
Date: Thu, 22 Sep 2022 07:50:19 GMT
Content-Type: application/json
Content-Length: 100

{"code":404,"message":"table instance for Warehouse_Dev.raw_dev.fisk_cloudsql_public.ost not found"}

But it works when using the uppercase ID:

curl -I http://localhost:8585/api/v1/tables/name/Warehouse_Dev.RAW_DEV.FISK_CLOUDSQL_PUBLIC.OST
HTTP/1.1 200 OK
Date: Thu, 22 Sep 2022 07:53:06 GMT
Content-Type: application/json
Content-Length: 2316
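The same check can be scripted. This is a sketch using only the Python standard library; it builds the tables-by-name URL from the curl commands above and sends a HEAD request, like curl -I. The function names are hypothetical, and the Bearer-token header is an assumption that only matters when authentication is enabled on the server.

```python
import urllib.error
import urllib.parse
import urllib.request
from typing import Optional


def table_url(base_url: str, fqn: str) -> str:
    """Build the get-table-by-name URL; the FQN is percent-encoded as one path segment."""
    return f"{base_url.rstrip('/')}/api/v1/tables/name/{urllib.parse.quote(fqn, safe='')}"


def table_exists(base_url: str, fqn: str, token: Optional[str] = None) -> bool:
    """Return True on 200, False on 404, and re-raise any other HTTP error.

    Sends a HEAD request, like curl -I. The Bearer header is an assumption
    for servers with authentication enabled.
    """
    request = urllib.request.Request(table_url(base_url, fqn), method="HEAD")
    if token:
        request.add_header("Authorization", f"Bearer {token}")
    try:
        with urllib.request.urlopen(request) as response:
            return response.status == 200
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise
```

For example, calling table_exists("http://localhost:8585", ...) once with the lowercase FQN and once with the uppercase FQN should mirror the 404 and 200 responses shown above.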
pmbrull commented 1 year ago

> @harshach I believe we are directly calling the backend api instead of querying to ES first here

@ulixius9 ES gets called in fqn.build to resolve the FQN. We then call the API directly, but we should have been able to find the entity through ES first.
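A hypothetical sketch of the flow described above, with plain dicts standing in for the two lookups: BACKEND mimics GET /api/v1/tables/name/{fqn}, which matched the stored FQN exactly in this setup, and SEARCH_INDEX mimics the lowercase-normalized ES field. Resolving the stored FQN through search first and then calling the backend with that FQN avoids the 404. None of these names come from the OpenMetadata code base.

```python
# Hypothetical stand-ins for the backend and the search index.
BACKEND = {"Warehouse_Dev.RAW_DEV.FISK_CLOUDSQL_PUBLIC.OST": {"name": "OST"}}
SEARCH_INDEX = {fqn.lower(): fqn for fqn in BACKEND}


def get_by_name(fqn):
    """Direct backend lookup: exact, case-sensitive match (None stands in for a 404)."""
    return BACKEND.get(fqn)


def search_fqn(fqn):
    """Case-insensitive search that returns the FQN as it is stored."""
    return SEARCH_INDEX.get(fqn.lower())


def resolve_table(fqn):
    """Resolve the stored FQN through search first, then fetch it from the backend."""
    stored = search_fqn(fqn)
    return get_by_name(stored) if stored is not None else None


requested = "Warehouse_Dev.raw_dev.fisk_cloudsql_public.ost"
print(get_by_name(requested))    # None: the direct call misses, like the 404 in the log
print(resolve_table(requested))  # {'name': 'OST'}
```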

pmbrull commented 1 year ago

related to https://github.com/open-metadata/OpenMetadata/issues/7690

nahuelverdugo commented 1 year ago

I confirmed with the user that this happens when PostgreSQL is used as the database for the OpenMetadata server.
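A possible explanation for why the database matters, offered as an assumption rather than something confirmed in this thread: PostgreSQL compares text case-sensitively by default, whereas MySQL's default collations (for example utf8mb4_0900_ai_ci) are case-insensitive, so an exact-name lookup can miss on one backend and hit on the other. The sketch below uses Python's built-in sqlite3 as a stand-in: its default BINARY collation behaves like the case-sensitive case and NOCASE like the case-insensitive one. The table and column names are hypothetical.

```python
import sqlite3

ALLOWED_COLLATIONS = {"BINARY", "NOCASE"}


def make_db() -> sqlite3.Connection:
    """Create an in-memory table holding the FQN as stored (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table_entity (fqn TEXT NOT NULL)")
    conn.execute(
        "INSERT INTO table_entity (fqn) VALUES (?)",
        ("Warehouse_Dev.RAW_DEV.FISK_CLOUDSQL_PUBLIC.OST",),
    )
    return conn


def lookup(conn: sqlite3.Connection, fqn: str, collation: str = "BINARY"):
    """Return the stored FQN matching fqn under the given collation, or None."""
    if collation not in ALLOWED_COLLATIONS:
        raise ValueError(f"unsupported collation: {collation}")
    # The collation name comes from a fixed allow-list, so formatting it in is safe.
    row = conn.execute(
        f"SELECT fqn FROM table_entity WHERE fqn = ? COLLATE {collation}", (fqn,)
    ).fetchone()
    return row[0] if row else None


conn = make_db()
requested = "Warehouse_Dev.raw_dev.fisk_cloudsql_public.ost"
print(lookup(conn, requested, "BINARY"))  # None (case-sensitive comparison)
print(lookup(conn, requested, "NOCASE"))  # Warehouse_Dev.RAW_DEV.FISK_CLOUDSQL_PUBLIC.OST
```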

nahuelverdugo commented 1 year ago

https://github.com/open-metadata/OpenMetadata/pull/9079 has fixed this.

@fredriv, please reopen the issue if the error still happens on 0.13.1.