open-metadata / OpenMetadata

OpenMetadata is a unified metadata platform for data discovery, data observability, and data governance powered by a central metadata repository, in-depth column level lineage, and seamless team collaboration.
https://open-metadata.org
Apache License 2.0

Data diff does not support snowflake service #17903

Closed sushi30 closed 1 month ago

sushi30 commented 1 month ago

Affected module: Data quality

Describe the bug: Running a data diff test case against a Snowflake service fails with an error like this:

[2024-09-17 15:55:53] ERROR    {metadata.TestSuite:tableDiff:105} - Unexpected error while running the table diff test: URI must specify 'schema'. Expected format: snowflake://<user>:<password>@<account>/<database>/<SCHEMA>?warehouse=<WAREHOUSE>
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/data_diff/databases/_connect.py", line 49, in match_path
    arg = dsn_dict.pop(param)
KeyError: 'schema'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/metadata/data_quality/validations/table/sqlalchemy/tableDiff.py", line 89, in run_validation
    return self._run()
  File "/usr/local/lib/python3.10/site-packages/metadata/data_quality/validations/table/sqlalchemy/tableDiff.py", line 117, in _run
    result = self.get_column_diff()
  File "/usr/local/lib/python3.10/site-packages/metadata/data_quality/validations/table/sqlalchemy/tableDiff.py", line 313, in get_column_diff
    changed = self.get_incomparable_columns()
  File "/usr/local/lib/python3.10/site-packages/metadata/data_quality/validations/table/sqlalchemy/tableDiff.py", line 165, in get_incomparable_columns
    table1 = data_diff.connect_to_table(
  File "/usr/local/lib/python3.10/site-packages/data_diff/__init__.py", line 35, in connect_to_table
    db: Database = connect(db_info, thread_count=thread_count)
  File "/usr/local/lib/python3.10/site-packages/data_diff/databases/_connect.py", line 272, in __call__
    conn = self.connect_to_uri(db_conf, thread_count, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/data_diff/databases/_connect.py", line 175, in connect_to_uri
    kw = matcher.match_path(dsn)
  File "/usr/local/lib/python3.10/site-packages/data_diff/databases/_connect.py", line 52, in match_path
    raise ValueError(f"URI must specify '{param}'. Expected format: {help_str}")
ValueError: URI must specify 'schema'. Expected format: snowflake://<user>:<password>@<account>/<database>/<SCHEMA>?warehouse=<WAREHOUSE>

/usr/local/lib/python3.10/site-packages/metadata/utils/deprecation.py:25: DeprecationWarning: [print_status] will be deprecated in the release [1.6]: Use 'workflow.print_status()' instead.
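
For reference, the error message asks for the schema as a second path segment after the database. The snippet below is a minimal sketch of assembling a URI in that shape. The helper name `build_snowflake_uri` and the sample values are hypothetical and not part of OpenMetadata or data-diff; it only mirrors the format string printed in the log above.

```python
from urllib.parse import quote


def build_snowflake_uri(user, password, account, database, schema, warehouse):
    """Assemble a URI in the shape quoted by the error message (hypothetical helper)."""
    if not schema:
        # The traceback above is triggered when this path segment is missing.
        raise ValueError("schema is required")
    # Percent-encode the credentials so reserved characters do not break the URI.
    return (
        f"snowflake://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{account}/{database}/{schema}?warehouse={quote(warehouse, safe='')}"
    )


uri = build_snowflake_uri("alice", "s3cret", "my_account", "MY_DB", "PUBLIC", "COMPUTE_WH")
print(uri)  # snowflake://alice:s3cret@my_account/MY_DB/PUBLIC?warehouse=COMPUTE_WH
```

A URI that stops at `/MY_DB` would lack the `schema` component that the message says is required.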

sushi30 commented 1 month ago

While trying to fix this, I discovered that passwords are not unquoted (URL-decoded), so it's impossible to use passwords with special characters. This was raised for Postgres here (https://github.com/datafold/data-diff/issues/811), but it needs to be resolved globally.
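
To illustrate the quoting problem, here is a minimal sketch using only the Python standard library. It does not reproduce data-diff's own URI parsing; it assumes a generic parser (`urllib.parse.urlsplit`) and a made-up password, and shows that a percent-encoded password stays encoded after parsing unless it is explicitly unquoted.

```python
from urllib.parse import quote, unquote, urlsplit

# Hypothetical password containing characters that are reserved in URIs.
raw_password = "p@ss/w:rd#1"

# The password must be percent-encoded to be embedded in a URI at all.
encoded = quote(raw_password, safe="")
uri = f"snowflake://alice:{encoded}@my_account/MY_DB/PUBLIC?warehouse=COMPUTE_WH"

parts = urlsplit(uri)

# urlsplit returns the password exactly as written in the URI, still encoded.
print(parts.password)           # p%40ss%2Fw%3Ard%231

# Without this decoding step, the encoded string would be sent as the password.
print(unquote(parts.password))  # p@ss/w:rd#1
```

If the decoding step is skipped, the database receives the encoded string rather than the real password, so authentication fails for any password with special characters.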

sushi30 commented 1 month ago

Password decoding was added in https://github.com/open-metadata/collate-data-diff/commit/f5ac4c0df6c8cd258c6c281d125c0bd84b822783