open-metadata / OpenMetadata

OpenMetadata is a unified metadata platform for data discovery, data observability, and data governance, powered by a central metadata repository, in-depth column-level lineage, and seamless team collaboration.
https://open-metadata.org
Apache License 2.0

Add support for profiler/DQ for Deltalake #8515

Open ayush-shah opened 1 year ago

ayush-shah commented 1 year ago

Add Support for Profiler and DQ for Deltalake


ayush-shah commented 1 year ago

Line up on how to proceed here:

ischwart1 commented 1 year ago

@pmbrull Maybe related: the docs say dbt is not supported for Deltalake, but that dbt is supported for Databricks.

If I, e.g., have a Spark job in Databricks that writes data to a Delta table with df.write.format("delta")... and I then use this Delta table as a source for a downstream dbt model, will I not be able to import metadata from this dbt model into OM?
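To make the scenario concrete: dbt records each model's upstream dependencies in its manifest.json artifact (under depends_on.nodes), which is the kind of information a dbt integration can turn into table-level lineage from the Delta table to the downstream model. Below is a minimal Python sketch, assuming a heavily trimmed manifest in which the Delta table is declared as a dbt source; the helpers `node_table_name` and `lineage_edges` and all sample names are hypothetical, and this is not OpenMetadata's implementation.

```python
from typing import Dict, List, Tuple


def node_table_name(node: Dict) -> str:
    """Return 'database.schema.table' for a dbt model or source node.

    Models may set an alias and sources an identifier; both fall back
    to the node's name when absent.
    """
    table = node.get("alias") or node.get("identifier") or node["name"]
    return f"{node['database']}.{node['schema']}.{table}"


def lineage_edges(manifest: Dict) -> List[Tuple[str, str]]:
    """Derive (upstream, downstream) table edges from a dbt manifest.

    Only model nodes are treated as downstream tables. Their upstream
    unique_ids come from depends_on.nodes and are resolved against both
    the 'nodes' and 'sources' sections of the manifest.
    """
    lookup = {**manifest.get("nodes", {}), **manifest.get("sources", {})}
    edges = []
    for node in manifest.get("nodes", {}).values():
        if node.get("resource_type") != "model":
            continue
        downstream = node_table_name(node)
        for upstream_id in node.get("depends_on", {}).get("nodes", []):
            upstream = lookup.get(upstream_id)
            if upstream is not None:  # skip ids missing from the manifest
                edges.append((node_table_name(upstream), downstream))
    return edges


# Hypothetical, heavily trimmed manifest: a Delta table written by a Spark
# job is declared as a dbt source, and one model selects from it.
EXAMPLE_MANIFEST = {
    "sources": {
        "source.shop.raw.orders": {
            "resource_type": "source",
            "database": "default",
            "schema": "raw",
            "name": "orders",
            "identifier": "orders",
        }
    },
    "nodes": {
        "model.shop.orders_daily": {
            "resource_type": "model",
            "database": "default",
            "schema": "analytics",
            "name": "orders_daily",
            "alias": "orders_daily",
            "depends_on": {"nodes": ["source.shop.raw.orders"]},
        }
    },
}

if __name__ == "__main__":
    print(lineage_edges(EXAMPLE_MANIFEST))
    # [('default.raw.orders', 'default.analytics.orders_daily')]
```

Declaring the Delta table as a dbt source is what gives the model an explicit upstream node; a table referenced only through raw SQL would not appear in depends_on.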

pmbrull commented 1 year ago

Hi @ischwart1, this looks like an issue in the docs. Will fix it.

If you ingest tables from Deltalake, you can run the dbt workflow on top of the same service to add the dbt model details.
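Roughly, "on top of the same service" means the dbt workflow enriches tables that already exist in OpenMetadata, which identifies a table by a fully qualified name of the form service.database.schema.table. Below is a minimal Python sketch of that matching step, assuming the dbt model's database and schema map directly onto the names the Deltalake ingestion created; `model_fqn`, `match_models_to_service`, the service name `deltalake_prod`, and the sample data are all hypothetical. OpenMetadata's actual dbt workflow may resolve names differently (for example, in case handling), so treat this as an illustration only.

```python
from typing import Dict, Iterable, List, Tuple


def model_fqn(service_name: str, node: Dict) -> str:
    """Build a service.database.schema.table name for a dbt model node.

    Assumption: the dbt model's database/schema line up with how the
    Deltalake ingestion named them in OpenMetadata.
    """
    table = node.get("alias") or node["name"]
    return ".".join([service_name, node["database"], node["schema"], table])


def match_models_to_service(
    manifest: Dict, service_name: str, ingested_fqns: Iterable[str]
) -> Tuple[List[str], List[str]]:
    """Split dbt models into those whose table was already ingested under
    the service (matched) and those that were not (missing)."""
    known = set(ingested_fqns)
    matched: List[str] = []
    missing: List[str] = []
    for node in manifest.get("nodes", {}).values():
        if node.get("resource_type") != "model":
            continue  # tests, seeds, snapshots, etc. are ignored here
        fqn = model_fqn(service_name, node)
        if fqn in known:
            matched.append(fqn)
        else:
            missing.append(fqn)
    return matched, missing


# Hypothetical service name, ingested tables, and trimmed manifest.
SERVICE = "deltalake_prod"
INGESTED = {"deltalake_prod.default.analytics.orders_daily"}
EXAMPLE_MANIFEST = {
    "nodes": {
        "model.shop.orders_daily": {
            "resource_type": "model",
            "database": "default",
            "schema": "analytics",
            "name": "orders_daily",
            "alias": "orders_daily",
        },
        "model.shop.customers_dim": {
            "resource_type": "model",
            "database": "default",
            "schema": "analytics",
            "name": "customers_dim",
        },
    }
}

if __name__ == "__main__":
    matched, missing = match_models_to_service(EXAMPLE_MANIFEST, SERVICE, INGESTED)
    print("matched:", matched)
    print("missing:", missing)
```

With this sample data, orders_daily is matched and customers_dim is missing, i.e. the customers_dim table would first need to be ingested from Deltalake before dbt details could be attached to it.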

https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/entity/services/connections/database/deltaLakeConnection.json#L115