OpenMetadata is a unified metadata platform for data discovery, data observability, and data governance powered by a central metadata repository, in-depth column level lineage, and seamless team collaboration.
We currently sync dbt tags into OM by creating a new classification DBTTags and adding all that info as tags inside.
What we need to figure out here is a way to directly link dbt tags into existing tags/tiers/glossaries in OpenMetadata. Example:
tables:
  - name: DATA_TABLE
    description: Data ,
    columns:
      - name: gross_revenue
        description: column description
        meta:
          openmetadata:
            # DO NOT create anything new in OM, just link to existing items
            - type: GlossaryTerm
              name: BusinessGlossary.GrossRevenue
            - type: Classification
              name: Tier.Tier1
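The linking rule in the example can be sketched in Python. This is only an illustration, not the connector's implementation: the `schema` dict is an inline stand-in for the parsed YAML, and `EXISTING`, `iter_links`, and `resolve_links` are hypothetical names. It assumes the ingestion step can look up which tags and glossary terms already exist in OM. Links that do not resolve are reported, never created.

```python
from typing import Iterator

# Inline stand-in for the parsed YAML example (a real run would load the dbt file).
schema = {
    "tables": [
        {
            "name": "DATA_TABLE",
            "columns": [
                {
                    "name": "gross_revenue",
                    "meta": {
                        "openmetadata": [
                            {"type": "GlossaryTerm", "name": "BusinessGlossary.GrossRevenue"},
                            {"type": "Classification", "name": "Tier.Tier1"},
                        ]
                    },
                }
            ],
        }
    ]
}

# Hypothetical lookup of entities that already exist in OM (assumption for this sketch).
EXISTING = {
    ("GlossaryTerm", "BusinessGlossary.GrossRevenue"),
    ("Classification", "Tier.Tier1"),
}


def iter_links(schema: dict) -> Iterator[tuple[str, str, str]]:
    """Yield (column path, entity type, entity name) for every declared link."""
    for table in schema.get("tables", []):
        for column in table.get("columns", []):
            for link in column.get("meta", {}).get("openmetadata", []):
                yield f"{table['name']}.{column['name']}", link["type"], link["name"]


def resolve_links(schema: dict, existing: set) -> tuple[list, list]:
    """Split declared links into resolvable and missing ones; nothing is created."""
    linked, missing = [], []
    for path, kind, name in iter_links(schema):
        target = linked if (kind, name) in existing else missing
        target.append((path, kind, name))
    return linked, missing


linked, missing = resolve_links(schema, EXISTING)
```

Returning the unresolved links separately lets the caller log or warn about them without creating anything new in OM.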
We need to figure out how to link datamodels with relevant information like:
- past executions & status
- last time it was refreshed
We have 2 different topics here:
- How to integrate dbt cloud metadata (schedule, runs, jobs, etc.) -> This can become a new Pipeline Service
- How to figure out, when using dbt core, when each model was refreshed, etc. (https://docs.getdbt.com/reference/artifacts/run-results-json) -> These are extra properties to add to a DataModel Entity, be it a dbt model or a DashboardDataModel: add lastRefreshed and executions (similar to Pipeline Entity executions)
Create this as a Pipeline, show its status, link the last status in the table, and use the Incident Manager to track these pipeline statuses.
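For the dbt core case, a rough sketch of how the proposed properties could be derived from `run_results.json`. The sample data is made up and trimmed. The `results`, `unique_id`, `status`, `execution_time`, and `timing` fields follow the artifact layout as we understand it from the dbt docs linked above. `lastRefreshed` and `executions` are the property names proposed here; `executionTime`, `endTime`, and `summarize` are hypothetical.

```python
from datetime import datetime

# Trimmed, made-up sample shaped like dbt's run_results.json.
run_results = {
    "results": [
        {
            "unique_id": "model.demo.gross_revenue",
            "status": "success",
            "execution_time": 1.5,
            "timing": [
                {"name": "compile", "started_at": "2024-01-01T10:00:00.000000Z",
                 "completed_at": "2024-01-01T10:00:01.000000Z"},
                {"name": "execute", "started_at": "2024-01-01T10:00:01.000000Z",
                 "completed_at": "2024-01-01T10:00:02.500000Z"},
            ],
        },
        {
            "unique_id": "model.demo.orders",
            "status": "error",
            "execution_time": 0.2,
            "timing": [],
        },
    ]
}


def parse_ts(value: str) -> datetime:
    # Older Python versions reject a trailing "Z" in fromisoformat(), so normalise it.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def summarize(run_results: dict) -> dict:
    """Build {unique_id: {"lastRefreshed": ..., "executions": [...]}} per node."""
    summary = {}
    for result in run_results.get("results", []):
        finished = [
            parse_ts(step["completed_at"])
            for step in result.get("timing", [])
            if step.get("completed_at")
        ]
        entry = summary.setdefault(
            result["unique_id"], {"lastRefreshed": None, "executions": []}
        )
        entry["executions"].append(
            {
                "status": result["status"],
                "executionTime": result.get("execution_time"),
                "endTime": max(finished) if finished else None,
            }
        )
        # Only a successful run counts as a refresh in this sketch.
        if result["status"] == "success" and finished:
            entry["lastRefreshed"] = max(finished)
    return summary


summary = summarize(run_results)
```

Treating only successful runs as a refresh is a design assumption; a failed run still shows up under `executions`, so its status can be surfaced or tracked.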
[x] Follow the approach as with any connector (UI vs. run externally with new step-wise components)
Backlog
[ ] P2 - For dbt cloud, the required metadata can be gathered using a metadata API from dbt cloud. Check whether it is feasible to implement this instead of the current approach of fetching dbt artifacts.
[1.3.2] - JSON Schema & Parsing Improvements
type fields in the JSON schema of each individual config of dbt (local, http, s3, gcs, azure, cloud)
[1.4] - Tags & Glossaries
[1.4.1] - dbt run details
[1.6] - Semantic Layer
How to integrate "Semantic Layer" data in general, be it from dbt metrics/exposures, Tableau Metrics, etc.
GlossaryTerm Entity that can be an SQL expression, Python code that computes the metrics, etc.
[1.6] - dbt Hooks
[1.3.2] - Documentation
Backlog