Closed by seanprivett 1 month ago
A thread in Slack suggests using transformers to achieve this, as the native dbt mappings don't have an add_domain utility:
https://datahubspace.slack.com/archives/CUMUWQU66/p1674149180727029
There's an active feature request for
I have this working by using table names to assign domains:
source:
  type: dbt
  config:
    manifest_path: 's3://mojap-derived-tables/prod/run_artefacts/latest/target/manifest.json'
    catalog_path: 's3://mojap-derived-tables/prod/run_artefacts/latest/target/catalog.json'
    test_results_path: 's3://mojap-derived-tables/prod/run_artefacts/latest/target/run_results.json'
    target_platform: athena
    infer_dbt_schemas: true
    aws_connection:
      aws_region: eu-west-1
    node_name_pattern:
      allow:
        - '.*bold_sm_spells.*'
        - '.*common_platform.*'
        - '.*sirius.*'
    entities_enabled:
      test_results: 'YES'
      seeds: 'YES'
      snapshots: 'YES'
      models: 'YES'
      sources: 'YES'
      test_definitions: 'YES'
    stateful_ingestion:
      remove_stale_metadata: true
transformers:
  - type: "pattern_add_dataset_domain"
    config:
      semantics: OVERWRITE
      domain_pattern:
        rules:
          'urn:li:dataset:\(urn:li:dataPlatform:dbt,awsdatacatalog.*common_platform.*': ["HMCTS"]
          'urn:li:dataset:\(urn:li:dataPlatform:dbt,awsdatacatalog.*prison.*': ["HMPPS"]
          'urn:li:dataset:\(urn:li:dataPlatform:dbt,awsdatacatalog.*sirius.*': ["OPG"]
This recipe is included in https://github.com/ministryofjustice/data-catalogue/issues/123
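The keys under `rules` are regular expressions matched against dataset URNs, so it can help to check locally which domain a given URN would receive before running ingestion. Below is a minimal sketch, assuming the patterns are applied with Python's `re.match` (anchored at the start of the string, which I believe is how DataHub evaluates them, but treat that as an assumption). The helper `domains_for` and the sample URN are made up for illustration and are not part of DataHub.

```python
import re
from typing import List

# Rules copied from the recipe's domain_pattern: regex -> list of domain names.
RULES = {
    r"urn:li:dataset:\(urn:li:dataPlatform:dbt,awsdatacatalog.*common_platform.*": ["HMCTS"],
    r"urn:li:dataset:\(urn:li:dataPlatform:dbt,awsdatacatalog.*prison.*": ["HMPPS"],
    r"urn:li:dataset:\(urn:li:dataPlatform:dbt,awsdatacatalog.*sirius.*": ["OPG"],
}


def domains_for(urn: str) -> List[str]:
    """Collect the domains of every rule whose regex matches from the start of the URN."""
    return [
        domain
        for pattern, domains in RULES.items()
        if re.match(pattern, urn)
        for domain in domains
    ]


# Hypothetical URN, invented for this example only.
sample = "urn:li:dataset:(urn:li:dataPlatform:dbt,awsdatacatalog.sirius_derived.cases,PROD)"
print(domains_for(sample))  # ['OPG']
```

Because each pattern ends in `.*` and is only anchored at the start, any URN containing the keyword after `awsdatacatalog` matches, so a table whose name happens to contain `prison` or `sirius` would also pick up that domain.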
Matt did a spike to pick up domains from CaDeT, ~from which we'd then map to our own domain model.~ https://github.com/ministryofjustice/find-moj-data/issues/108
We would like the domains in the CaDeT metadata to be assigned to actual DataHub domains, rather than custom properties
https://datahubproject.io/docs/generated/ingestion/sources/dbt/#dbt-meta-automated-mappings