ministryofjustice / data-catalogue

Data catalogue • This repository is defined and managed in Terraform
MIT License

Lineage from CaDeT #13

Open • MatMoore opened this issue 3 months ago

MatMoore commented 3 months ago

CaDeT can provide full lineage from source tables to derived tables, but we don't have this working at the moment because we only ingest a selected list of tables.

This is controlled by the node_name_pattern setting, which can take either an include (allow) list or an exclude (deny) list.
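To make the include/exclude distinction concrete, here is a simplified, hypothetical re-implementation of an allow/deny name filter. It assumes that a deny match always wins and that patterns are matched with Python's re.match; the real node_name_pattern matching rules may differ, and the table names are made up.

```python
import re
from dataclasses import dataclass, field


@dataclass
class NamePattern:
    """Simplified allow/deny name filter (hypothetical).

    A name is kept if it matches no deny regex and at least one allow regex.
    """

    allow: list[str] = field(default_factory=lambda: [".*"])
    deny: list[str] = field(default_factory=list)

    def allowed(self, name: str) -> bool:
        # Deny takes precedence: any deny match rejects the name outright.
        if any(re.match(p, name) for p in self.deny):
            return False
        return any(re.match(p, name) for p in self.allow)


# Include-list style: only the listed tables are ingested.
include = NamePattern(allow=[r"prisons\.prison_population$"])
# Exclude-list style: everything is ingested except matching tables.
exclude = NamePattern(deny=[r".*_intm_"])

names = [
    "prisons.prison_population",
    "prisons.prison_releases",
    "prisons.prisons_intm_joined",
]
print([n for n in names if include.allowed(n)])
# ['prisons.prison_population']
print([n for n in names if exclude.allowed(n)])
# ['prisons.prison_population', 'prisons.prison_releases']
```

With an include list, any table that is not listed, including intermediates sitting between two listed tables, is simply absent from the catalogue, which leaves gaps in the lineage.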

There are some low-value tables that are created as part of the implementation of create_a_derived_table. Is there any way we can exclude these intermediate tables without losing the lineage on either side of them?
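One generic way to hide intermediate tables without losing lineage is to replace every path that runs through them with a direct edge between the visible tables at either end. This is a plain graph sketch, not a DataHub or CaDeT feature; the function name and table names are hypothetical.

```python
from collections import defaultdict


def bridge_lineage(edges, hidden):
    """Collapse hidden (intermediate) nodes out of a lineage graph.

    Returns an edge (a, b) for every pair of visible nodes where b is
    reachable from a through zero or more hidden nodes.
    """
    children = defaultdict(set)
    for upstream, downstream in edges:
        children[upstream].add(downstream)

    nodes = {node for edge in edges for node in edge}
    bridged = set()
    for start in nodes - set(hidden):
        stack = list(children[start])
        seen = set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            if node in hidden:
                # Keep walking through intermediate tables.
                stack.extend(children[node])
            else:
                # First visible node on this path: record the bridged edge.
                bridged.add((start, node))
    return bridged


# Hypothetical lineage: source -> staging -> intermediate -> derived table.
edges = [
    ("raw_prisons", "prisons_stg_prisons"),
    ("prisons_stg_prisons", "prisons_intm_joined"),
    ("prisons_intm_joined", "prison_population"),
]
hidden = {"prisons_stg_prisons", "prisons_intm_joined"}
print(sorted(bridge_lineage(edges, hidden)))
# [('raw_prisons', 'prison_population')]
```

With nothing hidden, the function returns the original edges unchanged, so it can be applied unconditionally.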

Alternatively, can we ingest everything but tag assets so that these intermediate tables can be filtered out of search results in find-moj-data?
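A minimal sketch of the tag-and-filter alternative, assuming assets are tagged at ingestion time by name pattern and the search excludes that tag. The Asset class, tag name, patterns and search function are hypothetical stand-ins, not find-moj-data or DataHub APIs.

```python
import re
from dataclasses import dataclass, field

# Hypothetical tag name and patterns, for illustration only.
INTERMEDIATE_TAG = "intermediate"
INTERMEDIATE_PATTERNS = [r".*_intm_", r".*_stg_", r".*intermediate"]


@dataclass
class Asset:
    name: str
    tags: set[str] = field(default_factory=set)


def tag_intermediates(assets):
    """Tag (but keep) every asset whose name looks like an intermediate table."""
    for asset in assets:
        if any(re.match(p, asset.name) for p in INTERMEDIATE_PATTERNS):
            asset.tags.add(INTERMEDIATE_TAG)
    return assets


def search(assets, text):
    """Return names matching the query, hiding tagged intermediate assets."""
    return [
        a.name
        for a in assets
        if text in a.name and INTERMEDIATE_TAG not in a.tags
    ]


catalogue = tag_intermediates(
    [
        Asset("prisons_stg_prisons"),
        Asset("prisons_intm_joined"),
        Asset("prison_population"),
    ]
)
print(search(catalogue, "prison"))  # ['prison_population']
print(len(catalogue))  # 3: intermediate assets are still ingested
```

Because the intermediate assets stay in the catalogue, their lineage edges are preserved; only the search view hides them.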

Desired outcome of spike

MatMoore commented 3 months ago

Note: I briefly tried the following node_name_pattern, but it seemed to break the lineage:

        node_name_pattern:
            deny:
                - '.*intermediate'
                - '.*_intm_'
                - '.*_joined'
                - '.*_filtered'
                - '.*_stg_'
                - '.*_sensitive_'
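The toy example that follows illustrates why these patterns could break lineage, assuming that a denied node is dropped together with every lineage edge that touches it. Both ends of the chain survive, but every edge between them passes through a denied table. The lineage edges and table names are hypothetical, and patterns are matched with Python's re.match.

```python
import re

# Deny patterns copied from the config above.
DENY = [
    ".*intermediate",
    ".*_intm_",
    ".*_joined",
    ".*_filtered",
    ".*_stg_",
    ".*_sensitive_",
]

# Hypothetical lineage: source -> staging -> intermediate -> derived table.
EDGES = [
    ("raw_prisons", "prisons_stg_prisons"),
    ("prisons_stg_prisons", "prisons_intm_joined"),
    ("prisons_intm_joined", "prison_population"),
]


def denied(name):
    # re.match anchors at the start of the name, so the leading ".*"
    # lets each pattern match anywhere in it.
    return any(re.match(p, name) for p in DENY)


# Dropping denied nodes also drops every edge that touches them.
kept = [(u, d) for u, d in EDGES if not denied(u) and not denied(d)]
survivors = sorted({n for edge in EDGES for n in edge if not denied(n)})
print(survivors)  # ['prison_population', 'raw_prisons']
print(kept)  # []
```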
LavMatt commented 1 month ago

There is a meeting with DMET about CaDeT at 14:15 on Thursday 09/05. If you pick up this spike and are not on the invite, ask me and I'll forward it.

seanprivett commented 1 month ago

The CaDeT ingestion is planned to work from a flag/tag in the dbt metadata that indicates whether an asset should be catalogued. The CaDeT user will decide this and set the flag manually.
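A minimal sketch of how ingestion could read such a flag from dbt's manifest.json, assuming the flag is stored either as a dbt tag or as a meta key in the node's config. The flag name display_in_catalogue and the project name cadet are hypothetical; the real flag is for the CaDeT user to define and set.

```python
# Hypothetical flag name; the real tag/meta key is for the CaDeT team to define.
CATALOGUE_FLAG = "display_in_catalogue"


def catalogued_models(manifest):
    """Return unique_ids of dbt models flagged for display in the catalogue.

    A model counts as flagged if the flag appears in its tags or is set
    to true in its config meta.
    """
    flagged = []
    for unique_id, node in manifest.get("nodes", {}).items():
        if node.get("resource_type") != "model":
            continue  # skip tests, seeds, snapshots, etc.
        tags = node.get("tags", [])
        meta = node.get("config", {}).get("meta", {})
        if CATALOGUE_FLAG in tags or meta.get(CATALOGUE_FLAG) is True:
            flagged.append(unique_id)
    return sorted(flagged)


# Tiny inline stand-in for a parsed manifest.json (hypothetical content).
manifest = {
    "nodes": {
        "model.cadet.prisons_intm_joined": {
            "resource_type": "model", "tags": [], "config": {"meta": {}},
        },
        "model.cadet.prison_population": {
            "resource_type": "model", "tags": [CATALOGUE_FLAG], "config": {"meta": {}},
        },
        "model.cadet.prison_releases": {
            "resource_type": "model", "tags": [], "config": {"meta": {CATALOGUE_FLAG: True}},
        },
        "test.cadet.not_null_prison_id": {
            "resource_type": "test", "tags": [CATALOGUE_FLAG], "config": {"meta": {}},
        },
    }
}
print(catalogued_models(manifest))
# ['model.cadet.prison_population', 'model.cadet.prison_releases']
```

In practice the manifest would be loaded with json.load from the target/ directory that dbt writes it to, rather than defined inline.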

seanprivett commented 1 month ago

The working assumption is that we will ingest everything, including the intermediate tables, but display only the end products in the front end. How will this work?