ministryofjustice / data-catalogue

Data catalogue • This repository is defined and managed in Terraform
MIT License

Update CaDeT ingestion to process new metadata #145

Closed MatMoore closed 4 days ago

MatMoore commented 3 weeks ago

Implement the approach from this spike https://github.com/ministryofjustice/find-moj-data/issues/363#issuecomment-2145169079

What to do

    strip_user_ids_from_email: true
    tag_prefix: ""
    meta_mapping:
      owner_email:
        match: '.*'
        operation: 'add_owner'
        config:
          owner_type: user
          owner_category: DATAOWNER
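
To illustrate what the `meta_mapping` rule above is meant to achieve, here is a minimal Python sketch. It is not DataHub's implementation: the function name `owners_from_meta`, the shape of the returned records, and the example email address are assumptions made for illustration. The rule itself is copied from the config, and the sketch assumes that `strip_user_ids_from_email: true` means only the part of the address before the `@` is used as the user id.

```python
import re

# Rule copied from the config above. The processing logic below is an
# illustrative assumption, not DataHub's actual implementation.
META_MAPPING = {
    "owner_email": {
        "match": ".*",
        "operation": "add_owner",
        "config": {"owner_type": "user", "owner_category": "DATAOWNER"},
    }
}


def owners_from_meta(meta, strip_user_ids_from_email=True):
    """Return one owner record per meta key matched by an add_owner rule."""
    owners = []
    for key, rule in META_MAPPING.items():
        value = meta.get(key)
        # Skip keys that are absent or whose value does not match the regex.
        if value is None or not re.match(rule["match"], value):
            continue
        if rule["operation"] != "add_owner":
            continue
        # With stripping enabled, keep only the part before the "@".
        user_id = value.split("@")[0] if strip_user_ids_from_email else value
        owners.append(
            {
                "urn": f"urn:li:corpuser:{user_id}",
                "category": rule["config"]["owner_category"],
            }
        )
    return owners
```

Under these assumptions, a dbt model whose `meta` block contains `owner_email: jane.doe@example.com` (a made-up address) would yield a single `DATAOWNER` record for the user `jane.doe`, and a model with no `owner_email` would yield no owners.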

Similar ticket to actually handle the new metadata fields: https://github.com/ministryofjustice/find-moj-data/issues/404

Acceptance criteria

murdo-moj commented 3 weeks ago

https://github.com/ministryofjustice/data-catalogue/pull/150

murdo-moj commented 3 weeks ago

I had to make a quick correction to some data types in create-a-derived-table, which were causing the import job to fail: https://github.com/moj-analytical-services/create-a-derived-table/pull/1579

murdo-moj commented 3 weeks ago

Still debugging the changes by fixing them in create-a-derived-table. If this run fails, I'll set something up locally to debug. The errors are coming from newer tables, though, and we know some of those aren't deployed, so that could be what's causing the syntax errors. https://github.com/moj-analytical-services/create-a-derived-table/pull/1585

murdo-moj commented 2 weeks ago

    node_name_pattern:
      # These tables are currently badly formatted in the manifest. The fix for it should
      # go through when the dbt_docs workflow is working
      deny:
        - ".*use_of_force\\.summary_status_complete_dim.*"
        - ".*use_of_force\\.summary_status_in_progress_dim.*"
        - ".*use_of_force\\.summary_status_submitted_dim.*"
        - ".*delius_historic_imputed.*"
        - ".*delius_historic_unmodified.*"
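
As a rough check of what the deny list above excludes, the following Python sketch applies the same regular expressions to a few node names. It assumes deny patterns are tested with a case-insensitive match from the start of the name and take precedence over anything else; the helper `is_allowed` and the example node names are hypothetical, not taken from the manifest.

```python
import re

# Patterns copied from the deny list above. In the YAML, "\\." inside double
# quotes is an escaped literal dot, written here as r"\." in a raw string.
DENY_PATTERNS = [
    r".*use_of_force\.summary_status_complete_dim.*",
    r".*use_of_force\.summary_status_in_progress_dim.*",
    r".*use_of_force\.summary_status_submitted_dim.*",
    r".*delius_historic_imputed.*",
    r".*delius_historic_unmodified.*",
]


def is_allowed(node_name, deny_patterns=DENY_PATTERNS):
    """Return False if any deny pattern matches the node name."""
    return not any(
        re.match(pattern, node_name, re.IGNORECASE) for pattern in deny_patterns
    )


# Hypothetical node names, for illustration only.
for name in [
    "awsdatacatalog.use_of_force.summary_status_complete_dim",
    "awsdatacatalog.delius_historic_imputed.some_table",
    "awsdatacatalog.some_domain.some_table",
]:
    print(name, "->", "ingested" if is_allowed(name) else "skipped")
```

Because each pattern starts and ends with `.*`, it excludes any node whose name contains the given fragment, wherever it appears in the name.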

mitchdawson1982 commented 1 week ago

Awaiting meeting with data engineering to discuss the addition of the catalogue tags in the dbt files.

LavMatt commented 1 week ago

Meeting arranged for Mon 24/06 with 3 of the lead analytics engineers (aka data modelling) to discuss the content of the PR we've done to collect some extra metadata for CaDeT tables: mainly whether to display them in the catalogue, owner info and Slack channel info.