Open murdo-moj opened 1 week ago
Here's a recipe to start off
source:
  type: glue
  config:
    aws_region: "eu-west-1"
    database_pattern:
      allow: [
        "familyman_live_v4$",
        "mags_curated_v3$",
        "sop_preprocessed$",
        "sop_base$",
        "sop_transformed_v1_ac$",
        "contracts_rio_v1$",
        "contracts_jaggaer_v1$",
      ]
transformers:
  - type: "simple_add_dataset_tags"
    config:
      tag_urns:
        - "urn:li:tag:dc_display_in_catalogue"
  # - type: "simple_add_dataset_domain"
  #   config:
  #     semantics: OVERWRITE
  #     domains:
  #       - urn:li:domain:Courts
As a user of the catalogue
I want to see all data from the MoJ
So that I can use it for analysis
A previous spike, #921, uncovered some metadata in Glue which is not listed under CaDeT. We want to ingest this metadata into the catalogue. We will need:
A Glue ingestion recipe listing the databases we want to ingest
A tagger which tags these tables and their databases for the catalogue with dc_display_in_catalogue
Domain assignment for the ingested entities
A custom ingestion step which supplements the Glue ingestion so that the Glue tables look like the CaDeT tables in Find MoJ data
GH Actions workflows to schedule the Glue ingestion (this Glue data is stable, so the ingestion could run weekly)
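
As a starting point for the scheduling item, here is a rough sketch of what a weekly workflow might look like. It is only an illustration under several assumptions: the workflow file name, the recipe path `ingestion/glue.yaml`, the secret names, and the Monday 06:00 UTC schedule are all hypothetical, and it assumes the DataHub CLI picks up the server URL and token from the `DATAHUB_GMS_URL` and `DATAHUB_GMS_TOKEN` environment variables. If that does not hold in our setup, the recipe would need an explicit `sink` block instead.

```yaml
# Hypothetical workflow file, e.g. .github/workflows/ingest-glue.yml
name: Ingest Glue metadata

on:
  schedule:
    # Weekly, Mondays at 06:00 UTC (the exact time is an arbitrary choice)
    - cron: "0 6 * * 1"
  # Allow manual runs for testing
  workflow_dispatch: {}

permissions:
  id-token: write   # needed for OIDC role assumption
  contents: read

jobs:
  ingest-glue:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"

      - uses: aws-actions/configure-aws-credentials@v4
        with:
          # Hypothetical secret holding a role with read access to Glue
          role-to-assume: ${{ secrets.GLUE_INGESTION_ROLE_ARN }}
          aws-region: eu-west-1

      - name: Install the DataHub CLI with the Glue plugin
        run: pip install "acryl-datahub[glue]"

      - name: Run the Glue recipe
        env:
          # Hypothetical secret names
          DATAHUB_GMS_URL: ${{ secrets.DATAHUB_GMS_URL }}
          DATAHUB_GMS_TOKEN: ${{ secrets.DATAHUB_GMS_TOKEN }}
        # Assumed recipe location; adjust to wherever the recipe above is saved
        run: datahub ingest -c ingestion/glue.yaml
```

The `workflow_dispatch` trigger is included so the ingestion can be run by hand while the recipe is still being tuned, without waiting for the weekly schedule.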