Ingest glue data

ministryofjustice / find-moj-data

Find MOJ data service • This repository is defined and managed in Terraform

MIT License


As a user of the catalogue, I want to see all data from the MoJ, so that I can use it for analysis.

A previous spike (#921) uncovered some metadata in AWS Glue that is not listed under CaDeT. We want to ingest this metadata into the catalogue. We will need:

- A Glue ingestion recipe listing the databases we want to ingest:
  - familyman_live_v4
  - mags_curated_v3
  - sop_preprocessed
  - sop_base
  - sop_transformed_v1_ac
  - contracts_rio_v1
  - contracts_jaggaer_v1
- A tagger that tags these tables and their databases with dc_display_in_catalogue, so that they are displayed in the catalogue.
- Sorting out the domain assignment of the ingested entities.
- A custom ingestion step, added on top of the Glue ingestion, so that the Glue tables look like the CaDeT tables in Find MoJ data.
- GitHub Actions workflows to schedule the Glue ingestion. This Glue data is stable, so weekly ingestion should be frequent enough.
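A minimal sketch, in Python, of how the database list could be turned into anchored allow patterns like those in the recipe, and checked before an ingestion run so that near-miss names are not pulled in. DATABASES, build_allow_patterns and is_allowed are hypothetical names, not part of DataHub or this repository. The recipe's patterns only anchor the end with $, which assumes matching starts at the beginning of the name (as Python's re.match does); the sketch adds an explicit ^ so it does not depend on that.

```python
import re

# The Glue databases listed in this issue.
DATABASES = [
    "familyman_live_v4",
    "mags_curated_v3",
    "sop_preprocessed",
    "sop_base",
    "sop_transformed_v1_ac",
    "contracts_rio_v1",
    "contracts_jaggaer_v1",
]


def build_allow_patterns(names):
    """Turn exact database names into regexes anchored at both ends (hypothetical helper)."""
    # re.escape guards against regex metacharacters in a name; the "$" stops
    # "sop_base" from also matching e.g. "sop_base_v2".
    return [f"^{re.escape(name)}$" for name in names]


def is_allowed(database, allow_patterns):
    """Return True if any allow pattern matches the database name (hypothetical helper)."""
    return any(re.match(pattern, database) for pattern in allow_patterns)


if __name__ == "__main__":
    patterns = build_allow_patterns(DATABASES)
    for candidate in ["sop_base", "sop_base_v2", "familyman_live_v4", "old_sop_base"]:
        print(candidate, is_allowed(candidate, patterns))
```

Run directly, this prints True for sop_base and familyman_live_v4 and False for sop_base_v2 and old_sop_base.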

```yaml
source:
  type: glue
  config:
    aws_region: "eu-west-1"
    database_pattern:
      allow:
        [
          "familyman_live_v4$",
          "mags_curated_v3$",
          "sop_preprocessed$",
          "sop_base$",
          "sop_transformed_v1_ac$",
          "contracts_rio_v1$",
          "contracts_jaggaer_v1$",
        ]
transformers:
  - type: "simple_add_dataset_tags"
    config:
      tag_urns:
        - "urn:li:tag:dc_display_in_catalogue"
  # - type: "simple_add_dataset_domain"
  #   config:
  #     semantics: OVERWRITE
  #     domains:
  #       - urn:li:domain:Courts
```
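To help settle the domain assignment, the two transformers in this recipe can be modelled in a few lines of Python, which makes the difference between OVERWRITE and PATCH semantics concrete. This is a simplified, hypothetical model, not DataHub's implementation: add_tags and apply_domains are illustrative names, existing_domains stands for the domains already stored in the catalogue, and urn:li:domain:Other is a made-up URN.

```python
# Hypothetical, simplified model of the recipe's transformers; not DataHub code.

TAG = "urn:li:tag:dc_display_in_catalogue"


def add_tags(existing_tags, tag_urns):
    """Add tag URNs to a dataset's tags, skipping any already present."""
    result = list(existing_tags)
    for urn in tag_urns:
        if urn not in result:
            result.append(urn)
    return result


def apply_domains(existing_domains, domains, semantics="OVERWRITE"):
    """Model OVERWRITE vs PATCH semantics for a domain transformer."""
    if semantics == "OVERWRITE":
        # Replace whatever domains the dataset already had.
        return list(domains)
    if semantics == "PATCH":
        # Keep the existing domains and append any new ones.
        return list(existing_domains) + [d for d in domains if d not in existing_domains]
    raise ValueError(f"unknown semantics: {semantics}")


if __name__ == "__main__":
    print(add_tags([], [TAG]))
    print(apply_domains(["urn:li:domain:Other"], ["urn:li:domain:Courts"]))
    print(apply_domains(["urn:li:domain:Other"], ["urn:li:domain:Courts"], "PATCH"))
```

In this model, OVERWRITE (as in the commented-out transformer) replaces any domain already set on a table with Courts, while PATCH keeps the existing domain alongside Courts.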


Ingest glue data #954