ministryofjustice / find-moj-data

Find MOJ data service • This repository is defined and managed in Terraform
MIT License

Ingest glue data #954

Open murdo-moj opened 1 week ago

murdo-moj commented 1 week ago

As a user of the catalogue
I want to see all data from the MoJ
So that I can use it for analysis

A previous spike (#921) uncovered some metadata in Glue that is not listed under CaDeT. We want to ingest this metadata into the catalogue. We will need:

murdo-moj commented 11 hours ago

Here's a recipe to start off with:

source:
  type: glue
  config:
    aws_region: "eu-west-1"
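    # Regexes selecting the Glue databases to ingest. The trailing "$" stops a
    # pattern from also matching longer names that share the same prefix.
    database_pattern:
      allow:
        - "familyman_live_v4$"
        - "mags_curated_v3$"
        - "sop_preprocessed$"
        - "sop_base$"
        - "sop_transformed_v1_ac$"
        - "contracts_rio_v1$"
        - "contracts_jaggaer_v1$"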

transformers:
  - type: "simple_add_dataset_tags"
    config:
      tag_urns:
        - "urn:li:tag:dc_display_in_catalogue"
  # - type: "simple_add_dataset_domain"
  #   config:
  #     semantics: OVERWRITE
  #     domains:
  #       - urn:li:domain:Courts
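
To sanity-check which databases the allow list in the recipe above would pick up, here is a minimal sketch. It assumes each pattern is applied with start-anchored, case-insensitive matching, as Python's `re.match` does. That assumption should be checked against the ingestion framework's documentation. The candidate database names are hypothetical, and `is_allowed` is an illustrative helper, not part of any library.

```python
import re

# Allow patterns copied from the recipe above.
ALLOW_PATTERNS = [
    "familyman_live_v4$",
    "mags_curated_v3$",
    "sop_preprocessed$",
    "sop_base$",
    "sop_transformed_v1_ac$",
    "contracts_rio_v1$",
    "contracts_jaggaer_v1$",
]


def is_allowed(database_name, patterns=ALLOW_PATTERNS):
    """Return True if any pattern matches the database name.

    Assumption: matching is anchored at the start of the name (re.match)
    and ignores case. The "$" in each pattern anchors the end.
    """
    return any(re.match(p, database_name, re.IGNORECASE) for p in patterns)


if __name__ == "__main__":
    # Hypothetical database names, used only to illustrate the matching.
    for name in ["sop_base", "sop_base_v2", "dev_sop_base", "contracts_rio_v1"]:
        print(f"{name}: {is_allowed(name)}")
```

Under these assumptions, `sop_base` and `contracts_rio_v1` are allowed, while `sop_base_v2` (longer suffix) and `dev_sop_base` (extra prefix) are not, so each pattern effectively selects one exact database name.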