ministryofjustice / data-catalogue

Data catalogue • This repository is defined and managed in Terraform
MIT License

Create an OpenMetadata python SDK for creating database service topology #108

Closed PriyaBasker23 closed 5 months ago

PriyaBasker23 commented 1 year ago

User story

As a user, I want to see my data product metadata in the OpenMetadata catalogue.

Value / Purpose

With this class, we can create or update the data product, database and tables.

Hypothesis

If we have this functionality, we will easily be able to create or update metadata and align with the data product registration journey.

Additional information

https://dsdmoj.atlassian.net/wiki/spaces/DataPlatform/pages/4535877633/OpenMetaData+Catalogue

Use the Python SDK and the OpenMetadata API to ingest data and create the database topology throughout the user journey.
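To make "database topology" concrete, here is a minimal sketch of the parent-first order in which entities would need to be created. It uses only the standard library, not the OpenMetadata SDK. The function name, the entity type labels and the example names are hypothetical. The database and table name formats follow the "$service.$name" and "$service.$db.$schema.$name" patterns from the mapping in this issue; the schema-level format is an assumption by analogy. Names containing dots are not handled.

```python
from typing import Iterator, List, Tuple


def creation_order(
    service: str, database: str, schema: str, tables: List[str]
) -> Iterator[Tuple[str, str]]:
    """Yield (entity_type, fully_qualified_name) pairs, parents before children."""
    # A database service has to exist before a database can reference it.
    yield ("databaseService", service)
    # Database FQN follows "$service.$name".
    yield ("database", f"{service}.{database}")
    # Schema FQN is assumed to follow the same dotted pattern.
    yield ("databaseSchema", f"{service}.{database}.{schema}")
    # Table FQN follows "$service.$db.$schema.$name".
    for table in tables:
        yield ("table", f"{service}.{database}.{schema}.{table}")


if __name__ == "__main__":
    for entity_type, fqn in creation_order(
        "data-platform", "my_product", "default", ["table_a", "table_b"]
    ):
        print(entity_type, fqn)
```

Each pair could then be handed to whichever create-or-update call the SDK wrapper ends up exposing.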

Checklist

Definition of Done

MatMoore commented 1 year ago

Spoke to Jacob W today, he has set up the data-platform repo as a trusted publisher, so we don't need to manage credentials.

MatMoore commented 1 year ago

Looks like the OpenMetadata SDK doesn't support Python 3.11, so we need to downgrade to 3.10.

https://openmetadata.slack.com/archives/C02B6955S4S/p1695891451002239

MatMoore commented 1 year ago

For parity with

https://github.com/ministryofjustice/modernisation-platform-environments/blob/main/terraform/environments/data-platform/data-product-table-schema-json-schema/v1.0.0/moj_data_product_table_spec.json and https://github.com/ministryofjustice/modernisation-platform-environments/blob/main/terraform/environments/data-platform/data-product-metadata-json-schema/v1.1.0/moj_data_product_metadata_spec.json

I'll need to add in the following metadata -

Required at table level:

Required at database level:

Optional at database level:

We also have a bunch of generated fields that could be passed along in some way. This is not implemented at the moment, so I reckon we should cut it from the scope of the ticket.

MatMoore commented 1 year ago

Here's an initial mapping of values between data platform JSON schemas and OpenMetadata schemas

| Entity | Data platform name | OpenMetadata name |
| --- | --- | --- |
| Database | N/A | id |
| Database | Name | Name |
| Database | "$service.$name" | fullyQualifiedName |
| Database | N/A | Display Name |
| Database | description | description |
| Database | tags | tags |
| Database | version | version |
| Database | updatedAt | updatedAt |
| Database | N/A | updatedBy |
| Database | N/A | href |
| Database | owner | owner |
| Database | fixed | service |
| Database | fixed | serviceType |
| Database | N/A | location |
| Database | N/A | usageSummary |
| Database | N/A | changeDescription |
| Database | N/A | deleted (soft deletion) |
| Database | retentionPeriod | retentionPeriod |
| Database | domain | domain |
| Database | email | extension.email |
| Database | dpiaRequired | extension.dpiaRequired |
| Table | N/A | id |
| Table | name | name |
| Table | N/A | displayName |
| Table | "$service.$db.$schema.$name" | fullyQualifiedName |
| Table | description | description |
| Table | (data product) version | version |
| Table | updatedAt | updatedAt |
| Table | N/A | updatedBy |
| Table | N/A | href |
| Table | "Regular" or maybe "Partitioned" | tableType |
| Table | columns | columns |
| Table | N/A | tableConstraints |
| Table | extraction timestamp??? | tablePartition |
| Table | (data product) owner | owner |
| Table | N/A | location |
| Table | tags | tags |
| Table | N/A | usageSummary |
| Table | N/A | followers |
| Table | ??? | sampleData |
| Table | N/A | tableProfilerConfig |
| Table | N/A | profile |
| Table | N/A | testSuite |
| Table | N/A (dbt) | dataModel |
| Table | N/A | changeDescription |
| Table | N/A | deleted (soft deletion) |
| Table | retentionPeriod | retentionPeriod |
| Table | domain | domain |
| Table | data product name | dataProducts |
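A rough sketch of how the database-level part of this mapping might be applied in code. It is plain Python over dictionaries and does not call the OpenMetadata SDK. `DATABASE_FIELD_MAP` and `map_database` are hypothetical names. The sketch leaves out owner, tags, version and updatedAt, and copies values across without converting their formats.

```python
from typing import Any, Dict

# Data platform key -> OpenMetadata key, taken from the Database rows above.
# A target starting with "extension." marks a custom property.
DATABASE_FIELD_MAP = {
    "name": "name",
    "description": "description",
    "retentionPeriod": "retentionPeriod",
    "domain": "domain",
    "email": "extension.email",
    "dpiaRequired": "extension.dpiaRequired",
}


def map_database(product: Dict[str, Any], service: str) -> Dict[str, Any]:
    """Translate data product metadata into an OpenMetadata-shaped dict."""
    result: Dict[str, Any] = {
        "service": service,  # "fixed" in the mapping
        "fullyQualifiedName": f"{service}.{product['name']}",
    }
    for source_key, target in DATABASE_FIELD_MAP.items():
        if source_key not in product:
            continue  # optional fields are simply skipped
        if target.startswith("extension."):
            custom_key = target.split(".", 1)[1]
            result.setdefault("extension", {})[custom_key] = product[source_key]
        else:
            result[target] = product[source_key]
    return result
```

Keeping the mapping in a single dictionary means the code and the table can be compared row by row when either schema changes.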

Notes:

MatMoore commented 1 year ago

How to set up custom properties: https://docs.open-metadata.org/v1.1.x/how-to-guides/how-to-add-custom-property-to-an-entity

https://docs.open-metadata.org/swagger.html#tag/Metadata

I'm skipping this for now, because I think we want to wait until custom properties are supported at schema level.

MatMoore commented 1 year ago

Example of using tags: https://github.com/open-metadata/openmetadata-demo/blob/main/example_apis.py#L220C1-L220C81

However, it doesn't work with arbitrary tags - you need to first create a classification, and then create the tags within that.

So if the user makes up a tag and adds it to their metadata, it will error when sending to OpenMetadata.

See:

https://catalogue.apps-tools.development.data-platform.service.justice.gov.uk/tags/

https://docs.open-metadata.org/swagger.html#operation/createClassification

https://docs.open-metadata.org/swagger.html#operation/createTag

I'm not sure how we want to manage this yet - might be worth just ignoring tags for now.
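One possible middle ground between ignoring tags and erroring on unknown ones is to split the user's tags before sending. This is a sketch only: `split_tags` is a hypothetical helper, and the set of known tags is assumed to have been fetched from the catalogue beforehand. The example tag names are made up.

```python
from typing import Iterable, List, Set, Tuple


def split_tags(requested: Iterable[str], known: Set[str]) -> Tuple[List[str], List[str]]:
    """Partition requested tags into (accepted, rejected), preserving order."""
    accepted: List[str] = []
    rejected: List[str] = []
    for tag in requested:
        # Only tags that already exist in the catalogue are safe to send.
        if tag in known:
            accepted.append(tag)
        else:
            rejected.append(tag)
    return accepted, rejected


if __name__ == "__main__":
    known_tags = {"PII.Sensitive", "Tier.Tier1"}  # made-up examples
    ok, unknown = split_tags(["PII.Sensitive", "my-made-up-tag"], known_tags)
    print("send:", ok)
    print("report back to user:", unknown)
```

The rejected list could be dropped silently, logged, or returned to the user as a validation warning, depending on how strict registration should be.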

MatMoore commented 1 year ago

We might need to add a query for fetching users by name, since we need to pass an ID for owner and any other entityReference value, and I don't see the ID exposed anywhere in the UI. But for now we can use "7804c127-d677-4900-82f9-83517e51bb94", which is the data platform labs user.
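A sketch of what that lookup could look like with only the standard library. The `/v1/users/name/{name}` path is my understanding of the OpenMetadata REST API and should be checked against the swagger docs. The function names, the base URL shape (ending in `/api`) and the bearer-token header are assumptions.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict


def user_by_name_url(base_url: str, name: str) -> str:
    """Build the lookup URL; base_url is assumed to end in '/api'."""
    quoted = urllib.parse.quote(name, safe="")
    return f"{base_url.rstrip('/')}/v1/users/name/{quoted}"


def fetch_user(base_url: str, name: str, token: str) -> Dict[str, Any]:
    """Fetch the user entity as a dict (performs a network call)."""
    request = urllib.request.Request(
        user_by_name_url(base_url, name),
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def owner_reference(user: Dict[str, Any]) -> Dict[str, str]:
    """Reduce a user entity to the id/type pair an entityReference needs."""
    return {"id": user["id"], "type": "user"}
```

With this in place, the hardcoded ID could be replaced by `owner_reference(fetch_user(base_url, owner_name, token))` once we know which name field the data product owner maps to.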