Closed by PriyaBasker23 5 months ago
Spoke to Jacob W today. He has set up the data-platform repo as a trusted publisher, so we don't need to manage credentials.
Looks like the OpenMetadata SDK doesn't support Python 3.11, so we need to downgrade to 3.10.
https://openmetadata.slack.com/archives/C02B6955S4S/p1695891451002239
For parity with
https://github.com/ministryofjustice/modernisation-platform-environments/blob/main/terraform/environments/data-platform/data-product-table-schema-json-schema/v1.0.0/moj_data_product_table_spec.json and https://github.com/ministryofjustice/modernisation-platform-environments/blob/main/terraform/environments/data-platform/data-product-metadata-json-schema/v1.1.0/moj_data_product_metadata_spec.json
I'll need to add in the following metadata:
Required at table level:
Required at database level:
Optional at database level:
We also have a bunch of generated fields that could be passed along in some way, but this is not implemented at the moment, so I reckon we should cut it from the scope of the ticket.
Here's an initial mapping of values between the data platform JSON schemas and the OpenMetadata schemas:
Entity | Data platform name | OpenMetadata name |
---|---|---|
Database | N/A | id |
Database | name | name |
Database | "$service.$name" | fullyQualifiedName |
Database | N/A | displayName |
Database | description | description |
Database | tags | tags |
Database | version | version |
Database | updatedAt | updatedAt |
Database | N/A | updatedBy |
Database | N/A | href |
Database | owner | owner |
Database | fixed | service |
Database | fixed | serviceType |
Database | N/A | location |
Database | N/A | usageSummary |
Database | N/A | changeDescription |
Database | N/A | deleted (soft deletion) |
Database | retentionPeriod | retentionPeriod |
Database | domain | domain |
Database | extension.email | |
Database | dpiaRequired | extension.dpiaRequired |
Table | N/A | id |
Table | name | name |
Table | N/A | displayName |
Table | "$service.$db.$schema.$name" | fullyQualifiedName |
Table | description | description |
Table | (data product) version | version |
Table | updatedAt | updatedAt |
Table | N/A | updatedBy |
Table | N/A | href |
Table | "Regular" or maybe "Partitioned" | tableType |
Table | columns | columns |
Table | N/A | tableConstraints |
Table | extraction timestamp??? | tablePartition |
Table | (data product) owner | owner |
Table | N/A | location |
Table | tags | tags |
Table | N/A | usageSummary |
Table | N/A | followers |
Table | ??? | sampleData |
Table | N/A | tableProfilerConfig |
Table | N/A | profile |
Table | N/A | testSuite |
Table | N/A | (dbt) dataModel |
Table | N/A | changeDescription |
Table | N/A | deleted (soft deletion) |
Table | retentionPeriod | retentionPeriod |
Table | ??? | sourceUrl |
Table | domain | domain |
Table | data product name | dataProducts |
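
The mapping above can be sketched as a plain dictionary transform. This is only an illustration under assumptions: it builds ordinary dicts rather than OpenMetadata SDK request objects, the function names are hypothetical, and it leaves out `owner` and `tags`, which need extra handling.

```python
from typing import Any

# Fields that carry the same name on both sides of the mapping table.
DATABASE_PASSTHROUGH = ("name", "description", "retentionPeriod", "domain")


def database_fqn(service: str, name: str) -> str:
    """Build the "$service.$name" fully qualified name for a database."""
    return f"{service}.{name}"


def table_fqn(service: str, database: str, schema: str, name: str) -> str:
    """Build the "$service.$db.$schema.$name" fully qualified name for a table."""
    return f"{service}.{database}.{schema}.{name}"


def map_database(product: dict[str, Any], service: str) -> dict[str, Any]:
    """Map data product metadata to OpenMetadata-style database fields.

    Hypothetical helper: the output is a plain dict, not an SDK object.
    """
    mapped = {key: product[key] for key in DATABASE_PASSTHROUGH if key in product}
    mapped["service"] = service  # "fixed" in the mapping table
    mapped["fullyQualifiedName"] = database_fqn(service, product["name"])
    # dpiaRequired is not a built-in field, so it goes under "extension".
    if "dpiaRequired" in product:
        mapped["extension"] = {"dpiaRequired": product["dpiaRequired"]}
    return mapped
```

Keeping the transform separate from any API call would make it possible to unit test the mapping without a running catalogue.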
Notes:
How to set up custom properties: https://docs.open-metadata.org/v1.1.x/how-to-guides/how-to-add-custom-property-to-an-entity
https://docs.open-metadata.org/swagger.html#tag/Metadata
I'm skipping this for now, because I think we want to wait until custom properties are supported at schema level.
Example of using tags: https://github.com/open-metadata/openmetadata-demo/blob/main/example_apis.py#L220C1-L220C81
However, it doesn't work with arbitrary tags - you need to first create a classification, and then create the tags within that.
So if the user makes up a tag and adds it to their metadata, it will error when sending to OpenMetadata.
See https://catalogue.apps-tools.development.data-platform.service.justice.gov.uk/tags/ https://docs.open-metadata.org/swagger.html#operation/createClassification https://docs.open-metadata.org/swagger.html#operation/createTag
I'm not sure how we want to manage this yet - might be worth just ignoring tags for now.
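
One possible way to avoid the error in the meantime is to filter the user's tags against the tags that already exist before sending anything. A minimal sketch, assuming we have already fetched the set of known tag names from the catalogue (the function name and the example tag names are made up for illustration):

```python
def split_tags(requested: list[str], known: set[str]) -> tuple[list[str], list[str]]:
    """Split requested tags into those that exist in the catalogue and those that don't.

    `known` is assumed to hold tag names already created under a classification.
    Order of the requested tags is preserved in both lists.
    """
    accepted = [tag for tag in requested if tag in known]
    rejected = [tag for tag in requested if tag not in known]
    return accepted, rejected


# Example: only the known tag would be sent; the rest could be logged or ignored.
accepted, rejected = split_tags(
    ["PII.Sensitive", "made-up-tag"],
    known={"PII.Sensitive", "PII.NonSensitive"},
)
```

Whether rejected tags should be dropped silently, reported back to the user, or created on the fly is still the open question above.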
Might need to add a query for fetching users by name, since we need to pass an ID in for owner and any other entityReference value, and I don't see ID exposed anywhere in the UI. But for now we can use "7804c127-d677-4900-82f9-83517e51bb94", which is the data platform labs user.
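
A rough sketch of what the lookup by name could look like. The `/v1/users/name/{name}` path is my reading of the swagger docs and should be checked against the deployed version; `base_url` is assumed to already include the `/api` prefix, and all function names here are hypothetical.

```python
import json
from typing import Any, Callable
from urllib.parse import quote
from urllib.request import Request, urlopen


def user_url(base_url: str, name: str) -> str:
    """Build the lookup URL for a user name (endpoint path is an assumption)."""
    return f"{base_url.rstrip('/')}/v1/users/name/{quote(name, safe='')}"


def http_get_json(url: str, token: str) -> dict[str, Any]:
    """GET a URL with a bearer token and decode the JSON response body."""
    request = Request(url, headers={"Authorization": f"Bearer {token}"})
    with urlopen(request) as response:
        return json.load(response)


def owner_reference(
    base_url: str,
    name: str,
    token: str,
    get_json: Callable[[str, str], dict[str, Any]] = http_get_json,
) -> dict[str, str]:
    """Look up a user by name and return an entityReference-style dict."""
    user = get_json(user_url(base_url, name), token)
    return {"id": user["id"], "type": "user"}
```

The `get_json` parameter is there so the lookup can be swapped for a stub in tests, or for the SDK client if it turns out to expose an equivalent call.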
User story
As a user, I want to see my data product metadata in the OpenMetadata catalogue.
Value / Purpose
With this class, we can create or update the data product, database and tables.
Hypothesis
If we have this functionality, we will be able to create or update metadata easily and align with the data product registration journey.
Additional information
https://dsdmoj.atlassian.net/wiki/spaces/DataPlatform/pages/4535877633/OpenMetaData+Catalogue
Use the Python SDK and the OpenMetadata API to ingest data and create the database topology throughout the user journey.
Checklist
Definition of Done