odpi / egeria

Egeria core
https://egeria-project.org
Apache License 2.0

[Enhancement] Support for Data Products (in a Data Mesh) #6526

Open juergenhemelt opened 2 years ago

juergenhemelt commented 2 years ago

Is there an existing issue for this?

Please describe the new behavior that will improve Egeria

The Egeria metamodel should include items for the description of Data Products in the context of a Data Mesh (https://martinfowler.com/articles/data-mesh-principles.html). There are existing developments and suggestions of how to do that. You can find some ideas here:

https://arnerossmann.github.io/post/2022-02-09_metadata-dataproduct/
https://github.com/agile-lab-dev/Data-Product-Specification

Alternatives

Using OpenMetadata (https://docs.open-metadata.org) instead of Egeria as suggested here https://github.com/agile-lab-dev/Data-Product-Specification

Any Further Information?

No response

Would you be prepared to be assigned this issue to work on?

davidradl commented 2 years ago

@mandy-chessell @planetf1 fyi

mandy-chessell commented 2 years ago

I am not sure how this is progressing but here are some thoughts ...

There are many descriptions of data products from different vendors and thought leaders. Some focus on the technical implementation/deployment; others focus more on the organizational/governance aspects, such as service level agreements, licensing and ownership.

Each of these perspectives may be a valid focus for an organization at a particular point in time. Therefore I would propose that the data product is represented as a DataProduct classification that can be attached to any referenceable. This means it could be attached to a data set/API type asset, to a server/container deployment or, for a more architectural/business construct, to a solution component or digital service.

Over time, as an organization refines its definition of a data product, the classification could be moved to a higher-level concept to cover a more complete definition of the data product.
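
To make the proposal above concrete, here is a minimal in-memory sketch of a classification that can sit on any referenceable and later be moved to a higher-level element. It does not use Egeria's APIs: the `Referenceable` class, the `classify` and `move_classification` helpers, the `description` property and the qualified names are all hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass, field


@dataclass
class Referenceable:
    """Hypothetical stand-in for an open metadata element (not an Egeria class)."""
    type_name: str
    qualified_name: str
    classifications: dict = field(default_factory=dict)


def classify(element, name, **properties):
    """Attach a classification (a named bag of properties) to an element."""
    element.classifications[name] = dict(properties)


def move_classification(source, target, name):
    """Re-home a classification, e.g. when the data product definition matures."""
    target.classifications[name] = source.classifications.pop(name)


# Early on, the organization treats a single data set as the data product.
sales_data = Referenceable("DataSet", "DataSet:sales-orders")
classify(sales_data, "DataProduct", description="Curated sales orders")

# Later, the definition widens to the solution component that delivers it.
delivery = Referenceable("SolutionComponent", "SolutionComponent:sales-orders-delivery")
move_classification(sales_data, delivery, "DataProduct")
```

Because the classification is independent of the element's type, nothing else about the data set or the solution component has to change when the organization's view of the data product shifts.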

I have just updated the descriptions of digital services, information supply chains and solution components in the Area 7 types description, since they are relevant for the more complete view of a data product.

https://egeria-project.org/types/7/

github-actions[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 20 days if no further activity occurs. Thank you for your contributions.

mandy-chessell commented 1 year ago

Here are suggested mappings from data product concepts to Egeria's open metadata types:

Data Product concept: Egeria open metadata types (with links)
Data Domain: Data domains are represented by SubjectAreaDefinition entities. The SubjectArea classification is used to tag elements from the subject area.
Data Product Manager: The data product manager role is typed by the DigitalServiceManager. They have the business ownership of a collection of related data products represented by a DigitalService. Data products are grouped under a single digital service when they make use of similar processing. For example, they may use the same data, but formatted, scoped or processed differently with different licenses.
Data Product: Each data product is identified by the DigitalProduct classification. The productType attribute can be used to identify the digital product as a data product.
Data Product Design: The design of the data products' manufacturing and maintenance pipelines, along with the data products' storage and delivery mechanisms, is represented by the digital service's SolutionBlueprint linked to SolutionComponents. The DigitalProduct classification is added to the solution components that represent the data product delivery capability.
Data Product Implementation: The manufacturing/maintenance solution components are linked to the appropriate data pipeline Processes using the ImplementedBy relationship. The data product's delivery solution components are also linked to the delivery data assets via the ImplementedBy relationship.
Data Product Specification: There are many types of information that make up the data product specification. Different organizations will make their own choices, but here are some options. They can be linked to the solution components or data assets depending on how specific the information is:
  • The schema of the data product, RootSchemaType, is attached to the data asset via the AssetSchemaType relationship.
  • The solution components, assets and data fields can be tagged using glossary terms, search keywords, security tags, reference data tags, etc. to make them easy to find and to explain what they contain.
  • The data products can be linked to a LicenseType using the License relationship. Terms and Conditions can be added to the LicenseType using the AttachedTermsAndConditions relationship.
  • Data profiling information can be attached to the assets as a DiscoveryAnalysisReport using the AssetDiscoveryReport relationship.
  • The ServiceLevelObjectives can be attached to the solution components or data assets using the GovernedBy relationship.
  • CertificationTypes can describe quality gates. They are attached to the data assets using the Certification relationship when the asset passes the quality tests.
  • DataProcessingPurposes can be attached to the solution components or data assets using the ApprovedDataPurposes relationships to show how the data in the data product can be processed.
  • A Connection is added to each data asset to identify the connector used to retrieve the data.
Data Product Subscription: A subscriber (person, organization, system, ...) can register with the marketplace using a DigitalSubscription. The different products selected by the subscriber are attached to the digital subscription via the AgreementItem relationship. Terms and Conditions can be added to the DigitalSubscription using the AttachedTermsAndConditions relationship. Overrides to the terms and conditions can be added to the AgreementItem relationship.
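
As an illustration of how a few of these mappings fit together, the following is a rough in-memory sketch, not Egeria client code. The type names (SolutionComponent, DigitalProduct, ImplementedBy, License, LicenseType, DigitalSubscription, AgreementItem) come from the mappings above. Everything else is an assumption made for the example: the `Element`, `Relationship` and `Graph` classes, the direction chosen for each relationship, the element names, the `"data product"` value for productType, and the `overrides` property.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    """Hypothetical metadata element: a type name, a display name, classifications."""
    type_name: str
    name: str
    classifications: dict = field(default_factory=dict)


@dataclass
class Relationship:
    """Hypothetical typed link between two elements, with optional properties."""
    type_name: str
    end1: Element
    end2: Element
    properties: dict = field(default_factory=dict)


class Graph:
    """Tiny relationship store; end1 -> end2 direction is chosen for this sketch."""

    def __init__(self):
        self.relationships = []

    def link(self, type_name, end1, end2, **properties):
        rel = Relationship(type_name, end1, end2, properties)
        self.relationships.append(rel)
        return rel

    def linked_from(self, element, type_name):
        return [r.end2 for r in self.relationships
                if r.end1 is element and r.type_name == type_name]


def subscribed_data_products(graph, subscription):
    """Elements under the subscription that are classified as data products."""
    return [e for e in graph.linked_from(subscription, "AgreementItem")
            if e.classifications.get("DigitalProduct", {}).get("productType") == "data product"]


graph = Graph()

# Data Product / Data Product Design: the delivery component carries the classification.
component = Element("SolutionComponent", "Sales orders delivery")
component.classifications["DigitalProduct"] = {"productType": "data product"}

# Data Product Implementation: the component is implemented by a data asset.
asset = Element("DataSet", "sales_orders")
graph.link("ImplementedBy", component, asset)

# Data Product Specification: the product is linked to a license type.
license_type = Element("LicenseType", "Internal use only")
graph.link("License", component, license_type)

# Data Product Subscription: the selected product hangs off the subscription,
# with per-item overrides stored on the relationship (property name is made up).
subscription = Element("DigitalSubscription", "Finance team subscription")
graph.link("AgreementItem", subscription, component, overrides={"retentionDays": 30})
```

With this in place, `subscribed_data_products(graph, subscription)` returns the delivery component, and `graph.linked_from(component, "License")` returns its license type. In a real deployment these would be entities, classifications and relationships in an open metadata repository rather than Python objects.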