w3c / dxwg

Data Catalog Vocabulary (DCAT)
https://w3c.github.io/dxwg/dcat/

How to Catalog Data Duplication Settings per Dataset #1589

Open Jonessmj opened 4 months ago

Jonessmj commented 4 months ago

Dear DCAT team,

I have a question about how to use DCAT properly to capture metadata about instructions for copying a dataset. The instructions/configurations are set per dataset. The scenario is that a dataset in a data catalog could be copied to an AWS Redshift cluster. It hasn't been copied yet, but if certain application-level events occur, a service will copy the data to one or more AWS Redshift clusters. Before that happens, though, the owner of the dataset will specify default DIST and SORT configurations to be used for the duplicated dataset.

Since these parameters/configurations are set per source dataset, and the duplicated dataset doesn't exist yet when they are defined, I was thinking they should be a property of the source dataset, but I'm not sure which DCAT terms or DCAT extensions I should use. Alternatively, should these settings be a first-class entity of their own with a PROV relationship to the source dataset?
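To make the two alternatives concrete, here is a rough sketch that builds both shapes as JSON-LD-style Python dictionaries. Everything in the `ex:` namespace (`ex:RedshiftCopyDefaults`, `ex:redshiftCopyDefaults`, `ex:distStyle`, `ex:distKey`, `ex:sortKey`, `ex:appliesTo`) is a hypothetical extension term invented for illustration, as are the example IRIs and column names; none of it is defined by DCAT or PROV. Typing the standalone settings node as `prov:Plan` is also only an assumption about how the PROV variant might look.

```python
import json

# Hypothetical extension namespace; NOT part of DCAT or PROV.
EX = "https://example.org/redshift-profile#"

CONTEXT = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dcterms": "http://purl.org/dc/terms/",
    "prov": "http://www.w3.org/ns/prov#",
    "ex": EX,
}


def copy_defaults(dist_style, dist_key, sort_keys):
    """Build the default DIST/SORT settings node (all ex: terms are hypothetical)."""
    return {
        "@type": "ex:RedshiftCopyDefaults",
        "ex:distStyle": dist_style,
        "ex:distKey": dist_key,
        # @list keeps the sort key columns in order.
        "ex:sortKey": {"@list": list(sort_keys)},
    }


def as_dataset_property(dataset_iri, title, defaults):
    """Option 1: nest the settings under the source dcat:Dataset."""
    return {
        "@context": CONTEXT,
        "@id": dataset_iri,
        "@type": "dcat:Dataset",
        "dcterms:title": title,
        "ex:redshiftCopyDefaults": defaults,
    }


def as_separate_entity(dataset_iri, title, settings_iri, defaults):
    """Option 2: settings as their own node that points back at the dataset."""
    settings = dict(defaults)
    settings["@id"] = settings_iri
    # Assumption: model the settings as a plan for a future copy activity.
    settings["@type"] = ["prov:Plan", "ex:RedshiftCopyDefaults"]
    settings["ex:appliesTo"] = {"@id": dataset_iri}
    dataset = {
        "@id": dataset_iri,
        "@type": "dcat:Dataset",
        "dcterms:title": title,
    }
    return {"@context": CONTEXT, "@graph": [dataset, settings]}


if __name__ == "__main__":
    defaults = copy_defaults("KEY", "customer_id", ["order_date", "customer_id"])
    dataset_iri = "https://example.org/dataset/orders"
    print(json.dumps(as_dataset_property(dataset_iri, "Orders", defaults), indent=2))
    print(json.dumps(
        as_separate_entity(dataset_iri, "Orders", dataset_iri + "/copy-defaults", defaults),
        indent=2,
    ))
```

In the first shape the settings travel with the dataset record; in the second they can be versioned or replaced without touching the dataset description, at the cost of one more resource to manage.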

dr-shorthair commented 4 months ago

This looks like a specialization for a particular application, so it is probably out of scope for DCAT per se.

You can propose an extension, or else develop your own application profile with additional elements connected to the standard DCAT base.
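As a rough illustration of what "additional elements connected to the standard DCAT base" could mean, the sketch below emits a small Turtle vocabulary that declares extension terms and attaches one of them to `dcat:Dataset` through `rdfs:domain`. The `ex:` namespace and every term in it are hypothetical names chosen for this example, not an agreed profile; a real application profile would likely also need constraints (for example SHACL shapes), which are not shown.

```python
# Prefixes used by the sketch; the ex: namespace is hypothetical.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "dcat": "http://www.w3.org/ns/dcat#",
    "ex": "https://example.org/redshift-profile#",
}


def prefix_block():
    """Render the @prefix declarations as Turtle."""
    return "\n".join(f"@prefix {p}: <{iri}> ." for p, iri in PREFIXES.items())


def class_definition(local_name, label):
    """Declare a hypothetical extension class."""
    return f'ex:{local_name} a rdfs:Class ;\n    rdfs:label "{label}" .'


def property_definition(local_name, label, domain, range_):
    """Declare a hypothetical extension property with a domain and range."""
    return (
        f"ex:{local_name} a rdf:Property ;\n"
        f'    rdfs:label "{label}" ;\n'
        f"    rdfs:domain {domain} ;\n"
        f"    rdfs:range {range_} ."
    )


def build_profile_vocabulary():
    """Assemble the whole vocabulary as one Turtle string."""
    parts = [
        prefix_block(),
        class_definition("RedshiftCopyDefaults", "Default Redshift copy settings"),
        # The only term that hooks directly onto the DCAT base.
        property_definition(
            "redshiftCopyDefaults", "Redshift copy defaults",
            "dcat:Dataset", "ex:RedshiftCopyDefaults",
        ),
        property_definition(
            "distStyle", "distribution style",
            "ex:RedshiftCopyDefaults", "xsd:string",
        ),
        property_definition(
            "distKey", "distribution key column",
            "ex:RedshiftCopyDefaults", "xsd:string",
        ),
        # Sort keys are ordered, so the range is an RDF list of column names.
        property_definition(
            "sortKey", "sort key columns",
            "ex:RedshiftCopyDefaults", "rdf:List",
        ),
    ]
    return "\n\n".join(parts) + "\n"


if __name__ == "__main__":
    print(build_profile_vocabulary())
```

Keeping the extension terms in their own namespace means the catalog stays valid DCAT for consumers that ignore the extra properties.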