opendata-swiss / dcat_ap_ch

Examples for geocat and DCAT data-catalogs are given here

dct:modified: Qualify which actions count as actual changes #179

Open Juan-Juan-1 opened 2 years ago

Juan-Juan-1 commented 2 years ago

Properties: dct:modified
Classes: Dataset, Distribution
Current state: The current guidance on when a new value for dct:modified should be set is insufficient. https://dcat-ap.ch/releases/2.0/dcat-ap-ch.html#dataset-modification-date
Target state: The eCH working group defines clearer criteria for which actions count as a change to the dataset or to the resource.

l00mi commented 2 years ago

I propose a simple distinction between:

dct:issued: First time a dataset is published.
dct:modified: Each modification without changing the overall structure. (Only adding new data points.)

Juan-Juan-1 commented 2 years ago

Hi @l00mi, thank you for your input! "dct:modified: Each modification without changing the overall structure. (Only adding new data points.)" sounds like a good criterion to me! I'd be interested to hear whether everybody feels it's clear enough.

"dct:issued: First time a dataset is published." I feel that there, too, we would need a bit more guidance. What does "published" mean? First time on the portal? First time online? First time online even if not as open data?

sabinem commented 2 years ago

@l00mi, @Juan-Juan-1 As far as I know, this property is tricky for some publishers, including the BFS. Some datasets are issued once; then, when a new version comes out, dct:issued is set to the release date of the newer version and dct:modified is never set. So I think that, to define an appropriate usage for dct:modified and dct:issued, dct:hasVersion also needs to be added to DCAT-AP CH, and the convention note should also include this additional property.

So in the above use case, one could say that the old dataset has to be kept and there should be a link from the new to the old dataset with dct:hasVersion. On the other hand, this rule might be too restrictive and the question is whether the older version of the dataset is of any interest any more after the new version has been released.
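A minimal sketch of that use case, assuming datasets are plain Python dicts. The function name `publish_new_version`, the dataset IDs, and the dict layout are hypothetical and not part of DCAT-AP CH; the link direction (new to old) simply follows the suggestion above.

```python
from datetime import date


def publish_new_version(old_dataset, new_id, release_date):
    """Create a record for the new version and keep the old one untouched.

    Hypothetical helper: the new record gets its own dct:issued and points
    back to the retained old dataset via dct:hasVersion.
    """
    return {
        "id": new_id,
        "dct:issued": release_date,      # release date of the new version
        "dct:modified": None,            # nothing has changed since release
        "dct:hasVersion": [old_dataset["id"]],
    }


# Illustrative data only.
old = {
    "id": "example-dataset-v1",
    "dct:issued": date(2021, 1, 1),
    "dct:modified": None,
    "dct:hasVersion": [],
}
new = publish_new_version(old, "example-dataset-v2", date(2022, 1, 1))
```

With this approach the old record keeps its original dct:issued instead of being overwritten, which is the behaviour the restrictive variant of the rule would require.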

"Each modification without changing the overall structure. "

This is exactly what is unclear. And I think we might need to accept that the exact usage of the dataset properties cannot be completely regulated.

l00mi commented 2 years ago

Overall, I would propose accommodating the perspective of the data consumers more than that of the data providers. After all, the overall effort of the data descriptions is for the consumers.

While we will always accept that nothing here can be completely regulated, I propose formulating the guideline as clearly as possible, with potential bail-out possibilities.

(Btw. I always have in mind the use-case of automated ingestion of such datasets for projects, and here such clear markers of change in structure are crucial.)
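To make the ingestion use case concrete, here is a rough sketch of the check a consumer might run. It assumes the consumer stores the dct:modified value it saw at its last ingest; the function name `needs_reingest` and the handling of missing values are assumptions for illustration, not an agreed rule.

```python
from datetime import datetime


def needs_reingest(last_seen_modified, current_modified):
    """Decide whether a dataset should be fetched again.

    Both arguments are datetime values or None. A missing current value
    means the publisher never set dct:modified, so there is no change
    marker to act on.
    """
    if current_modified is None:
        return False
    if last_seen_modified is None:
        # A modification date appeared for the first time.
        return True
    return current_modified > last_seen_modified
```

A check like this only works if publishers update dct:modified consistently, which is why a clear guideline matters for automated consumers.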

metaodi commented 2 weeks ago

I think on a dataset level the rules are more or less clear:

dct:issued: when it was first published (on any portal, website etc.), and if unknown the first time it was published in the current portal
dct:modified: the date of the last change (if any) of the dataset OR the distributions (no matter if it's the metadata or the actual data)
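The two dataset-level rules can be sketched as follows. This is only an illustration under the assumption that a publisher tracks change dates per dataset and per distribution; the function names and argument shapes are hypothetical.

```python
from datetime import date


def dataset_issued(first_published_anywhere, first_published_on_portal):
    """Prefer the first publication anywhere; fall back to the current portal."""
    return first_published_anywhere or first_published_on_portal


def dataset_modified(dataset_changes, distribution_changes):
    """Return the latest change date across the dataset and its distributions.

    dataset_changes: list of dates on which the dataset itself changed.
    distribution_changes: one list of dates per distribution.
    Returns None if nothing has changed since publication.
    """
    all_changes = list(dataset_changes)
    for changes in distribution_changes:
        all_changes.extend(changes)
    return max(all_changes) if all_changes else None
```

Metadata and data changes are deliberately not distinguished here, matching the "no matter if it's the metadata or the actual data" wording.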

I understand that this is a bit more complicated on the distribution level, but I would keep it simple: if there was a change, update the data (no matter if it's the data itself or the metadata). As long as the structure of the data is still the same, it's safe for a consumer to update the data.

However, if the structure was changed, it's important to create a new distribution with a new ID to indicate this change. Maybe an implementation note could be added to make this distinction clear.
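A sketch of what such an implementation note could describe, assuming "structure" can be approximated by the list of column names. That approximation, the function name `apply_update`, and setting dct:issued on the new distribution to the change date are all assumptions made for illustration.

```python
from datetime import date


def apply_update(distribution, new_columns, change_date, new_id):
    """Update a distribution in place or replace it, depending on structure.

    Same structure: keep the ID and set dct:modified.
    Changed structure: return a new distribution with a new ID instead.
    """
    if list(new_columns) == distribution["columns"]:
        return {**distribution, "dct:modified": change_date}
    return {
        "id": new_id,
        "columns": list(new_columns),
        "dct:issued": change_date,   # assumption: the new distribution starts here
        "dct:modified": None,
    }


# Illustrative data only.
dist = {
    "id": "dist-1",
    "columns": ["year", "value"],
    "dct:issued": date(2022, 1, 1),
    "dct:modified": None,
}
same = apply_update(dist, ["year", "value"], date(2023, 1, 1), "dist-2")
changed = apply_update(dist, ["year", "region", "value"], date(2023, 1, 1), "dist-2")
```

A consumer that keys its ingestion on the distribution ID would then keep updating `dist-1` safely and would see `dist-2` as something new that needs a fresh look.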