w3c / dxwg

Data Catalog Vocabulary (DCAT)
https://w3c.github.io/dxwg/dcat/

On dataset series > sharing a usecase experience #1395

Open bertvannuffelen opened 3 years ago

bertvannuffelen commented 3 years ago

In https://w3c.github.io/dxwg/dcat/#dataset-series-properties some good guidelines are shared about the intended usage of dataset series.

In the past I followed statistical dataset series from Eurostat. For one ESTAT dataset series I found the UK version, and then discovered that this series was split over two organisations: the most recent 5 years are at their stat office, but the older ones were located at the archive. I found this only via free-text search, because the titles had changed slightly over the years.

My question/remark is: would multi-organisational dataset series be advised?

dr-shorthair commented 3 years ago

If the intention is that it is a single series, then it should be classified that way.

However, there is a potential inconsistency with the definition of Dataset, "A collection of data, published or curated by a single agent", since DatasetSeries is a sub-class of Dataset. Hmmm.

makxdekkers commented 3 years ago

@bertvannuffelen @dr-shorthair The way I see this is that you really have two separate series. One that always has 5 members -- if I understood the situation correctly -- where one dataset, the latest year, is added and one, the sixth year, is (re)moved every year, and one series that grows every year as it adds the latest year.
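A minimal Python sketch of that two-series reading may help to check it against the use case. It assumes a fixed window of five yearly datasets at the stat office and a hand-over of the oldest dataset to the archive once a sixth is published; the years and the `publish_year` helper are hypothetical and only serve the illustration.

```python
from collections import deque

WINDOW = 5  # assumption: the stat office keeps the five most recent years


def publish_year(current: deque, archive: list, year: int) -> None:
    """Add a yearly dataset to the current series and move the oldest
    one to the archive series once the window is exceeded."""
    current.append(year)
    if len(current) > WINDOW:
        archive.append(current.popleft())


current_series = deque()  # series under the stat office (sliding window)
archive_series = []       # series under the archive (only ever grows)

# Hypothetical publication years, for illustration only.
for year in range(2015, 2023):
    publish_year(current_series, archive_series, year)

print(list(current_series))  # [2018, 2019, 2020, 2021, 2022]
print(archive_series)        # [2015, 2016, 2017]
```

Under these assumptions the first series always holds five members while the second keeps growing, so the two have different membership rules as well as different owners.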

Also, the way that @bertvannuffelen found what I would call the second, historical series, was through freetext search -- does that mean that the two series weren't described in one record/landing page?

The way I would describe it is as two series, one under the responsibility of the stat office, and one under the responsibility of the archive -- probably with different contact information -- and then link the two series with some properties to express the temporal relationship like "previous version" and "has current version".

Otherwise, if you describe it as one series, which would you list as publisher and how would you express the temporal coverage of the two parts in such a way that you know which property is about which part?
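As a rough illustration of the two-series description, the sketch below builds two `dcat:DatasetSeries` records as JSON-LD using only the Python standard library. The `example.org` IRIs, titles, dates, and the `make_series` helper are hypothetical. Using `dcat:previousVersion` and `dcat:hasCurrentVersion` for the link is one possible reading of "previous version" and "has current version", not something the thread settles.

```python
import json

CONTEXT = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dcterms": "http://purl.org/dc/terms/",
}


def make_series(iri, title, publisher_iri, start, end=None):
    """Return a JSON-LD node for one dataset series (hypothetical helper)."""
    temporal = {"@type": "dcterms:PeriodOfTime", "dcat:startDate": start}
    if end is not None:
        temporal["dcat:endDate"] = end
    return {
        "@id": iri,
        "@type": "dcat:DatasetSeries",
        "dcterms:title": title,
        "dcterms:publisher": {"@id": publisher_iri},
        "dcterms:temporal": temporal,
    }


# All IRIs and dates below are made up for the example.
current = make_series(
    "https://example.org/series/current",
    "Yearly statistics (most recent five years)",
    "https://example.org/org/stat-office",
    start="2018-01-01",
)
archived = make_series(
    "https://example.org/series/archive",
    "Yearly statistics (historical)",
    "https://example.org/org/archive",
    start="2000-01-01",
    end="2017-12-31",
)

# Assumed choice of linking properties between the two series.
current["dcat:previousVersion"] = {"@id": archived["@id"]}
archived["dcat:hasCurrentVersion"] = {"@id": current["@id"]}

print(json.dumps({"@context": CONTEXT, "@graph": [current, archived]}, indent=2))
```

Each series carries its own publisher and temporal coverage, so it stays clear which property is about which part.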

agreiner commented 3 years ago

I think this issue is a valid critique of the idea of defining a DatasetSeries as a subclass of Dataset. Datasets in a series share common characteristics, but they always differ in some substantial way, or they would be duplicates or versions of each other. Defining a series in a different way may allow DCAT 3 to make DatasetSeries more useful and less confusing. For the sake of findability, I think we should favor lumping over splitting.

bertvannuffelen commented 3 years ago

Maybe my experience is the outcome of the activity "archiving digital artifacts". The example shows both actively maintained datasets and the archiving of those datasets, and that the two are connected.