w3c / dxwg

Data Catalog Vocabulary (DCAT)
https://w3c.github.io/dxwg/dcat/

Unclear classification of dataset/resources #1187

Closed aidig closed 3 years ago

aidig commented 4 years ago

Problem statement:

The use of the recommended controlled vocabularies for classifying datasets/resources is somewhat unclear.

Description

Section 5.5 Classifying dataset types states the following:

The type or genre of a dataset can be indicated using the dct:type property. It is recommended that the value of the property is taken from a well governed and broadly recognised set of resource types, such as the DCMI Type Vocabulary [DCTERMS], the MARC Genre/Terms Scheme, the [ISO-19115-1] MD_Scope codes, the DataCite resource types, or the PARSE.Insight content-types from Re3data [RE3DATA-SCHEMA].

The property dct:type on dcat:Dataset is inherited from dcat:Resource. However, since the majority of the recommended controlled vocabularies contain the individual 'dataset', applying them to dcat:Dataset (as stated in section 5.5) without further extension could present a number of practical implementation issues: the provided code lists classify resources at a more general level, with datasets being one of the main types.
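To make the concern concrete, here is a minimal sketch in plain Python (no RDF library) that flags dct:type values which merely restate that a resource is a dataset. The example IRI under https://example.org/, the function name, and the choice to list only the DCMI Type entry as "generic" are assumptions made for illustration; they are not part of DCAT.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DCT_TYPE = "http://purl.org/dc/terms/type"
DCAT_DATASET = "http://www.w3.org/ns/dcat#Dataset"
DCMITYPE_DATASET = "http://purl.org/dc/dcmitype/Dataset"
DCMITYPE_IMAGE = "http://purl.org/dc/dcmitype/Image"

# Assumption: only the DCMI Type entry is listed here; the equivalent
# 'dataset' entries of the other vocabularies could be added the same way.
GENERIC_DATASET_TYPES = {DCMITYPE_DATASET}


def uninformative_types(triples):
    """Return (subject, type) pairs where dct:type only repeats 'dataset'."""
    datasets = {s for s, p, o in triples if p == RDF_TYPE and o == DCAT_DATASET}
    return [
        (s, o)
        for s, p, o in triples
        if p == DCT_TYPE and s in datasets and o in GENERIC_DATASET_TYPES
    ]


# Hypothetical description: one dcat:Dataset carrying two dct:type values.
ds = "https://example.org/dataset/1"
triples = [
    (ds, RDF_TYPE, DCAT_DATASET),
    (ds, DCT_TYPE, DCMITYPE_DATASET),  # adds nothing beyond the class itself
    (ds, DCT_TYPE, DCMITYPE_IMAGE),    # says something more specific
]
flagged = uninformative_types(triples)
```

In this sketch only the dcmitype:Dataset value is flagged, which is the situation described above: on a dcat:Dataset, the general-level code lists mostly offer the value 'dataset' itself.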

Here is a quick stab at a general mapping between the different resource types (with DCMI as the foundation, mapped to selected elements from the MD_ScopeCode codes and the DataCite resource types):

| DCMI type vocabulary | ISO19115 MD_ScopeCode (excerpt) | DataCite ResourceType (excerpt) |
|---|---|---|
| Dataset | Dataset | Dataset |
| Text | (Document)? | Text |
| Sound | - see PresentationFormCode | Sound/Audiovisual data? |
| Image | - see PresentationFormCode | Image |
| Event | - | Event |
| InteractiveResource | - | Interactive resource |
| Service | Service | Service |
| Software | Software | Software |
| PhysicalObject | Feature? | Physical Object |
| Collection | Collection | Collection |
| StillImage | - | Type of image? |
| MovingImage | - | Type of image? |
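The mapping above can also be written as a small lookup structure, which makes it easy to see which types are named identically in all three vocabularies. This is only a sketch: the cell values are transcribed from the table, `None` stands for a "-" cell, cells ending in "?" are kept as the tentative guesses they are, and the names `MAPPING`, `shared_by_all` and `tentative` are made up for illustration.

```python
# DCMI type -> (ISO 19115 MD_ScopeCode cell, DataCite ResourceType cell)
MAPPING = {
    "Dataset": ("Dataset", "Dataset"),
    "Text": ("(Document)?", "Text"),
    "Sound": ("- see PresentationFormCode", "Sound/Audiovisual data?"),
    "Image": ("- see PresentationFormCode", "Image"),
    "Event": (None, "Event"),
    "InteractiveResource": (None, "Interactive resource"),
    "Service": ("Service", "Service"),
    "Software": ("Software", "Software"),
    "PhysicalObject": ("Feature?", "Physical Object"),
    "Collection": ("Collection", "Collection"),
    "StillImage": (None, "Type of image?"),
    "MovingImage": (None, "Type of image?"),
}


def shared_by_all(mapping):
    """DCMI types whose name appears unchanged in both other columns."""
    return [dcmi for dcmi, (iso, datacite) in mapping.items() if dcmi == iso == datacite]


def tentative(mapping):
    """DCMI types with at least one cell marked as a guess ('?')."""
    return [
        dcmi
        for dcmi, cells in mapping.items()
        if any(cell is not None and "?" in cell for cell in cells)
    ]


shared = shared_by_all(MAPPING)
guesses = tentative(MAPPING)
```

Here `shared` comes out as Dataset, Service, Software and Collection, which shows 'dataset' sitting alongside other top-level resource types in each code list.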

Related GitHub issue: https://github.com/w3c/dxwg/issues/1186

Related Use Cases from "W3C Editors Draft":

5.8 Scope or type of dataset with a DCAT description [ID8] https://w3c.github.io/dxwg/ucr/#ID8

5.20 Modelling resources different from datasets [ID20] https://w3c.github.io/dxwg/ucr/#ID20

Proposal:

Might section 5.5 simply need reformulating, e.g.:

5.5 Classifying resources in a catalog

"The type or genre of a resource (dcat:Resource) can be indicated using the dct:type property."

Also, further advice on the scope of dcat:Dataset might still be required. (Also, check for formal definition/scope note consistency)

andrea-perego commented 3 years ago

@aidig, sincere apologies for not having replied to your feedback.

As the DCAT specification underwent a number of changes, I would like to check with you whether you think this issue is still valid.

At least one of your points (i.e., the use of dcterms:type with dcat:Resource) seemed to be addressed by the usage note in §6.4 Class: Cataloged Resource - quoting:

The class of all cataloged resources, the super-class of dcat:Dataset, dcat:DataService, dcat:Catalog and any other member of a dcat:Catalog. This class carries properties common to all cataloged resources, including datasets and data services. It is strongly recommended to use a more specific sub-class. When describing a resource which is not a dcat:Dataset or dcat:DataService, it is recommended to create a suitable sub-class of dcat:Resource, or use dcat:Resource with the dcterms:type property to indicate the specific type.
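As an illustration of the last sentence of that usage note, here is a minimal sketch that renders a Turtle description of a resource that is neither a dataset nor a data service, using dcat:Resource together with dcterms:type. The resource IRI under https://example.org/ and the helper name `describe_resource` are hypothetical, and picking the DCMI Type term Software as the value is just one possible choice.

```python
PREFIXES = (
    "@prefix dcat: <http://www.w3.org/ns/dcat#> .\n"
    "@prefix dcterms: <http://purl.org/dc/terms/> .\n"
)


def describe_resource(resource_iri, type_iri):
    """Render a dcat:Resource with a dcterms:type value as Turtle text."""
    return (
        f"{PREFIXES}\n"
        f"<{resource_iri}> a dcat:Resource ;\n"
        f"    dcterms:type <{type_iri}> .\n"
    )


# Hypothetical catalogued software tool, typed with the DCMI Type term.
turtle = describe_resource(
    "https://example.org/resource/tool",
    "http://purl.org/dc/dcmitype/Software",
)
print(turtle)
```

The alternative mentioned in the usage note, defining a dedicated sub-class of dcat:Resource, would replace `dcat:Resource` in the output with that sub-class.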

About the general issue of what should be considered a dataset, I think the discussion moved to https://github.com/w3c/dxwg/issues/1195

Please let us know if anything raised in this issue is yet to be addressed and/or not dealt with in a separate issue.

andrea-perego commented 3 years ago

Please let us know if anything raised in this issue is yet to be addressed and/or not dealt with in a separate issue.

@aidig, as we didn't get any further feedback, we are closing this issue.