w3c / dxwg

Data Catalog Vocabulary (DCAT)
https://w3c.github.io/dxwg/dcat/

Dataset citation [RDSC] #61

Closed jpullmann closed 6 years ago

jpullmann commented 6 years ago

Dataset citation [RDSC]

Provide a way to specify the information required for data citation (e.g., dataset authors, title, publication year, publisher, persistent identifier).


Related use cases: Requirements for data citation [ID10] 
makxdekkers commented 6 years ago

That information is already in the metadata. Is the requirement that there is a separate property that contains the text of a citation? A problem might be that there are several citation styles. Would it not be more sensible to derive the citation from the information already there in the metadata?
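To illustrate deriving a citation from existing metadata instead of storing citation text, here is a minimal Python sketch. The `CitationMetadata` record and both style functions are hypothetical: the fields simply mirror the elements listed in the requirement, and the two layouts are simplified illustrations, not implementations of any official citation style.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CitationMetadata:
    # Hypothetical record: fields mirror the elements named in the requirement.
    creators: List[str]
    title: str
    publication_year: int
    publisher: str
    identifier: str  # e.g. a DOI URL


def author_year_style(m: CitationMetadata) -> str:
    # Simplified author-year layout: Creators (Year): Title. Publisher. Identifier
    authors = "; ".join(m.creators)
    return f"{authors} ({m.publication_year}): {m.title}. {m.publisher}. {m.identifier}"


def title_first_style(m: CitationMetadata) -> str:
    # A second, equally simplified layout: the same record can feed several styles.
    authors = ", ".join(m.creators)
    return f"{m.title}. {authors}. {m.publisher}, {m.publication_year}. {m.identifier}"


if __name__ == "__main__":
    record = CitationMetadata(
        creators=["Doe, J.", "Roe, R."],
        title="Example Dataset",
        publication_year=2018,
        publisher="Example Data Centre",
        identifier="https://doi.org/10.1234/example",  # hypothetical placeholder DOI
    )
    print(author_year_style(record))
    print(title_first_style(record))
```

Because each style is just a function of the same fields, adding a new style never requires changing the stored metadata, which is the point of deriving citations rather than storing them.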

andrea-perego commented 6 years ago

> That information is already in the metadata. Is the requirement that there is a separate property that contains the text of a citation? A problem might be that there are several citation styles. Would it not be more sensible to derive the citation from the information already there in the metadata?

+1 from me. However, the problem is that DCAT does not include all the metadata elements required for creating a citation (in particular, dataset authors).

An additional issue is that, since DCAT is not meant to specify mandatory, recommended and optional metadata elements, the relevant information may be missing. To address this, an option could be to include a guidance section saying that, if you want to support data citation, you need to include dataset authors, etc.

As for which metadata elements should be recommended: as you say, there are different citation styles, possibly requiring different information. My suggestion here is to follow DataCite and recommend (at least) the mandatory elements they define, which correspond to the minimal set of information needed to create a citation.

An overview of these elements is available in the background document of the work we did to map DataCite to DCAT-AP (go to section "DataCite and DCAT-AP at a glance"):

https://ec-jrc.github.io/datacite-to-dcat-ap/#comparison

The DataCite to DCAT-AP mappings are here:

https://ec-jrc.github.io/datacite-to-dcat-ap/#mapping-summary
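The "recommend at least DataCite's mandatory elements" guidance can be sketched as a simple completeness check. The DataCite Metadata Schema (version 4.x) has six mandatory properties: Identifier, Creator, Title, Publisher, PublicationYear and ResourceType. The Dublin Core property paired with each one below is an assumption made for illustration; the mappings linked above are the authoritative source. The flat `record` dict is likewise a hypothetical stand-in for a real DCAT description.

```python
from typing import Dict, List

# Assumed (illustrative) pairing of DataCite mandatory properties with
# Dublin Core terms usable in a DCAT dataset description.
DATACITE_MANDATORY: Dict[str, str] = {
    "Identifier": "dct:identifier",
    "Creator": "dct:creator",
    "Title": "dct:title",
    "Publisher": "dct:publisher",
    "PublicationYear": "dct:issued",
    "ResourceType": "dct:type",
}


def missing_citation_elements(record: Dict[str, object]) -> List[str]:
    """Return the DataCite mandatory elements that have no value in `record`.

    `record` is a hypothetical flat dict keyed by property CURIE; a value
    counts as missing if the key is absent or the value is empty.
    """
    missing = []
    for element, prop in DATACITE_MANDATORY.items():
        value = record.get(prop)
        if value is None or value == "" or value == []:
            missing.append(element)
    return missing


if __name__ == "__main__":
    dataset = {
        "dct:identifier": "https://doi.org/10.1234/example",  # placeholder DOI
        "dct:title": "Example Dataset",
        "dct:publisher": "Example Data Centre",
        "dct:issued": "2018-05-01",
    }
    print(missing_citation_elements(dataset))  # → ['Creator', 'ResourceType']
```

A check like this is exactly the kind of "if you want to support data citation, include these elements" guidance discussed above; it does not change DCAT itself.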

nicholascar commented 6 years ago

A Use Case based on this for the automated generation of a citation suggestion using provenance elements is recorded at http://patterns.promsns.org/usecase/46.

riccardoAlbertoni commented 6 years ago

I suggest removing the "quality" tag, as the requirement itself is not directly related to data quality.

nicholascar commented 6 years ago

I'd like to revitalize this issue in light of the work on profiles done since the last post here.

It may be possible to make a profile (a Profile Dec Ont Profile) of DCAT2018 that adheres to DataCite. If the profile then includes an Implementation Resource Descriptor that, in turn, includes machine-readable constraints, it will be possible to validate whether a particular instance of DCAT2018 adheres to the profile, and thus whether it can be used to generate DataCite-specified citations.

Specifically:
We could make a SHACL Implementation Resource Descriptor for a DCAT2018 DataCite Profile, along the lines of the points Andrea expressed in the Use Case for this work (ID10), to test DCAT2018/DataCite alignment, validation, and aspects of the Profiling Ont.
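The validation idea can be illustrated without a SHACL engine. The snippet below is a minimal Python stand-in, not SHACL itself: it models a hypothetical "DCAT DataCite profile" shape as property constraints with SHACL-style `minCount`/`maxCount` and checks every `dcat:Dataset` in a set of (subject, predicate, object) triples. The shape contents and example data are assumptions for illustration. Note the closed-world reading: an absent property is reported as a violation, which plain RDF (open world) would not do.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple

Triple = Tuple[str, str, str]


@dataclass
class PropertyConstraint:
    # Mirrors SHACL's sh:path / sh:minCount / sh:maxCount, nothing more.
    path: str
    min_count: int = 0
    max_count: Optional[int] = None


# Hypothetical shape for an illustrative "DCAT DataCite profile".
DATACITE_PROFILE_SHAPE = [
    PropertyConstraint("dct:identifier", min_count=1),
    PropertyConstraint("dct:creator", min_count=1),
    PropertyConstraint("dct:title", min_count=1),
    PropertyConstraint("dct:publisher", min_count=1, max_count=1),
    PropertyConstraint("dct:issued", min_count=1, max_count=1),
]


def validate(triples: Set[Triple], shape: List[PropertyConstraint]) -> List[str]:
    """Check every dcat:Dataset node in `triples` against `shape`.

    Returns violation messages; an empty list means the data conforms.
    Closed-world: a property with no triples counts as zero values.
    """
    focus_nodes = sorted(s for s, p, o in triples if p == "rdf:type" and o == "dcat:Dataset")
    violations = []
    for node in focus_nodes:
        for c in shape:
            count = sum(1 for s, p, _ in triples if s == node and p == c.path)
            if count < c.min_count:
                violations.append(f"{node}: {c.path} has {count} value(s), minimum {c.min_count}")
            if c.max_count is not None and count > c.max_count:
                violations.append(f"{node}: {c.path} has {count} value(s), maximum {c.max_count}")
    return violations


if __name__ == "__main__":
    data = {
        ("ex:ds1", "rdf:type", "dcat:Dataset"),
        ("ex:ds1", "dct:identifier", "https://doi.org/10.1234/example"),
        ("ex:ds1", "dct:title", "Example Dataset"),
        ("ex:ds1", "dct:publisher", "ex:dataCentre"),
        ("ex:ds1", "dct:issued", "2018-05-01"),
    }
    for message in validate(data, DATACITE_PROFILE_SHAPE):
        print(message)  # reports the missing dct:creator
```

In a real profile, these constraints would live in a SHACL shapes graph referenced from the Implementation Resource Descriptor and be checked by a SHACL processor; the Python version only shows what such a check decides.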

kcoyle commented 6 years ago

@nicholascar While I see all of this as being possible, it may be beyond our stated requirements. In other words, we don't disallow the creation of such profiles, but nothing in our profile requirements would lead us to present this as anything more than one of many possibilities. I don't think we should be looking at anything more than the bare bones of the requirement, which is to include information in DCAT that could support citing the dataset.

nicholascar commented 6 years ago

@kcoyle you’ve interpreted the suggestion the wrong way around: this isn’t a proposal for out-of-scope extra profile work but a proposal to satisfy the requirement RDSC. I think we can satisfy it using profiling so if this group wishes to address the requirement, then they could do it via profiling. The requirement has already been ruled in scope so now it’s a matter of deciding how to implement it.

agbeltran commented 6 years ago

I agree with @nicholascar that one way of implementing this requirement would be to provide a DCAT profile for data citation (thus indicating which metadata are required to enable data citation), in addition to making sure that DCAT does include all the required properties, as @andrea-perego pointed out. As this data citation profile would take DataCite into account, this issue is related to https://github.com/w3c/dxwg/issues/152

kcoyle commented 6 years ago

@agbeltran Actually, offering a DCAT profile is out of scope, and is directly stated to be out of scope in the group charter. If DCAT has the needed properties, then nothing needs to be done. If it does not, then this is a DCAT requirement.

agbeltran commented 6 years ago

@kcoyle ok, thanks - yes, the requirement states "provide a way to specify information for data citation" so we should focus on making sure that all metadata can be represented (e.g. dataset authors); providing the profile would be the way to ensure that all information is available in a given dataset description. Good to know that is out of scope, but it would be a neat example of the use of a profile.

andrea-perego commented 6 years ago

@kcoyle, @agbeltran, @nicholascar, I think this can indeed be done without defining a proper profile, if we limit ourselves to the mandatory elements of DataCite. What is missing in DCAT, as mentioned by @agbeltran, is dct:creator. However, in this case too we should, in theory, specify cardinality constraints (in the closed-world sense), which are not supposed to be included in DCAT; i.e., we need a profile for this.

An option is to adopt a more general approach and relate it to a "data citation" use case. For instance, we can say that, if you want to publish records enabling data citation, you should include this information. We can also mention DataCite, but, after all, the information needed for data citation was not invented by DataCite: it is the basic information that has traditionally been used for bibliographic references.

Maybe this use scenario can simply be added as one of the examples at the beginning of the DCAT spec.

kcoyle commented 6 years ago

As an FYI, the Dublin Core approach to vocabularies (ontologies) vs. profiles is that a vocabulary should be defined with a minimum semantic commitment, and that vocabulary usage (cardinality, value ranges, etc.) is the role of the profile. The upshot of this (and I believe this is compatible with RDF) is that for most uses a vocabulary alone will not be sufficient. So a vocabulary becomes a kind of building block, but the profile turns it into a usable structure. You can then design your vocabulary for maximum reuse.

agbeltran commented 6 years ago

See some notes at: https://github.com/w3c/dxwg/wiki/Data-Citation

rob-metalinkage commented 6 years ago

One way to look at this is that the vocabulary addresses interoperability of elements within a resource, but profiles address usability and interoperability of the resource itself, with all its contained elements.

So DCAT supports this but does not mandate it. That makes it an important example of how DCAT works with other vocabularies, and we can introduce a hypothetical profile as a recommended way to use DCAT when such choices need to be made consistently within a community of practice.

agbeltran commented 6 years ago

Initial notes in PR: https://github.com/w3c/dxwg/pull/321

It can be viewed here: https://rawgit.com/w3c/dxwg/agb-data-citation/dcat/index.html

agbeltran commented 6 years ago

This PR has been merged, so the rawgit link no longer works. The additions are now visible in the current version of the document:

https://w3c.github.io/dxwg/dcat/#class-dataset

agbeltran commented 6 years ago

As agreed in today's call, this issue can be closed.