tetherless-world / dco-ontology

Deep Carbon Observatory Ontology
Creative Commons Zero v1.0 Universal

Publication dataset property? #44

Open. zednis opened this issue 9 years ago.

zednis commented 9 years ago

We may want to add a property specific to publication datasets.

Here is an idea of what it could look like:

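```turtle
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix bibo: <http://purl.org/ontology/bibo/> .
# Placeholder for the DCO ontology namespace (assumed; substitute the real one)
@prefix dco:  <http://example.org/dco#> .

dco:hasPublicationDataset a owl:ObjectProperty ;
    rdfs:label "publication dataset" ;
    rdfs:domain bibo:Publication ;
    rdfs:range dcat:Dataset ;
    owl:inverseOf dco:publicationDatasetOf ;
    rdfs:subPropertyOf prov:hadDerivation .

dco:publicationDatasetOf a owl:ObjectProperty ;
    rdfs:label "publication dataset of" ;
    rdfs:domain dcat:Dataset ;
    rdfs:range bibo:Publication ;
    owl:inverseOf dco:hasPublicationDataset ;
    rdfs:subPropertyOf prov:wasDerivedFrom .
```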

This property could be a sub-property of the more specific

Additionally we could define a subclass of Derivation (or one of its subclasses) so we could have a qualified publication dataset relationship.
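A minimal Turtle sketch of what the qualified form might look like, assuming PROV-O's usual qualification pattern (prov:qualifiedDerivation pointing at a prov:Derivation node that names the source via prov:entity). The class dco:PublicationDatasetDerivation, the placeholder dco: namespace, and the ex: instances are hypothetical and only illustrate the idea.

```turtle
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix bibo: <http://purl.org/ontology/bibo/> .
# Placeholder namespaces (assumed): dco: stands in for the DCO namespace, ex: for example data
@prefix dco:  <http://example.org/dco#> .
@prefix ex:   <http://example.org/data/> .

# Hypothetical class for the qualified publication dataset relationship
dco:PublicationDatasetDerivation a owl:Class ;
    rdfs:label "publication dataset derivation" ;
    rdfs:subClassOf prov:Derivation .

ex:paper1 a bibo:Publication .

ex:dataset1 a dcat:Dataset ;
    # unqualified link, using the proposed inverse property
    dco:publicationDatasetOf ex:paper1 ;
    # qualified link: a node that can carry extra detail about the derivation
    prov:qualifiedDerivation [
        a dco:PublicationDatasetDerivation ;
        prov:entity ex:paper1
    ] .
```

Subclassing prov:Derivation directly keeps the qualified form aligned with the proposed rdfs:subPropertyOf prov:wasDerivedFrom; subclassing prov:Quotation instead would pair it with prov:wasQuotedFrom and prov:qualifiedQuotation.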

mrpatrickwest commented 9 years ago

Should we create an ontology change request document for this? It would probably be a good idea to get the full picture. Your suggestion seems to be part of a larger discussion.

zednis commented 9 years ago

Opened this ticket to start the general discussion. I would suggest we open a change request when/if the discussion evolves into a consensus around making a change.

mrpatrickwest commented 9 years ago

wasQuotedFrom seems to be an accurate property to use in this case.

I'm hoping to capture the activity that is generating this new derivation (the Dataset). Specifically, Hao ran the PDF through this application to generate the dataset, then visually verified the resulting dataset, or whatever the process is.
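One way the extraction could be sketched in PROV-O, assuming the dataset is linked to the paper with prov:wasQuotedFrom and the extraction and verification steps are each modelled as a prov:Activity. All ex: IRIs, including treating the extraction application as a prov:SoftwareAgent, are hypothetical placeholders rather than existing DCO resources.

```turtle
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix bibo: <http://purl.org/ontology/bibo/> .
# ex: is a hypothetical namespace for example data
@prefix ex:   <http://example.org/data/> .

ex:paper1 a bibo:Publication .

ex:dataset1 a dcat:Dataset ;
    prov:wasQuotedFrom ex:paper1 ;        # the dataset reproduces data from the paper
    prov:wasGeneratedBy ex:extraction1 .  # the activity that produced it

# Extraction: a person ran an application over the PDF
ex:extraction1 a prov:Activity ;
    prov:used ex:paper1 ;
    prov:wasAssociatedWith ex:hao, ex:extractionTool .

ex:hao a prov:Person .
ex:extractionTool a prov:SoftwareAgent .

# Follow-up step: visual verification of the extracted dataset
ex:verification1 a prov:Activity ;
    prov:used ex:dataset1 ;
    prov:wasAssociatedWith ex:hao .
```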

zednis commented 9 years ago

One open question I have is how the contributions/roles of the data rescuers should be recorded.

Right now the authors of the rescued/publication dataset are the members of the DCO-DS team that did the data rescue, but is this what we want to say? Do the authors of the original publication have any role/contribution in the dataset? The paper authors were the individuals that put the source material together, we just extracted it and put it in a machine readable format.

Perhaps if we make the provenance of the dataset derivation relationship clearer I would feel better about DCO-DS members being authors of these datasets.

zednis commented 9 years ago

:+1: to prov:wasQuotedFrom being applicable.

mrpatrickwest commented 9 years ago

The activity we're representing here is the extraction of the dataset from the publication. We don't know whether the authors of the publication are just referencing data that someone else generated, or something else. What we do know is that Hao ran an application that extracted data from a publication.

zednis commented 9 years ago

I think I will be ok with the DCO-DS members being authors (or even better creators) of the publication dataset once the provenance relationship is clearly made to the source publication.
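A sketch of how creator credit for DCO-DS members and the provenance link to the source paper could coexist, assuming dct:creator for credit and a qualified prov:Attribution to record the rescuers' role. The role ex:DataRescuer and all other ex: IRIs are hypothetical.

```turtle
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix bibo: <http://purl.org/ontology/bibo/> .
# ex: is a hypothetical namespace for example data
@prefix ex:   <http://example.org/data/> .

# The original authors stay attached to the source publication
ex:paper1 a bibo:Publication ;
    dct:creator ex:paperAuthor1 .

# DCO-DS members are credited as creators of the rescued dataset,
# while the derivation from the publication is stated explicitly
ex:dataset1 a dcat:Dataset ;
    dct:creator ex:dcodsMember1 ;
    prov:wasQuotedFrom ex:paper1 ;
    prov:wasAttributedTo ex:dcodsMember1 ;
    prov:qualifiedAttribution [
        a prov:Attribution ;
        prov:agent ex:dcodsMember1 ;
        prov:hadRole ex:DataRescuer    # hypothetical role
    ] .

ex:DataRescuer a prov:Role .
ex:paperAuthor1 a prov:Person .
ex:dcodsMember1 a prov:Person .
```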

olyerickson commented 9 years ago

Regarding Patrick's assertion that we don't know whether the original authors are simply referencing other data or providing original data: we had better know, because they would have had to disclose that in their papers!

Note that if we actually DOCUMENT and PUBLISH this data rescue process, we can refer to it in the provenance metadata. Perhaps that is an incentive to document it somehow, e.g. as a poster, a TWC TR, or something similar.

Regarding using the quotation property, note that it may have an optional companion property that specifies the extent of the quotation. The intent of the extent is to describe, in some (apparently loose...) way, what the snippet is, e.g. "Table 1" or whatever.
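If the rescue process were documented and published, PROV-O's prov:Plan offers one way to reference it from the provenance metadata. A minimal sketch, assuming a qualified association on the extraction activity; ex:dataRescueProcess and the other ex: IRIs are hypothetical.

```turtle
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
# ex: is a hypothetical namespace for example data
@prefix ex:   <http://example.org/data/> .

# The published description of the data rescue process (poster, TR, ...)
ex:dataRescueProcess a prov:Plan ;
    rdfs:label "DCO-DS data rescue process" .

# The extraction activity points at the documented process
ex:extraction1 a prov:Activity ;
    prov:wasAssociatedWith ex:hao ;
    prov:qualifiedAssociation [
        a prov:Association ;
        prov:agent ex:hao ;
        prov:hadPlan ex:dataRescueProcess
    ] .

ex:hao a prov:Person .
```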

John

(In reply by email to Stephan Zednik's comment above, Mon, Aug 24, 2015 at 12:30 PM.)

John S. Erickson, Ph.D. Director of Operations, The Rensselaer IDEA Deputy Director, Web Science Research Center (RPI) http://tw.rpi.edu olyerickson@gmail.com Twitter & Skype: olyerickson

zednis commented 9 years ago

I think using an additional property (qualifying the quotation) to describe where in the document the dataset was extracted from makes great sense.

I am not sure if there is a property already defined in PROV that would be used for that purpose, but we could certainly define one for ourselves.
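A sketch of a locally defined extent property, assuming it is attached to the prov:Quotation node reached through prov:qualifiedQuotation. The property name dco:quotationExtent and the placeholder dco:/ex: namespaces are hypothetical, not part of PROV-O or the current DCO ontology.

```turtle
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
# Placeholder namespaces (assumed): dco: stands in for the DCO namespace, ex: for example data
@prefix dco:  <http://example.org/dco#> .
@prefix ex:   <http://example.org/data/> .

# Hypothetical property describing which part of the source was quoted
dco:quotationExtent a owl:DatatypeProperty ;
    rdfs:label "quotation extent" ;
    rdfs:comment "Where in the source the quoted content came from, e.g. a table or section." ;
    rdfs:domain prov:Quotation ;
    rdfs:range xsd:string .

ex:dataset1 a dcat:Dataset ;
    prov:wasQuotedFrom ex:paper1 ;
    prov:qualifiedQuotation [
        a prov:Quotation ;
        prov:entity ex:paper1 ;
        dco:quotationExtent "Table 1"
    ] .
```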

mrpatrickwest commented 9 years ago

I am simply stating that it is outside the scope of our work to try to ascertain whether one or more of the authors is the author of the original dataset from which the data in the tables is derived, and which datasets the data in the tables comes from. It is outside the scope of our work to provide provenance for the publication and its parts. Our scope includes the generation of a new dataset from the data in the PDF. Sure, we could do some research and determine who the authors of the original dataset(s) are. But again, that is outside the scope of our provenance generation.

olyerickson commented 9 years ago

The act of specifying the extent is qualifying the quotation...

The examples are really weak on specific properties; they tend to use ad hoc properties, e.g. my:fromSection:

```turtle
prov:qualifiedQuotation [
    a prov:Quotation ;
    prov:entity <http://purl.org/twc/page/thoughts-from-the-dagstuhl-workshop> ;
    my:fromSection 1 ;
    # ... (remainder of the example truncated in the original message)
] ;
```

(In reply by email to Stephan Zednik's comment above, Mon, Aug 24, 2015 at 1:51 PM.)