A Dataset property for denoting how the authors of the dataset should be acknowledged

chrisgorgo commented 5 years ago

I am looking for a Dataset property that would be a valid place to describe how the authors of a dataset wish to be acknowledged when the data is reused. This is a common practice, for example:

From: https://www.humanconnectome.org/study/hcp-young-adult/document/hcp-citations

Papers, book chapters, books, posters, oral presentations, and all other printed and digital presentations of results derived from HCP data should contain the following wording in the acknowledgments section:

"Data were provided [in part] by the Human Connectome Project, WU-Minn Consortium (Principal Investigators: David Van Essen and Kamil Ugurbil; 1U54MH091657) funded by the 16 NIH Institutes and Centers that support the NIH Blueprint for Neuroscience Research; and by the McDonnell Center for Systems Neuroscience at Washington University."

From: https://ida.loni.usc.edu/collaboration/access/appLicense.jsp;jsessionid=276077ED59DB7D645715D6EB1024280D

I will acknowledge funding by the ADNI in the support acknowledgement section of the manuscript using language similar to the following:

Data collection and sharing for this project was funded by the Alzheimer's Disease Neuroimaging Initiative (ADNI) (National Institutes of Health Grant U01 AG024904) and DOD ADNI (Department of Defense award number W81XWH-12-2-0012). ADNI is funded by the National Institute on Aging, the National Institute of Biomedical Imaging and Bioengineering, and through generous contributions from the following: AbbVie, Alzheimer's Association; Alzheimer's Drug Discovery Foundation; Araclon Biotech; BioClinica, Inc.; Biogen; Bristol-Myers Squibb Company; CereSpir, Inc.; Cogstate; Eisai Inc.; Elan Pharmaceuticals, Inc.; Eli Lilly and Company; EuroImmun; F. Hoffmann-La Roche Ltd and its affiliated company Genentech, Inc.; Fujirebio; GE Healthcare; IXICO Ltd.;Janssen Alzheimer Immunotherapy Research & Development, LLC.; Johnson & Johnson Pharmaceutical Research & Development LLC.; Lumosity; Lundbeck; Merck & Co., Inc.;Meso Scale Diagnostics, LLC.; NeuroRx Research; Neurotrack Technologies; Novartis Pharmaceuticals Corporation; Pfizer Inc.; Piramal Imaging; Servier; Takeda Pharmaceutical Company; and Transition Therapeutics. The Canadian Institutes of Health Research is providing funds to support ADNI clinical sites in Canada. Private sector contributions are facilitated by the Foundation for the National Institutes of Health (www.fnih.org). The grantee organization is the Northern California Institute for Research and Education, and the study is coordinated by the Alzheimer's Therapeutic Research Institute at the University of Southern California. ADNI data are disseminated by the Laboratory for Neuro Imaging at the University of Southern California.

The Brain Imaging Data Structure standard has a field for this called "HowToAcknowledge".
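For reference, here is a minimal sketch of reading that field. In BIDS it lives in the dataset's `dataset_description.json`; the dataset name and acknowledgement wording below are made-up placeholders, and the helper name `how_to_acknowledge` is hypothetical.

```python
import json

# Illustrative dataset_description.json content. "Name" and the
# acknowledgement wording are placeholders, not a real dataset.
EXAMPLE_DESCRIPTION = """
{
  "Name": "Example dataset",
  "BIDSVersion": "1.2.0",
  "HowToAcknowledge": "Please cite the dataset paper and mention grant XYZ."
}
"""


def how_to_acknowledge(description_json: str):
    """Return the HowToAcknowledge text, or None when the field is absent."""
    description = json.loads(description_json)
    return description.get("HowToAcknowledge")


print(how_to_acknowledge(EXAMPLE_DESCRIPTION))
```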

Such a field would be useful for two reasons:

It helps spread the instructions provided by the data authors, potentially increasing adherence.
The acknowledgement text can be a useful signal when looking for scholarly articles that reuse a particular dataset (in the absence of a proper dataset identifier).
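The second point can be sketched roughly as below, assuming the article's full text is available as a string. The function names and the choice of key phrase are illustrative only; a real search would need fuzzier matching, since authors are asked for "language similar to" the template rather than an exact copy.

```python
import re


def normalize(text: str) -> str:
    """Lowercase and collapse every run of non-alphanumerics to one space."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def mentions_acknowledgement(article_text: str, key_phrase: str) -> bool:
    """True if the article contains the key phrase as whole words,
    ignoring case, punctuation and line breaks."""
    return f" {normalize(key_phrase)} " in f" {normalize(article_text)} "


# A distinctive fragment of the HCP wording quoted above.
KEY_PHRASE = "Human Connectome Project, WU-Minn Consortium"

# Made-up article text with different line breaks and no "[in part]".
article = (
    "Acknowledgments. Data were provided by the Human Connectome Project,\n"
    "WU-Minn Consortium (Principal Investigators: David Van Essen and Kamil Ugurbil)."
)

print(mentions_acknowledgement(article, KEY_PHRASE))
```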

@mfenner and @natashafn might be interested in this conversation

danbri commented 4 years ago

@chrisgorgo I thought we had some agreement to accomplish this with existing properties, but I can't find it now.

If we are going to add a new property, it needs to begin with a lowercase character and be designed for use in our large flat namespace, e.g. we should consider how it looks in the context of other types (scholarly articles, news articles...).

danbri commented 4 years ago

ping @mfenner - do you have anything in this area we can work from?

(edit) see also Twitter, https://twitter.com/danbri/status/1214204533284442112

danbri commented 4 years ago

Also found this usage:

"citation": "Casciotti, K. L., Cutter, G., Lam, P. J. (2019) Bottle files from the US GEOTRACES Pacific Meridional Transect (PMT) cruise (GP15) from September to November 2018. Biological and Chemical Oceanography Data Management Office (BCO-DMO). Dataset version 2019-10-22 [if applicable, indicate subset used]. http://lod.bco-dmo.org/id/dataset/777951 [access date]

From the extensive inline json-ld in https://www.bco-dmo.org/dataset/777951

Not sure that's quite what /citation expects, ...

mfenner commented 4 years ago

This is something I see frequently, but I find it hard to implement for two reasons:

Free text is hard for machines to read, e.g. when extracting information about a publication and/or funding. Existing fields in schema.org probably cover this better.
Sometimes there is a conflation between legal requirements and scholarly best practices (how to acknowledge usually falls into the latter category, but enough people think they can make this a requirement).

A related concept is data availability statements, which are increasingly required by scholarly publishers, e.g. https://www.springernature.com/de/authors/research-data-policy/data-availability-statements/12330880.

JATS, the standard XML schema to encode scholarly articles, uses an <ack> tag: https://jats.nlm.nih.gov/archiving/tag-library/1.1d1/n-ve20.html

Maybe a pragmatic approach is a property (e.g. howToAcknowledge or the broader acknowledgments) that is intended to be human-readable and works for all CreativeWork, and then use additional properties such as Grant or isBasedOn for machine readability.
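A rough sketch of what that split could look like, built as JSON-LD from Python. Note that `howToAcknowledge` is the hypothetical property under discussion and is not part of schema.org; `funder` and `isBasedOn` are existing schema.org properties. All names, identifiers and the acknowledgement text are placeholders.

```python
import json

dataset = {
    "@context": "https://schema.org",
    "@type": "Dataset",
    "@id": "https://example.org/dataset/1",  # placeholder identifier
    "name": "Example dataset",
    # Hypothetical human-readable property; NOT defined by schema.org.
    "howToAcknowledge": "Data were provided by the Example Consortium.",
    # Existing schema.org property carrying the machine-readable part.
    "funder": {"@type": "Organization", "name": "Example Funder"},
}

article = {
    "@context": "https://schema.org",
    "@type": "ScholarlyArticle",
    "name": "A study reusing the example dataset",
    # Existing schema.org property linking the article to its source data.
    "isBasedOn": {"@id": dataset["@id"]},
}

print(json.dumps([dataset, article], indent=2))
```

The free-text field stays readable for humans, while the structured properties let a crawler follow the link from article to dataset without parsing prose.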

andrea-perego commented 4 years ago

I think it may be worth looking at how this has been addressed in ODRS (http://schema.theodi.org/odrs/), and checking the related documentation: see in particular the relevant sections in the ODRS use cases, publishers' guide, and re-users' guide, plus @ldodds's blog post "How Do We Attribute Data?"

FYI, this issue was also part of a more general discussion on licences / rights statement in the context of the revision of DCAT (see https://github.com/w3c/dxwg/issues/114) - actually, in DCAT2, ODRS is mentioned as one of the vocabularies that can be used to support a more articulated specification of licences and rights statements (see https://www.w3.org/TR/vocab-dcat-2/#license-rights).

RichardWallis commented 4 years ago

See issue #7 for the context of the move from the main Schema.org issue tracker to this repository.

trashbirdecology commented 4 years ago

Related to this is the question of citing a data steward of sorts: someone who has not contributed to the creation of the content, but has been a major driving force in improving its FAIRness and played a large role in building documentation or pushing it through to archival.
