uclibs / ucrate

Scholar@UC: University of Cincinnati's self-submission institutional repository
https://scholar.uc.edu

Update Scholar to meet FAIR requirements (I1-01b and R1-03-03) #1021

Open scherztc opened 1 year ago

scherztc commented 1 year ago

Descriptive summary

I1-01b

As a repository user, I would like Scholar to provide standard vocabularies to use when creating metadata so that the content can be referenced per these standards and meet FAIR principles.

R1-03-03

As a repository user, I would like Scholar to support the NIH Common Data Elements so that I can ensure my data conforms to this standard. (See also the Interoperable section of the FAIR principles.)

Is UC Scholar FAIR?

Leah Everitt 2022-03-18 

To be Findable:

F1. (meta)data are assigned a globally unique and persistent identifier ✅

All metadata pages on Scholar are assigned an HTTPS PURL, which is a globally unique and persistent identifier

Some metadata pages also have a DOI, which is another globally unique and persistent identifier and is widely used throughout the research community

F2. data are described with rich metadata (defined by R1 below) 🆗

Details on R1

F3. metadata clearly and explicitly include the identifier of the data it describes ✅

DOI use makes this statement true for Scholar

F4. (meta)data are registered or indexed in a searchable resource 🆗

Scholar is searchable; however, improvements to the search documentation would make this statement more true

There should be a thesaurus or index that details how the archive is searched and describes how searching differs from the browsing option

Which metadata fields are searched?

How is punctuation treated?

How are the results sorted?

How can the results be filtered using specific metadata fields?

Scholar resources can also be accessed through Google’s search engine

To be Accessible:

A1. (meta)data are retrievable by their identifier using a standardized communications protocol ✅

Metadata pages in Scholar are served over HTTPS, which is a standardized communications protocol

All pages can be accessed via a permanent HTTPS link which serves as an identifier

DOI is available as another permanent identifier and link
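As a small illustration of identifier-based retrieval, the sketch below turns a DOI into its HTTPS resolver URL. The helper name `doi_to_url` is hypothetical and not part of Scholar; the example DOI is the one for the FAIR principles paper cited at the end of this issue.

```python
from urllib.parse import quote

DOI_RESOLVER = "https://doi.org/"  # public DOI resolver


def doi_to_url(doi: str) -> str:
    """Return the HTTPS resolver URL for a DOI (hypothetical helper)."""
    cleaned = doi.strip()
    # Accept the common "doi:" prefix and full resolver URLs as input.
    for prefix in ("doi:", "https://doi.org/", "http://doi.org/"):
        if cleaned.lower().startswith(prefix):
            cleaned = cleaned[len(prefix):]
            break
    # Very light sanity check: DOIs start with "10." and contain a "/".
    if not cleaned.startswith("10.") or "/" not in cleaned:
        raise ValueError(f"not a DOI: {doi!r}")
    # Percent-encode characters that are unsafe in a URL path, keeping "/".
    return DOI_RESOLVER + quote(cleaned, safe="/")


print(doi_to_url("doi:10.1038/sdata.2016.18"))
```

Any HTTPS client can then request the resulting URL, which is what makes the identifier retrievable over a standard protocol.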

A1.1 the protocol is open, free, and universally implementable ✅

Yes, HTTPS is open, free, and universally implementable

A1.2 the protocol allows for an authentication and authorization procedure, where necessary ✅

Creating a login provides authentication and authorization, allowing members of the non-UC community to access Scholar works that are not open access

A2. metadata are accessible, even when the data are no longer available ✅

Metadata is not deleted from Scholar even after the resource is removed; it remains accessible via its permanent HTTPS link

To be Interoperable:

I1. (meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation. ❌

Scholar does not use any standard vocabularies or data models to make all of the data on Scholar interoperable (more details at I2)

While the metadata on Scholar should be machine readable, it is not in a standard format such as Dublin Core

Metadata fields from Dublin Core not available in Scholar include:

Relation (exists as External Link. Does it need to be labeled as a Relation to be machine-readable as Dublin Core? If so, should we use Dublin Core in a way that lets search engines easily identify all of our metadata elements, given that we already include most of Dublin Core?)

Source (if for instance the dataset is derived from another source)

Subject (the keyword metadata field could be a stand-in for this field)

Type (type could be automatically included on the metadata page as ‘dataset’)

Is there a better standard available than Dublin Core for general use?
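To make the Dublin Core gap concrete, here is a minimal sketch of how a flat metadata record could be serialized with Dublin Core elements, including the Relation, Source, Subject, and Type elements discussed above. The field names on the left of `FIELD_TO_DC` and the `metadata` wrapper element are assumptions for illustration, not Scholar's actual schema; only the Dublin Core namespace and element names are standard.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core element set namespace

# Hypothetical mapping from repository field names to Dublin Core elements.
FIELD_TO_DC = {
    "title": "title",
    "creator": "creator",
    "keyword": "subject",
    "external_link": "relation",
    "derived_from": "source",
    "work_type": "type",
}


def to_dublin_core(record: dict) -> str:
    """Serialize a flat metadata record as Dublin Core XML."""
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("metadata")  # wrapper element name is an assumption
    for field, dc_name in FIELD_TO_DC.items():
        values = record.get(field, [])
        if isinstance(values, str):
            values = [values]  # treat a single string as a one-item list
        for value in values:
            element = ET.SubElement(root, f"{{{DC_NS}}}{dc_name}")
            element.text = value
    return ET.tostring(root, encoding="unicode")


example = {
    "title": "Example dataset",
    "keyword": ["water quality", "sensors"],
    "external_link": "https://example.org/related-article",
    "work_type": "Dataset",
}
print(to_dublin_core(example))
```

Fields that are absent from the record (here `derived_from`) simply produce no element, so the mapping can be adopted before every field exists in the deposit form.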

I2. (meta)data use vocabularies that follow FAIR principles ❌

Scholar does not use any standard vocabularies and does not have an associated thesaurus

The suggestion is to add the option to use real subject headings (rather than just keywords), from LCSH or possibly MeSH

This is a key piece of FAIR currently missing from Scholar; however, this is common among institutional repositories (e.g., Deep Blue also lacks it)
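One possible shape for a subject-heading field, sketched under assumptions: each heading stores its label, the vocabulary it comes from, and the term identifier, so a resolvable URI can be derived. The class name and the `PLACEHOLDER_ID` value are hypothetical; the base URIs are those of the Library of Congress and NLM linked-data services as I understand them and should be verified before use.

```python
from dataclasses import dataclass

# Base URIs of the linked-data services for each vocabulary (verify before use).
SCHEME_BASE_URIS = {
    "LCSH": "http://id.loc.gov/authorities/subjects/",
    "MeSH": "http://id.nlm.nih.gov/mesh/",
}


@dataclass(frozen=True)
class SubjectHeading:
    """A subject term drawn from a controlled vocabulary."""
    label: str    # human-readable heading shown to users
    scheme: str   # vocabulary name, "LCSH" or "MeSH"
    term_id: str  # identifier of the term within that vocabulary

    def __post_init__(self):
        # Reject headings from vocabularies the repository does not support.
        if self.scheme not in SCHEME_BASE_URIS:
            raise ValueError(f"unknown vocabulary: {self.scheme}")

    @property
    def uri(self) -> str:
        """URI that identifies the term for machines."""
        return SCHEME_BASE_URIS[self.scheme] + self.term_id


# PLACEHOLDER_ID stands in for a real term identifier.
heading = SubjectHeading("Example heading", "MeSH", "PLACEHOLDER_ID")
print(heading.uri)
```

Storing the scheme and identifier alongside the label is what distinguishes a subject heading from a free-text keyword.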

I3. (meta)data include qualified references to other (meta)data 🆗

Scholar does this by using parent/child relationships and collections

The External Link element also allows for this; however, it would help to include a field specifying how the link is related to the dataset
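A field that says how a link relates to the dataset could look like the sketch below. The relation names are a small illustrative subset of DataCite's relationType vocabulary, chosen here as an assumption about which scheme Scholar might adopt; the class and field names are hypothetical.

```python
from dataclasses import dataclass

# Illustrative subset of DataCite relationType values; not a complete list.
RELATION_TYPES = frozenset(
    {"IsSupplementTo", "IsDerivedFrom", "IsPartOf", "References", "Cites"}
)


@dataclass(frozen=True)
class QualifiedLink:
    """An external link plus the way it relates to the deposited work."""
    relation_type: str
    target_url: str

    def __post_init__(self):
        # Only accept relation types from the agreed vocabulary.
        if self.relation_type not in RELATION_TYPES:
            raise ValueError(f"unsupported relation type: {self.relation_type}")
        # Only accept web links, since External Link holds URLs.
        if not self.target_url.startswith(("http://", "https://")):
            raise ValueError(f"not an HTTP(S) link: {self.target_url}")

    def describe(self) -> str:
        return f"{self.relation_type} {self.target_url}"


link = QualifiedLink("IsSupplementTo", "https://example.org/article")
print(link.describe())
```

With the relation type recorded, a plain External Link becomes a qualified reference in the sense of I3.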

To be Reusable:

R1. meta(data) are richly described with a plurality of accurate and relevant attributes 🆗

Datasets may or may not be richly described; this is up to the submitting researcher

Given that researchers do not specialize in metadata or in making things accessible and reusable, they are not likely to describe their data richly

Curating submitted works would help ensure that all submitted datasets are richly described. UC Libraries likely does not have the resources to do this, but it should be taken into account when comparing UC Scholar to other institutional repositories

R1.1. (meta)data are released with a clear and accessible data usage license ✅

UC Scholar has a Terms of Use page available for site users reflecting the different usage licenses which may be used in the repository

UC Scholar has a license wizard available to help depositors choose the correct usage license for their data

R1.2. (meta)data are associated with detailed provenance 🆗

Adding a Source metadata element could help establish detailed provenance, especially when a researcher reuses a dataset and deposits the resulting new dataset into Scholar

Allowing Relation links could aid in establishing provenance, for instance when a dataset is related to a publication

Creation Date is very important for establishing provenance; it should be a required element

A citation tool would also help others cite the works on Scholar and ensure that provenance is maintained

There also needs to be a place to record any analysis done before the dataset was uploaded (e.g., a dataset of averages)
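The provenance suggestions above could be enforced at deposit time with a check along these lines. This is only a sketch: the field names (`creation_date`, `source`, `relation`, `processing_notes`) are assumptions standing in for whatever Scholar would actually call them.

```python
from datetime import date

# Hypothetical names for the recommended provenance fields.
RECOMMENDED_FIELDS = ("source", "relation", "processing_notes")


def check_provenance(record: dict) -> list:
    """Return a list of provenance problems found in a metadata record."""
    problems = []
    raw_date = record.get("creation_date")
    if not raw_date:
        # Creation date is treated as required.
        problems.append("creation_date is required")
    else:
        try:
            date.fromisoformat(raw_date)
        except ValueError:
            problems.append("creation_date must be an ISO 8601 date")
    # The remaining fields are recommended rather than required.
    for field in RECOMMENDED_FIELDS:
        if not record.get(field):
            problems.append(f"{field} is recommended but missing")
    return problems


deposit = {"creation_date": "2022-03-18", "source": "https://example.org/original-dataset"}
for problem in check_provenance(deposit):
    print(problem)
```

Returning a list of messages rather than raising on the first problem lets the deposit form show every gap at once.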

R1.3. (meta)data meet domain-relevant community standards 🆗

Scholar cannot maintain domain-relevant standards for all domains

Scholar should link out to domain-relevant repositories, with a message explaining the benefits of using a domain-specific repository

NIH data sharing changes may necessitate support for the Common Data Elements made by the National Institute of Neurological Disorders and Stroke so that NIH grant holders are compliant. More information about this schema is available at https://www.commondataelements.ninds.nih.gov/general%20(For%20all%20diseases)#pane-164

Wilkinson, M., Dumontier, M., Aalbersberg, I. et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data 3, 160018 (2016). https://doi.org/10.1038/sdata.2016.18

scherztc commented 10 months ago

Citations, Leasing, Interoperability, and Reproducibility (software required), odd data formats