Descriptive summary
I1-01b
As a repository user, I would like Scholar to provide standard vocabularies to use when creating metadata so that the content can be referenced per these standards and meet FAIR principles
R1-03-03
As a repository user, I would like Scholar to support the NIH Common Data Elements so that I can ensure my data conforms to this standard. (See also the Interoperable section of the FAIR principles.)
Is UC Scholar FAIR?
Leah Everitt 2022-03-18
To be Findable:
F1. (meta)data are assigned a globally unique and persistent identifier ✅
All metadata pages on Scholar are assigned an HTTPS PURL, which is a globally unique and persistent identifier
Some metadata pages also have a DOI, which is another globally unique and persistent identifier and is widely used throughout the research community
F2. data are described with rich metadata (defined by R1 below) 🆗
Details on R1
F3. metadata clearly and explicitly include the identifier of the data it describes ✅
DOI use makes this statement true for Scholar
F4. (meta)data are registered or indexed in a searchable resource 🆗
Scholar is searchable; however, better documentation of how search works would bring Scholar closer to fully meeting this principle
There should be a thesaurus or index that details how the archive is searched and describes how searching differs from the browsing option
Which metadata fields are searched?
How is punctuation treated?
How are the results sorted?
How can the results be filtered using specific metadata fields?
Scholar resources can also be accessed through Google’s search engine
To be Accessible:
A1. (meta)data are retrievable by their identifier using a standardized communications protocol ✅
Metadata pages in Scholar are served over HTTPS, which is a standardized communications protocol
All pages can be accessed via a permanent HTTPS link which serves as an identifier
DOI is available as another permanent identifier and link
A1.1 the protocol is open, free, and universally implementable ✅
Yes, HTTPS is open, free, and universally implementable
A1.2 the protocol allows for an authentication and authorization procedure, where necessary ✅
Creating a login provides authentication and authorization, which lets the non-UC community access Scholar works that are not open access
A2. metadata are accessible, even when the data are no longer available ✅
Metadata is not deleted from Scholar even after the resource is removed; the metadata can still be accessed via its permanent HTTPS link
To be Interoperable:
I1. (meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation. ❌
Scholar does not use any standard vocabularies or data models to make all of the data on Scholar interoperable (more details at I2)
While the metadata on Scholar should be machine readable, it is not in a standard format such as Dublin Core
Metadata fields from Dublin Core not available in Scholar include:
Relation (currently exists as External Link. Does it need to be labeled as a Relation to be machine-readable with Dublin Core? If so, given that Scholar already includes most Dublin Core elements, should we adopt Dublin Core so that search engines can easily identify all of our metadata elements?)
Source (if for instance the dataset is derived from another source)
Subject (the keyword metadata field could be a stand-in for this field)
Type (type could be automatically included on the metadata page as ‘dataset’)
Is there a better standard available than Dublin Core for general use?
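One way to picture the Dublin Core suggestion above is a minimal sketch that relabels existing fields as Dublin Core elements and emits them as HTML meta tags, following the common `DC.element` naming convention. The Scholar field names in the mapping (`keyword`, `external_link`, `derived_from`, and so on) are assumptions made for illustration, not the actual Scholar schema.

```python
from html import escape

# Hypothetical mapping from Scholar field names to Dublin Core elements.
# The left-hand names are illustrative assumptions, not Scholar's real schema.
SCHOLAR_TO_DC = {
    "title": "title",
    "creator": "creator",
    "description": "description",
    "keyword": "subject",         # keywords standing in for Subject
    "external_link": "relation",  # External Link relabeled as Relation
    "derived_from": "source",     # proposed Source field
}


def dublin_core_meta_tags(record, resource_type="Dataset"):
    """Return a list of HTML <meta> tags describing `record` in Dublin Core."""
    tags = []
    for field, dc_element in SCHOLAR_TO_DC.items():
        values = record.get(field, [])
        if isinstance(values, str):
            values = [values]  # treat a single string as a one-item list
        for value in values:
            content = escape(value, quote=True)
            tags.append(f'<meta name="DC.{dc_element}" content="{content}">')
    # Type is added automatically, as suggested for datasets above.
    tags.append(f'<meta name="DC.type" content="{escape(resource_type, quote=True)}">')
    return tags


if __name__ == "__main__":
    example = {"title": "Soil samples", "keyword": ["soil", "pH"]}
    print("\n".join(dublin_core_meta_tags(example)))
```

In this sketch the Type element is filled in without depositor input, and keywords are reused as Subject until real subject headings are available.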
I2. (meta)data use vocabularies that follow FAIR principles ❌
Scholar does not use any standard vocabularies and does not have an associated thesaurus
Suggestion: add the option to use real subject headings (rather than just keywords), such as LCSH or possibly MeSH
This is currently a key missing piece of FAIR within Scholar; however, this gap is common in institutional repositories (e.g., Deep Blue also does not have this)
I3. (meta)data include qualified references to other (meta)data 🆗
Scholar does this by using parent/child relationships and collections
The External Link element also allows for this; however, it would help to include a field that specifies how the link is related to the dataset
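A qualified reference could be modeled as a link paired with a relation type. The sketch below is illustrative only: the `QualifiedLink` class is hypothetical, the relation names are a small subset borrowed from the DataCite relationType vocabulary, and the DOI is a placeholder.

```python
from dataclasses import dataclass

# A small subset of relation types borrowed from the DataCite
# relationType vocabulary; a real deployment would choose its own list.
ALLOWED_RELATIONS = {"IsSupplementTo", "IsDerivedFrom", "IsPartOf", "References"}


@dataclass(frozen=True)
class QualifiedLink:
    """An external link plus a statement of how it relates to the dataset."""
    relation: str
    target: str

    def __post_init__(self):
        # Reject free-text relations so the field stays machine-readable.
        if self.relation not in ALLOWED_RELATIONS:
            raise ValueError(f"unknown relation type: {self.relation}")


if __name__ == "__main__":
    # Placeholder DOI, used only to show the shape of the data.
    link = QualifiedLink("IsSupplementTo", "https://doi.org/10.5555/example-article")
    print(link)
```

Restricting the relation to a controlled list is what makes the reference "qualified" for machines as well as for human readers.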
To be Reusable:
R1. meta(data) are richly described with a plurality of accurate and relevant attributes 🆗
Datasets might be richly described, but they also might not be; this is up to the submitting researcher
Given that researchers do not specialize in metadata or in making things accessible and reusable, they are not likely to richly describe their data with metadata
Curation of submitted works would help ensure that all submitted datasets are richly described, but UC Libraries likely do not have the resources to do this. This should be taken into account when comparing UC Scholar to other institutional repositories
R1.1. (meta)data are released with a clear and accessible data usage license ✅
UC Scholar has a Terms of Use page available for site users reflecting the different usage licenses which may be used in the repository
UC Scholar has a license wizard available to help depositors choose the correct usage license for their data
R1.2. (meta)data are associated with detailed provenance 🆗
Adding a Source metadata element could help establish detailed provenance, especially when a researcher reuses a dataset and deposits the resulting new dataset into Scholar
Allowing Relation links could aid in establishing provenance, for instance when a dataset is related to a publication
Creation Date is very important for establishing provenance; it should be a required element
A citation tool would also aid others in citing the works on Scholar and ensure that provenance is maintained
There also needs to be a place to record any analysis done before the dataset was uploaded (e.g., a dataset of averages)
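The Creation Date and citation tool suggestions above could be combined in a small helper. This is a sketch under stated assumptions: the field names (`creators`, `date_created`, `title`, `identifier`, `publisher`) are hypothetical, the citation layout loosely follows the creator, year, title, publisher, identifier pattern often used for datasets, and the sample record is invented for illustration.

```python
from datetime import date

# Fields treated as required for provenance in this sketch (an assumption).
REQUIRED_FIELDS = ("creators", "date_created", "title", "identifier")


def build_citation(record):
    """Build a citation string, refusing records that lack provenance fields."""
    missing = [name for name in REQUIRED_FIELDS if not record.get(name)]
    if missing:
        raise ValueError("missing required provenance fields: " + ", ".join(missing))
    # date_created is expected in ISO 8601 form, e.g. "2022-03-18".
    year = date.fromisoformat(record["date_created"]).year
    authors = "; ".join(record["creators"])
    publisher = record.get("publisher", "Scholar")
    return f'{authors} ({year}). {record["title"]}. {publisher}. {record["identifier"]}'


if __name__ == "__main__":
    # Invented sample record with a placeholder DOI.
    sample = {
        "creators": ["Doe, J."],
        "date_created": "2022-03-18",
        "title": "Example dataset",
        "identifier": "https://doi.org/10.5555/example",
    }
    print(build_citation(sample))
```

Because the helper fails when Creation Date is absent, it also illustrates why making that element required keeps citations, and therefore provenance, complete.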
R1.3. (meta)data meet domain-relevant community standards 🆗
Scholar cannot maintain domain-relevant standards for all domains
Scholar should have a link out to different domain-relevant repositories with a message explaining the benefits of using a domain-specific repository
NIH data sharing changes may necessitate support for the Common Data Elements developed by the National Institute of Neurological Disorders and Stroke so that NIH grant holders are compliant. More information about this schema is available at https://www.commondataelements.ninds.nih.gov/general%20(For%20all%20diseases)#pane-164
Wilkinson, M., Dumontier, M., Aalbersberg, I. et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data 3, 160018 (2016). https://doi.org/10.1038/sdata.2016.18