psu-libraries / scholarsphere

Penn State's next generation institutional repository
MIT License

"Pathways" proposal #1153

Open srerickson opened 2 years ago

srerickson commented 2 years ago

Background

ScholarSphere is a general purpose repository that allows users to deposit all types of research materials -- data, instructional materials, articles, etc. This flexibility has been important to the repository's growth but it also comes with some downsides. All content types in ScholarSphere are treated pretty much the same way: they share a common deposit process and the same validation rules apply across the board. A one-size-fits-all approach makes it difficult to tailor the application for specific, important categories such as open access journal articles and FAIR datasets. This proposal would address the problem by introducing a new abstraction (the pathway) that encapsulates details relating to submission, validation, and (potentially) access. This proposal would not completely replace the existing general-purpose deposit process; it would permit additional customized processes tailored to particular use cases.

Defining Pathways

The functionality that pathways should provide becomes clearer when considering two use cases: OA versions of published journal articles (postprints) and curated research data & code (FAIR data).

Example 1: OA Journal Articles

With the implementation of Penn State's Open Access Policy, open access versions of published journal articles (postprints) are an increasingly important category of new submissions to ScholarSphere. Most postprints are now deposited to ScholarSphere through the new Researcher Metadata Database (RMDB) integration, which significantly simplifies the deposit process by taking advantage of article metadata aggregated in RMDB and presenting users with a customized UI. In essence, the RMDB integration constitutes a unique pathway for depositing OA versions of journal articles. ScholarSphere itself, however, doesn't treat these submissions any differently than it would a self-published report or a dataset. A more complete pathway for depositing and accessing OA versions of journal articles on ScholarSphere might include the following:

The functionality above would significantly improve the accessibility and user experience of OA articles on ScholarSphere. However, this logic is not appropriate for all works on ScholarSphere --- for example, datasets and student work should not be indexed by Unpaywall. A dedicated pathway for OA journal articles would support this kind of application logic.

Example 2: FAIR Data

Research data is another important category of work on ScholarSphere. Funders and journals increasingly require researchers to deposit datasets (i.e., supplementary materials for published articles) on repositories that enforce FAIR principles. FAIR datasets must have DOIs, good metadata, and complete documentation; typically, they must go through a data curation process involving some revision by the depositor. These are optional rather than required features for datasets deposited on ScholarSphere, which has led to some confusion (from users and journal reviewers) as to whether ScholarSphere actually complies with the FAIR principles. ScholarSphere's commitment to the FAIR principles would be more explicit through a pathway with the following features:

Requirements

From the preceding examples, the requirements for implementing pathways are clearer. Pathways should allow:

Pathways are closely related to 'work types', but they involve more complexity because they determine how a work is deposited and the rules that new submissions are validated against. In addition, pathways are independent of work types in the sense that a pathway may permit any number of work types.
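To make the relationship between pathways, work types, and validation rules more concrete, here is a minimal sketch in Python. It is not taken from the ScholarSphere codebase: the `Pathway` class, the `require` helper, the metadata field names (`doi`, `description`, `readme`), and the permitted work types are all hypothetical and chosen only to mirror the FAIR data example above.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# For illustration, a work is just a dict of metadata strings (an assumption).
Work = Dict[str, str]
# A rule returns an error message, or None when the work passes.
Rule = Callable[[Work], Optional[str]]


def require(field_name: str) -> Rule:
    """Build a rule that fails when a metadata field is missing or blank."""
    def rule(work: Work) -> Optional[str]:
        if not work.get(field_name, "").strip():
            return f"{field_name} is required"
        return None
    return rule


@dataclass
class Pathway:
    """Hypothetical pathway: a name, permitted work types, and validation rules."""
    name: str
    work_types: List[str]  # a pathway may permit any number of work types
    rules: List[Rule] = field(default_factory=list)

    def validate(self, work: Work) -> List[str]:
        """Return every validation error for a submission (empty list if valid)."""
        errors = []
        work_type = work.get("work_type")
        if work_type not in self.work_types:
            errors.append(f"work type {work_type!r} is not permitted in {self.name}")
        for rule in self.rules:
            message = rule(work)
            if message is not None:
                errors.append(message)
        return errors


# Hypothetical FAIR pathway: field names and work types are assumptions.
FAIR_DATA = Pathway(
    name="ScholarSphere-FAIR Data",
    work_types=["dataset", "software"],
    rules=[require("doi"), require("description"), require("readme")],
)

submission = {
    "work_type": "dataset",
    "doi": "10.1234/example",
    "description": "Survey responses",
    "readme": "",
}
print(FAIR_DATA.validate(submission))
```

In this sketch the general-purpose deposit form would simply be the case where no pathway is selected, so none of the pathway's rules run. Keeping the rules as a list on the pathway, rather than on the work type, reflects the point that one pathway can span several work types.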

User Experience of Pathways

ScholarSphere users would primarily interact with pathways during the submission process. At deposit, they would be given a choice: use one of the available pathways ("ScholarSphere-OA", "ScholarSphere-FAIR Data") or use the current, general-purpose deposit form. Using one of the pathways should generally provide a more user-friendly experience and result in higher-quality submissions. The general-purpose deposit form would remain available for other kinds of content and/or "advanced" use cases.

Pathways may also affect how people find and access materials. For example, the basic search interface might include options to scope the search to "All of ScholarSphere", "Journal Articles", or "Research Data".

anaelizabethenriquez commented 2 years ago

I would love to see this happen. Tailoring ScholarSphere's approach to deposits of postprints under the OA policy will improve the user experience and increase access.

Minor point: Unpaywall excludes non-CrossRef DOIs, so it's probably not necessary (at least not for that reason) to limit what's available via OAI-PMH. (see #183)

bdezray commented 2 years ago

This looks great and I also would love to see this happen! For the FAIR data deposits, it seems like it would definitely help with supporting compliance with funder data sharing mandates and improving the quality of the data deposited to ScholarSphere. I am curious if it would make sense to facilitate project-based/manuscript-based deposits that include multiple work types, or that assign a work type at the file or folder level. Have you had any feedback on this from researchers?

bdezray commented 5 months ago