Data on Library Access to Scholarly Literature

This repository catalogs University Library access to scholarly literature. Scholarly articles are identified by their DOIs. The impetus for this project was a discussion on the Sci-Hub Coverage Study.

The code in this repository fetches indicators of full-text availability for a list of DOIs from an OpenURL resolver, which enables large-scale analysis of bibliographic holdings and availability.
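
As a rough illustration of what such a lookup involves, the sketch below builds an OpenURL query URL for a single DOI. The base URL and the function name build_openurl_query are hypothetical placeholders, not values from this repository. The url_ver and rft_id keys follow the OpenURL (Z39.88-2004) key/encoded-value convention, but your resolver may expect additional or different parameters.

```python
from urllib.parse import urlencode

# Hypothetical placeholder; the real base URL is institution-specific.
EXAMPLE_BASE_URL = "https://resolver.example.edu/openurl"


def build_openurl_query(doi, base_url=EXAMPLE_BASE_URL):
    """Return an OpenURL query URL asking the resolver about one DOI."""
    params = {
        "url_ver": "Z39.88-2004",     # OpenURL 1.0 version identifier
        "rft_id": f"info:doi/{doi}",  # the referent, identified by its DOI
    }
    # urlencode percent-encodes the ':' and '/' inside the DOI identifier.
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_openurl_query("10.1000/example"))
```

Looping a function like this over a list of DOIs, and sending each URL to the resolver, is the general shape of the large-scale lookup described above.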

Using the Code

The code files in this repository assume that your working directory is set to the top-level directory of this repository.
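
If you want to guard against running a script from the wrong place, a small check along these lines may help. This is only a sketch: the function name looks_like_repo_root is hypothetical, and the two entries it looks for are simply names taken from the contents list in this README.

```python
from pathlib import Path

# Top-level entries named in this README; used here as simple markers.
EXPECTED_ENTRIES = ("environment.yml", "library_management_system_downloader")


def looks_like_repo_root(directory):
    """Return True if every expected top-level entry exists in `directory`."""
    root = Path(directory)
    return all((root / name).exists() for name in EXPECTED_ENTRIES)


if __name__ == "__main__":
    if not looks_like_repo_root(Path.cwd()):
        print("Warning: the working directory does not look like the repository root.")
```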

Contents of this Repository, and the Order of Their Use

LICENSE-*.md: License text to accompany the License section of this Readme below.
environment.yml: Conda environment file (see Environment below).
.gitattributes: File with information for tracking files using Git Large File Storage (LFS).
library_management_system_downloader contains the following scripts, to be used in the following order:
1. downloader_configuration_file_TEMPLATE.py should be copied to downloader_configuration_file.py and edited for your own institution's OpenURL resolver. (These scripts were tested specifically with the OpenURL resolver that comes with Ex Libris' Alma management software.)
  - Within downloader_configuration_file.py, the variable api_base_url depends on the OpenURL resolver / vendor that your institution uses, and so differs from institution to institution. To find out what that base URL should be, you may need to ask your local library technology team for help or documentation.
  - Different OpenURL resolvers may also return data in slightly different formats. You may therefore need to modify the function fulltext_indication in the file evaluate_api_response_for_fulltext_indication.py so that it looks for an XML field that the data from your institution's OpenURL resolver actually contains.
2. run_api_download_and_parse_results.py
3. copy_and_compress_database_and_extract_tsv.py
evaluate_library_access_from_output_tsv contains the following scripts, to be used in the following order:
1. create_stratefied_sample_of_dois.R
2. join_doi-200_dates_to_doi-500.R
3. [Run facilitate_going_through_dois_manually.R to help fill in the .tsv files created by the scripts above]
4. penntext-accuracy-200.ipynb
5. penntext-accuracy-500.ipynb
data: [This is where datasets will be saved by the above scripts.]
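
The note under step 1 about adapting fulltext_indication to your resolver's XML can be pictured with a minimal sketch. Everything specific here is an assumption for illustration: the function name response_indicates_fulltext, the element name full_text_indicator, and the sample XML are hypothetical and are not taken from this repository or from any particular resolver's documentation.

```python
import xml.etree.ElementTree as ET

# Hypothetical element name; replace with a field your resolver returns.
HYPOTHETICAL_FIELD = "full_text_indicator"

# Hypothetical sample response, made up for this example.
SAMPLE_RESPONSE = (
    "<response><service>"
    "<full_text_indicator>true</full_text_indicator>"
    "</service></response>"
)


def response_indicates_fulltext(xml_text, field=HYPOTHETICAL_FIELD):
    """Return True if any element named `field` contains the text 'true'."""
    root = ET.fromstring(xml_text)
    for element in root.iter():
        # Drop a namespace prefix of the form '{http://...}name', if present.
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name == field and (element.text or "").strip().lower() == "true":
            return True
    return False


if __name__ == "__main__":
    print(response_indicates_fulltext(SAMPLE_RESPONSE))
```

Matching on the local element name, rather than on a fixed path, keeps the check tolerant of namespace and nesting differences between resolvers. Whether that tolerance is appropriate depends on the XML your resolver actually returns.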

Environment

This repository uses conda to manage its environment as specified in environment.yml. Install the environment with:

conda env create --file=environment.yml

Then use source activate library-access and source deactivate to activate or deactivate the environment. On Windows, use activate library-access and deactivate instead.

License

The files in this repository are released under the CC0 1.0 public domain dedication (LICENSE-CC0.md), except those that match the glob patterns listed below. Files matching the following glob patterns are instead released under a BSD 3-Clause license (LICENSE-BSD-3-Clause.md):

*.py
*.md
.gitignore
*.r
*.sh
