uga-libraries / web-archive-it-api

Scripts for using the Archive-It APIs to generate reports.
Creative Commons Attribution Share Alike 4.0 International

Archive-It APIs Scripts

Overview

These scripts use the Archive-It web archiving service APIs (Partner API and WASAPI) to generate reports. They are used to prepare for quarterly downloads from Archive-It for preservation and to review and update metadata.
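As a rough illustration of the kind of WASAPI request a report script might start from, the sketch below only builds a query URL and makes no network call. The base URL, the parameter names (taken from the WASAPI specification as we understand it), the function name wasapi_query_url, and the collection ID are assumptions for illustration and are not taken from the scripts in this repository. Check them against the Archive-It documentation before relying on them.

```python
from urllib.parse import urlencode

# Assumption: Archive-It's WASAPI webdata endpoint. Verify against the Archive-It docs.
WASAPI_URL = "https://warcs.archive-it.org/wasapi/v1/webdata"


def wasapi_query_url(collection_id, after=None, before=None, page_size=100):
    """Build a WASAPI query URL for the WARCs in one collection.

    collection_id: Archive-It collection identifier (hypothetical in the example below).
    after, before: optional date strings (YYYY-MM-DD) that bound the crawl time.
    page_size: number of results requested per page.
    """
    params = {"collection": collection_id, "page_size": page_size}
    if after:
        params["crawl-time-after"] = after
    if before:
        params["crawl-time-before"] = before
    # urlencode keeps the insertion order of the dictionary.
    return f"{WASAPI_URL}?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical collection ID and date, for illustration only.
    print(wasapi_query_url(12345, after="2024-01-01"))
```

A real request would also need to send the Archive-It login credentials with the request and follow the pagination in the response, which this sketch leaves out.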

All reports are CSVs. Report scripts in this repository:

Getting Started

Dependencies

Installation

Before using any of these scripts, create a file named configuration.py, modeled on configuration_template.py, and save it in your local copy of this repository. The file defines where script output is saved and holds your Archive-It login credentials.
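A quick way to sanity-check a configuration before running a script is sketched below. The setting names (script_output, username, password) and both helper functions are hypothetical and chosen only for illustration. The actual variable names are the ones in configuration_template.py, and this is not the repository's check_config() function.

```python
import os
from types import SimpleNamespace

# Hypothetical setting names. Use the names from configuration_template.py instead.
REQUIRED_SETTINGS = ("script_output", "username", "password")


def missing_settings(config, required=REQUIRED_SETTINGS):
    """Return the names of required settings that are absent or empty in config."""
    return [name for name in required if not getattr(config, name, None)]


def output_folder_exists(config):
    """Return True if the (hypothetical) script_output setting is an existing folder."""
    folder = getattr(config, "script_output", None)
    return bool(folder) and os.path.isdir(folder)


if __name__ == "__main__":
    # Stand-in for "import configuration", with placeholder values.
    example = SimpleNamespace(script_output=os.getcwd(), username="example_user", password="")
    print("Missing settings:", missing_settings(example))
    print("Output folder exists:", output_folder_exists(example))
```

In practice the config argument would be the imported configuration module rather than a SimpleNamespace, since both expose their settings as attributes.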

Script Arguments

collection_metadata_report.py

preservation_download_tracker.py

seed_metadata_report.py

warc_csv.py

Testing

Each script has unit tests for every function and for the script as a whole, except for check_config() (Issue 21) and the API error case of get_metadata() (Issue 22). The tests for functions that call the API, and the tests for the full scripts, rely on UGA Archive-It data. For UGA, the expected results of these tests may need occasional updates to stay in sync with our edits. To use these tests with another account, edit all expected results to match the data in that account.

Workflow

These scripts are used for two different workflows at UGA:

The reports may also be created and used individually.

Author

Adriane Hanson, Head of Digital Stewardship, University of Georgia