mjordan / iipqa

Command-line tool for applying Quality Assurance checks against Islandora import packages in preparation for importing them into Islandora.
GNU General Public License v3.0

Islandora Import Package QA Tool

A tool for applying Quality Assurance checks against Islandora import packages in preparation for importing them.

System requirements and installation

To install the Islandora Import Package QA Tool:

iipqa uses wget to retrieve some schema files from the Library of Congress during installation. If your system does not have wget installed and in your PATH (e.g., most Windows systems), you will see an error beginning with 'wget' is not recognized as an internal or external command, operable program or batch file. If you see this error and want to use iipqa to validate MODS XML files, manually download the following two files into iipqa's src/utils directory:

What does iipqa check for?

The MODS validation test is optional and is enabled by including the -v option in the iipqa command. The other tests always run, although you can disable the check for structure.xml files in compound object input by including the --skip_structure/-k option.

Usage

iipqa should be run against your Islandora import packages prior to loading the packages with Islandora Batch, Islandora Book Batch, Islandora Newspaper Batch, or Islandora Compound Batch. Run iipqa as follows:

php iipqa [options] directory

'directory' (required) is the path to the directory containing Islandora import packages you want to test. The trailing slash is optional. If you wish, you may specify the path to a Zip file instead of a directory. The Zip file must be structured as required by Islandora Batch, Book Batch, or Newspaper Batch.

Options:

-m/--content_model <argument>
     Required. An alias for a group of Islandora content models. Allowed values are single, single_rest_ingester, newspapers, books, and compound.

-l/--log <argument>
     Path to the log file. Default is ./iipqa.log.

-s/--strict
     If present, iipqa will exit with a code of 1 instead of 0 if it encounters any errors. Useful when running iipqa within shell scripts.

-v/--validate_mods
     If present, iipqa will validate all MODS XML files in all input packages.

-k/--skip_structure
     If present, iipqa will skip validating the presence of structure.xml files in compound packages.

-p/--post_iipqa <argument>
     Path to one or more scripts to run after iipqa performs its tests. See "Post-iipqa scripts" below.

--help
     Show the help page for this command.
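If you drive iipqa from another program, the options above map directly onto an argument list. The following Python sketch is illustrative only: `build_iipqa_command` and `run_iipqa` are hypothetical helpers, not part of iipqa, and the sketch assumes it runs from iipqa's own directory so that `php iipqa` resolves as in the usage line above. Because iipqa exits with 1 on errors only when -s/--strict is present, the exit code is a reliable pass/fail signal only with `strict=True`.

```python
import shlex
import subprocess

# Aliases accepted by -m/--content_model, as listed in the options above.
CONTENT_MODELS = {"single", "single_rest_ingester", "newspapers", "books", "compound"}


def build_iipqa_command(directory, content_model, log=None, strict=False,
                        validate_mods=False, skip_structure=False, post_iipqa=None):
    """Return an argv list for an iipqa invocation (hypothetical helper)."""
    if content_model not in CONTENT_MODELS:
        raise ValueError(f"unknown content model alias: {content_model}")
    cmd = ["php", "iipqa", "-m", content_model]
    if log:
        cmd += ["-l", log]          # otherwise iipqa logs to ./iipqa.log
    if strict:
        cmd.append("-s")            # exit 1 instead of 0 on errors
    if validate_mods:
        cmd.append("-v")            # optional MODS validation
    if skip_structure:
        cmd.append("-k")            # skip structure.xml check for compound input
    if post_iipqa:
        cmd += ["-p", post_iipqa]   # script(s) to run after the core tests
    cmd.append(directory)           # directory or Zip file to test
    return cmd


def run_iipqa(cmd):
    """Run iipqa; True means exit code 0 (only meaningful with -s/--strict)."""
    return subprocess.run(cmd).returncode == 0


if __name__ == "__main__":
    # Prints the command instead of running it:
    # php iipqa -m single -l ./test.txt -s /tmp/test
    print(shlex.join(build_iipqa_command("/tmp/test", "single", log="./test.txt", strict=True)))
```

Building the command as a list rather than a single string avoids shell quoting problems when directory or log paths contain spaces.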

When you run iipqa like this:

./iipqa -m single -l ./test.txt /tmp/test

you will see output like this if no QA tests fail:

Starting QA tests...
Running test 'Unique file extensions'   ########## Done.
Running test 'XML/OBJ pairs'        ########## Done.
Running test 'Directories present'  ########## Done.
All tests successful.

or like this if any fail:

Starting QA tests...
Running test 'Unique file extensions'   ########## Done.
Running test 'XML/OBJ pairs'        ########## Done.
Running test 'Directories present'  ########## Done.
Some tests failed. Details are available in test.txt

If any of iipqa's checks failed, details of the failure will be available in your log file.
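To make the 'XML/OBJ pairs' test name above more concrete, here is a rough Python sketch of that kind of check. It is not iipqa's implementation (iipqa is written in PHP). It assumes a flat Islandora Batch package for the single content model, in which each object is a pair of files sharing a base name, such as image01.tif and image01.xml. The function name `find_unpaired` is hypothetical.

```python
from pathlib import Path


def find_unpaired(directory):
    """Return (objs_without_xml, xml_without_obj) as sorted lists of file names."""
    # Ignore subdirectories and hidden files such as .DS_Store.
    files = [p for p in Path(directory).iterdir()
             if p.is_file() and not p.name.startswith(".")]
    xml_stems = {p.stem for p in files if p.suffix.lower() == ".xml"}
    obj_stems = {p.stem for p in files if p.suffix.lower() != ".xml"}
    # An OBJ file needs an XML file with the same base name, and vice versa.
    objs_without_xml = sorted(p.name for p in files
                              if p.suffix.lower() != ".xml" and p.stem not in xml_stems)
    xml_without_obj = sorted(p.name for p in files
                             if p.suffix.lower() == ".xml" and p.stem not in obj_stems)
    return objs_without_xml, xml_without_obj
```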

Post-iipqa scripts

If you include the -p option with the path to one or more executable scripts, iipqa will run the script(s) after it has completed all of its core tests. The scripts can be written in any language. You can use them to add your own tests, such as checking the resolution of TIFF files or verifying the encoding of OCR files, or to perform other tasks such as emailing yourself the iipqa log file.
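As an illustration, a post-iipqa script that verifies the encoding of OCR files might look like the following Python sketch. Several details are assumptions rather than documented iipqa behavior: that OCR files are named OCR.txt (as in Islandora Book Batch page directories), that the input directory is passed to the script explicitly as its argument (as in the check_title_length.php example below), and that a nonzero exit status is a useful way to signal failure. The script name check_ocr_encoding.py is hypothetical.

```python
#!/usr/bin/env python3
"""Hypothetical post-iipqa script: report OCR files that are not valid UTF-8."""
import sys
from pathlib import Path


def non_utf8_files(input_dir, filename="OCR.txt"):
    """Return paths of files named `filename` under input_dir that fail UTF-8 decoding."""
    bad = []
    for path in sorted(Path(input_dir).rglob(filename)):
        try:
            path.read_bytes().decode("utf-8")
        except UnicodeDecodeError:
            bad.append(str(path))
    return bad


if __name__ == "__main__":
    if len(sys.argv) == 2:
        found = non_utf8_files(sys.argv[1])
        for path in found:
            print(f"Not valid UTF-8: {path}")
        # Nonzero exit status signals a problem (assumed convention).
        sys.exit(1 if found else 0)
    else:
        print("usage: check_ocr_encoding.py INPUT_DIR", file=sys.stderr)
```

Made executable (for example with chmod +x) and saved in the scripts directory, it could be invoked with something like ./iipqa -m books -l test.log -p "scripts/check_ocr_encoding.py /tmp/input" /tmp/input.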

Here are some examples of how to run post-iipqa scripts:

To run a single script:

-p somescript.sh

To run a single script with arguments:

-p "somescript.sh foo bar"

To run multiple scripts, some with arguments:

-p [somescript.php, "someotherscript.php foo bar", cleanup.py]

Scripts with arguments must be wrapped in double quotes ("). Multiple scripts (and their arguments) must be separated by commas (,), and the whole list must be wrapped in square brackets ([]), as illustrated in these examples.
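The bracketed syntax amounts to a comma-separated list in which double quotes keep a script and its arguments together. The Python sketch below shows one way to split such a value into separate commands. It is a hypothetical illustration of the rules just described (`parse_post_iipqa` is not an iipqa function), not a description of how iipqa's own PHP code parses -p.

```python
import csv
import shlex


def parse_post_iipqa(value):
    """Split a -p value into one argv list per script (hypothetical helper)."""
    value = value.strip()
    if value.startswith("[") and value.endswith("]"):
        # Several scripts: commas separate entries; double quotes keep a
        # script and its arguments together as one entry.
        entries = next(csv.reader([value[1:-1]], skipinitialspace=True))
    else:
        # A single script, possibly followed by arguments.
        entries = [value]
    return [shlex.split(entry) for entry in entries if entry.strip()]


if __name__ == "__main__":
    # [['somescript.php'], ['someotherscript.php', 'foo', 'bar'], ['cleanup.py']]
    print(parse_post_iipqa('[somescript.php, "someotherscript.php foo bar", cleanup.py]'))
```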

The scripts directory contains three examples. One of them, check_title_length.php, performs a real-world test: it checks for titles in MODS XML files that exceed Fedora Repository's limit of 255 characters for object labels, and also checks for empty mods:title elements. Running the check_title_length.php script would look like this:

./iipqa -m single -l test.log -p "scripts/check_title_length.php /tmp/input" /tmp/input

The other two scripts are developer examples.
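For comparison, the following is a rough Python sketch of a check similar in spirit to check_title_length.php: it flags empty mods:title elements and titles longer than 255 characters. It is not a port of that script. Details such as which files are scanned (here, every .xml file under the input directory) and how title text is extracted are assumptions, and the script name check_titles.py is hypothetical. It uses the standard MODS version 3 namespace, http://www.loc.gov/mods/v3.

```python
#!/usr/bin/env python3
"""Hypothetical post-iipqa script: flag empty or over-long MODS titles."""
import sys
import xml.etree.ElementTree as ET
from pathlib import Path

MODS_NS = {"mods": "http://www.loc.gov/mods/v3"}
MAX_LABEL_LENGTH = 255  # Fedora's limit for object labels


def title_problems(input_dir):
    """Return one message per problem found in .xml files under input_dir."""
    problems = []
    for path in sorted(Path(input_dir).rglob("*.xml")):
        try:
            root = ET.parse(path).getroot()
        except ET.ParseError as err:
            problems.append(f"{path}: not well-formed XML ({err})")
            continue
        # Non-MODS XML files (e.g. structure.xml) simply contain no mods:title.
        for title in root.iterfind(".//mods:title", MODS_NS):
            text = "".join(title.itertext()).strip()
            if not text:
                problems.append(f"{path}: empty mods:title")
            elif len(text) > MAX_LABEL_LENGTH:
                problems.append(f"{path}: title is {len(text)} characters")
    return problems


if __name__ == "__main__":
    if len(sys.argv) == 2:
        found = title_problems(sys.argv[1])
        for message in found:
            print(message)
        sys.exit(1 if found else 0)
    else:
        print("usage: check_titles.py INPUT_DIR", file=sys.stderr)
```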

License

GPLv3

To do

Development/contributing

There are two ways to extend this tool so that it performs additional QA tests on Islandora ingest packages:

  1. write custom post-iipqa scripts, or
  2. modify the core content-model classes.

If you want to contribute to the development of iipqa, please consider the following: