Some variant uses of OAI-PMH, etc. pose problems for refactoring. The following will need to be addressed:
[x] Harvest (or inversely, blacklist) specific sets by a command line flag
[ ] Harvest part of a set. (e.g. two sets from AUC are partially approved)
[x] Deal with cases in which sets are not used (e.g. QNL does not use sets. They have ~32k records and we only want the ~15k from the BL. As far as I can tell, these are indicated only by the values of the physicalLocation element.)
[x] Pass the metadataPrefix in a command line flag
[x] Break XML doc into one record per file (this includes recognizing the record delimiter in various file structures)
[x] Write records to collection/data/*.xml
[x] Skip deleted records
[x] Move common methods (e.g. to_str) to a separate module/file and include where needed
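As an illustration of the record-splitting, deleted-record, and output-path items above, here is a minimal Python sketch. It assumes the input is a standard OAI-PMH `ListRecords` response in the `http://www.openarchives.org/OAI/2.0/` namespace, where deleted records carry `status="deleted"` on their `<header>`. Other file structures with different record delimiters are not covered. The function names and the `record_NNNNNN.xml` naming scheme are hypothetical, not taken from the existing code.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

# Namespace defined by the OAI-PMH 2.0 protocol.
OAI_NS = "http://www.openarchives.org/OAI/2.0/"


def split_records(xml_text):
    """Return the <record> elements of a response, skipping deleted ones."""
    root = ET.fromstring(xml_text)
    kept = []
    for record in root.iter(f"{{{OAI_NS}}}record"):
        header = record.find(f"{{{OAI_NS}}}header")
        # OAI-PMH marks deletions with status="deleted" on the header.
        if header is not None and header.get("status") == "deleted":
            continue
        kept.append(record)
    return kept


def write_records(records, out_dir):
    """Write each record to its own file under out_dir and return the paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for index, record in enumerate(records, start=1):
        # Hypothetical naming scheme; the real harvester may name files differently.
        path = out_dir / f"record_{index:06d}.xml"
        ET.ElementTree(record).write(str(path), encoding="utf-8", xml_declaration=True)
        paths.append(path)
    return paths
```

Calling `write_records(split_records(response_text), "collection/data")` would produce one file per surviving record in the directory named in the checklist.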
Some of these should be pretty straightforward; others seem like they will require customization.
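The command line items (set selection and metadataPrefix) are among the straightforward ones. A rough sketch using Python's `argparse` follows. The flag names `--set`, `--exclude-set`, and `--metadata-prefix`, the `oai_dc` default, and the helper `select_sets` are all assumptions made for illustration, not the harvester's actual interface.

```python
import argparse


def build_parser():
    """Build a parser with hypothetical flags for set and prefix selection."""
    parser = argparse.ArgumentParser(description="Harvest records over OAI-PMH (sketch).")
    parser.add_argument("base_url", help="OAI-PMH endpoint URL")
    parser.add_argument(
        "--metadata-prefix",
        default="oai_dc",  # assumed default; oai_dc is the prefix every OAI-PMH repository must support
        help="value to send as the metadataPrefix argument",
    )
    parser.add_argument(
        "--set", dest="include_sets", action="append", default=[],
        help="harvest only this set (repeatable)",
    )
    parser.add_argument(
        "--exclude-set", dest="exclude_sets", action="append", default=[],
        help="skip this set (repeatable)",
    )
    return parser


def select_sets(available, include_sets, exclude_sets):
    """Apply the include list (if any), then drop excluded sets."""
    chosen = [s for s in available if not include_sets or s in include_sets]
    return [s for s in chosen if s not in exclude_sets]
```

With this shape, passing no `--set` flag means "harvest everything except what is excluded", which also covers repositories that do not use sets at all.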
The partial set problem needs to be handled by harvesting the full set and then running a separate script to delete the records that we cannot keep. There are too many variables/unknowns to handle this at harvest time.
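A post-harvest cleanup script of this kind might look like the following sketch. It assumes one record per file under a data directory and that the keep/delete decision can be made from the text of `physicalLocation` elements, as in the QNL case. The element is matched by local name so that no particular namespace is assumed. The function names, the `keep_values` parameter, and the `dry_run` switch are hypothetical, and the real criteria for partially approved sets may need something different.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def local_name(tag):
    """Strip any '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def physical_locations(path):
    """Return the stripped text of every physicalLocation element in a file."""
    tree = ET.parse(path)
    return [
        (el.text or "").strip()
        for el in tree.iter()
        if isinstance(el.tag, str) and local_name(el.tag) == "physicalLocation"
    ]


def prune(data_dir, keep_values, dry_run=True):
    """Delete record files with no physicalLocation in keep_values.

    Returns the paths that were (or, in a dry run, would be) removed.
    """
    removed = []
    for path in sorted(Path(data_dir).glob("*.xml")):
        if not any(value in keep_values for value in physical_locations(path)):
            removed.append(path)
            if not dry_run:
                path.unlink()
    return removed
```

Defaulting to a dry run lets the list of files to be deleted be reviewed before anything is removed.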