pkiraly / qa-catalogue

QA catalogue – a metadata quality assessment tool for library catalogue records (MARC, PICA)
GNU General Public License v3.0

Unimarc completeness #407

Closed gegic closed 5 months ago

gegic commented 5 months ago

This pull request contains the implementation of the UNIMARC completeness analysis.

There are not many changes in this pull request. The only major one is the Unimarc Completeness Plugin, whose main purpose is to determine the tag hierarchy; the Completeness.java class later uses that data as identifiers in the analysis results.

The plugin is nowhere near as complex as the Marc21 Completeness Plugin, because UNIMARC has no distinction of the kind MARC21 has. UNIMARC simply groups its fields into blocks identified only by the first digit of the field tag, which makes the approach much simpler in this case.
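
As a rough illustration of the first-digit grouping described above, here is a minimal sketch. It is not the plugin's code: the class `UnimarcBlockSketch`, its methods, and the `2XX`-style block identifiers are hypothetical names chosen for the example.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not the plugin class from this pull request.
class UnimarcBlockSketch {

    // Derives a block identifier such as "2XX" from the first digit of a tag such as "200".
    static String blockOf(String tag) {
        if (tag == null || tag.isEmpty() || tag.charAt(0) < '0' || tag.charAt(0) > '9') {
            throw new IllegalArgumentException("Tag does not start with a digit: " + tag);
        }
        return tag.charAt(0) + "XX";
    }

    // Counts how many of the given tags fall into each block, keeping first-seen order.
    static Map<String, Integer> countByBlock(List<String> tags) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String tag : tags) {
            counts.merge(blockOf(tag), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Prints {0XX=1, 1XX=1, 2XX=2, 7XX=1}
        System.out.println(countByBlock(Arrays.asList("001", "100", "200", "210", "700")));
    }
}
```

Because the block depends only on the first character of the tag, no lookup table of individual fields is needed to place a field in the hierarchy.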

In addition, I've modified a few portions of the existing code and added a test for UNIMARC (and made the existing PICA and AlephSeq tests a bit stricter). I thought it was a little cleaner and a bit more OS-agnostic for the tests to compare the generated and expected files instead of comparing the lines (the lines are listed either way, but it feels cleaner if they are in a separate file). However, I can change that if needed. The expected results for PICA and AlephSeq were generated before any changes were made, not from the output produced after the changes.
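
One way such a file comparison can be made independent of the platform's line endings is sketched below. This is an assumption about the general technique, not the test code in this pull request; `FileComparisonSketch` and `sameContent` are hypothetical names. It relies on `Files.readAllLines`, which treats `\n`, `\r\n`, and `\r` as line terminators.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper for illustration; not the test code from this pull request.
class FileComparisonSketch {

    // Returns true when both files contain the same lines, ignoring the line-ending style.
    static boolean sameContent(Path expected, Path actual) throws IOException {
        List<String> expectedLines = Files.readAllLines(expected, StandardCharsets.UTF_8);
        List<String> actualLines = Files.readAllLines(actual, StandardCharsets.UTF_8);
        return expectedLines.equals(actualLines);
    }

    public static void main(String[] args) throws IOException {
        Path expected = Files.createTempFile("expected", ".csv");
        Path actual = Files.createTempFile("actual", ".csv");
        try {
            // Same content, written once with Unix and once with Windows line endings.
            Files.write(expected, "tag,count\n200,5\n".getBytes(StandardCharsets.UTF_8));
            Files.write(actual, "tag,count\r\n200,5\r\n".getBytes(StandardCharsets.UTF_8));
            System.out.println(sameContent(expected, actual)); // prints true
        } finally {
            Files.deleteIfExists(expected);
            Files.deleteIfExists(actual);
        }
    }
}
```

In a JUnit test, the two line lists could be passed to an equality assertion directly so that a failure reports the first differing line.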