Open pgwillia opened 6 years ago
I raised this with the Sirsi team and we came up with one possible scenario. We have a set of records in this data set dealing with our HathiTrust commitment: basically, print books that will never be discarded. Therefore, if we can zero in on this set of records, the number should either stay the same every month or increase slightly as we add to this collection. It should never decrease.
The raw marc looks like this:
583: : committed to retain|c20170930|d20421231|fHathiTrust|uhttps://www.hathitrust.org/shared_print_program|5AEU|zHathiTrust Shared Print commitment 2017
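As a rough illustration of the idea above, here is a minimal Python sketch that splits a flat line like this one into subfields and checks the "never decreases" rule. The function names are hypothetical, and it assumes that the text before the first `|` is the action note (treated here as subfield `a`) and that subfields are not repeated; neither assumption is confirmed by the extract format.

```python
def parse_flat_marc_line(line):
    """Split a flat line such as '583: : text|cVALUE|dVALUE' into (tag, subfields)."""
    tag, _, rest = line.partition(":")
    # Skip the indicator area, which ends at the next colon.
    _, _, body = rest.partition(":")
    parts = body.strip().split("|")
    subfields = {}
    lead = parts[0].strip()
    if lead:
        # Assumption: unlabelled leading text is the action note (subfield a).
        subfields["a"] = lead
    for part in parts[1:]:
        if part:
            # First character is the subfield code; a repeated code overwrites.
            subfields[part[0]] = part[1:]
    return tag.strip(), subfields


def is_retention_commitment(line):
    """True if the line looks like a HathiTrust 'committed to retain' 583 field."""
    tag, sub = parse_flat_marc_line(line)
    return (
        tag == "583"
        and sub.get("a", "").lower() == "committed to retain"
        and sub.get("f") == "HathiTrust"
    )


def count_is_acceptable(previous_count, current_count):
    """The retained set may stay the same or grow, but should never shrink."""
    return current_count >= previous_count


sample = (
    "583: : committed to retain|c20170930|d20421231|fHathiTrust"
    "|uhttps://www.hathitrust.org/shared_print_program|5AEU"
    "|zHathiTrust Shared Print commitment 2017"
)
tag, subfields = parse_flat_marc_line(sample)
```

Counting the lines for which `is_retention_commitment` is true in each monthly extract, and comparing with last month's count via `count_is_acceptable`, would give the check described above without depending on the field being indexed.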
Interesting thought, but looking at the ingest mapping file (symphony_ingest.properties) I don't think that field is in the index.
Some automation exists, but can it be improved?
There's part of an Ansible playbook
There are automated tests that are run against search-test from cardiff which have this expectation
But Third, RE:
It would be trivial to compose an additional test that performs the same search, compares the top {10|100|1000|all} results against a set of expected titles... using Solr directly, or through the Discovery web interface... or compares the old index against the new index. Easy to stack more "interesting" searches into this.
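A minimal Python sketch of what such a comparison might look like. The function names, the collection URL in the usage note, and the `id` and `title` field names are all hypothetical; only the standard Solr `/select` parameters `q`, `rows`, `fl` and `wt` are assumed. Fetching the results is left to the caller, so the same comparison works for expected titles versus actual results, or old index versus new index.

```python
from urllib.parse import urlencode


def build_select_url(base_url, query, rows):
    """Build a Solr /select URL for a query; field names here are assumptions."""
    params = urlencode({"q": query, "rows": rows, "fl": "id,title", "wt": "json"})
    return f"{base_url}/select?{params}"


def compare_top_results(expected, actual, top_n):
    """Compare the first top_n entries of two ranked title lists."""
    exp = expected[:top_n]
    act = actual[:top_n]
    return {
        # Titles we expected in the top N that did not show up.
        "missing": [t for t in exp if t not in act],
        # Titles in the top N that were not expected.
        "unexpected": [t for t in act if t not in exp],
        # True only if both lists agree, including ranking order.
        "same_order": exp == act,
    }


report = compare_top_results(
    expected=["Title A", "Title B", "Title C"],
    actual=["Title A", "Title C", "Title D"],
    top_n=3,
)
```

For example, `build_select_url("http://localhost:8983/solr/example", "shakespeare", 10)` would produce the URL to query (host and collection name are placeholders), and a test could fail whenever `report["missing"]` is non-empty. Stacking more "interesting" searches then amounts to looping over a list of queries with their expected titles.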
NEW IDEAS WELCOME: how else can we validate 6M records?
From @nmacgreg email [New Solr Collection: August EBSCO extract]