k-int / gokb-phase1

Original GOKb repo - Moving to https://github.com/openlibraryenvironment/gokb
http://www.gokb.org

Improve identification of errors caused by matches made on bad IDs #467

Open kristenwilson opened 8 years ago

kristenwilson commented 8 years ago

The goal is to reject incoming data when it incorrectly matches an ID, whether that ID is in the same project or already in GOKb. A spec is probably needed. Some general ideas include:

  1. Reject a line if it is the second instance of an ISSN being ingested from the same project.
  2. Do a string check on the incoming title and reject the line if it does not match the existing title in GOKb for that ID.
  3. Check that the coverage dates for the incoming titles are consistent with the publication dates already in GOKb. Reject incoming titles that have date conflicts.
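
To make the three ideas above easier to discuss, here is a rough Python sketch of how they might combine into a single validation pass. It is not GOKb code: `IncomingRow`, `ExistingTitle`, `validate_rows`, the `known` lookup keyed by ISSN, the use of whole years for dates, and the 0.8 similarity threshold are all hypothetical choices made for illustration, and a real spec would need to settle each of them.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Dict, List, Optional, Tuple


@dataclass
class ExistingTitle:
    """Hypothetical stand-in for a title record already held in GOKb."""
    name: str
    published_from: Optional[int]  # year, None if unknown
    published_to: Optional[int]    # year, None if still published or unknown


@dataclass
class IncomingRow:
    """Hypothetical stand-in for one line of an ingest file."""
    issn: str
    title: str
    coverage_start: Optional[int]
    coverage_end: Optional[int]


def _normalise(text: str) -> str:
    # Lower-case and collapse whitespace so trivial differences do not count.
    return " ".join(text.lower().split())


def validate_rows(
    rows: List[IncomingRow],
    known: Dict[str, ExistingTitle],
    min_similarity: float = 0.8,  # assumed threshold, not from the issue
) -> List[Tuple[IncomingRow, List[str]]]:
    """Return each row paired with its rejection reasons (empty list = accept)."""
    seen_issns = set()
    results = []
    for row in rows:
        reasons = []

        # Idea 1: second instance of an ISSN within the same project.
        if row.issn in seen_issns:
            reasons.append("duplicate ISSN in project")
        seen_issns.add(row.issn)

        existing = known.get(row.issn)
        if existing is not None:
            # Idea 2: string check of incoming title against the existing one.
            ratio = SequenceMatcher(
                None, _normalise(row.title), _normalise(existing.name)
            ).ratio()
            if ratio < min_similarity:
                reasons.append("title mismatch")

            # Idea 3: coverage must fall inside the known publication range.
            starts_too_early = (
                row.coverage_start is not None
                and existing.published_from is not None
                and row.coverage_start < existing.published_from
            )
            ends_too_late = (
                row.coverage_end is not None
                and existing.published_to is not None
                and row.coverage_end > existing.published_to
            )
            if starts_too_early or ends_too_late:
                reasons.append("coverage dates conflict")

        results.append((row, reasons))
    return results
```

Returning the reasons per row, rather than dropping rows silently, is one way to support the "identification of errors" part of this issue: whoever runs the ingest can see why a line was rejected. Rows whose ISSN is not yet known are only subject to the duplicate check in this sketch.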