k-int / gokb-phase1

Original GOKb repo - Moving to https://github.com/openlibraryenvironment/gokb
http://www.gokb.org

Review tasks for identifier mismatch #541

Closed: jhsolomon closed this issue 8 years ago

jhsolomon commented 8 years ago

For Acta Scientiae Veterinariae, a review task was generated saying that the ingest file eISSN matched an existing ISSN, which is not true. Also, there is no such review task in Live.

https://gokb-test.openlibraryfoundation.org/gokb/resource/show/org.gokb.cred.JournalInstance%3A19636511#identifiers

ianibo commented 8 years ago

Here is the line from the issn-l file:

1678-0345 1678-0345 1679-9216

Because eISSN values are really ISSN values (they don't overlap namespaces), the issn-l file just refers to everything as an "ISSN". So when we load this line, we create one issn-l identifier and two issn identifiers (one of which refers to an electronic item).
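A minimal sketch of how such a line could be turned into identifiers, assuming a whitespace-separated layout with the ISSN-L in the first column. The function name `parse_issnl_line` and the namespace strings `"issnl"` and `"issn"` are illustrative, not GOKb's actual loader:

```python
def parse_issnl_line(line):
    """Split one issn-l table line into (namespace, value) pairs.

    Assumption: the first column is the ISSN-L and every remaining column
    is an ISSN. The file gives no hint whether an ISSN is print or electronic.
    """
    fields = line.split()
    if not fields:
        return []
    identifiers = [("issnl", fields[0])]
    # dict.fromkeys drops repeated values while keeping column order
    for value in dict.fromkeys(fields[1:]):
        identifiers.append(("issn", value))
    return identifiers


print(parse_issnl_line("1678-0345 1678-0345 1679-9216"))
# [('issnl', '1678-0345'), ('issn', '1678-0345'), ('issn', '1679-9216')]
```

Both ISSNs end up in the same `issn` namespace, which is what sets up the cross-match when a package later supplies one of them as an eISSN.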

When we load this record, the system correctly identifies that an eISSN (the one from the package) matches an ISSN (the one from the issn-l table). Loading the package also creates the eISSN, along with the review request.
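The cross-check could look roughly like the sketch below. The `Title` class, the `add_package_eissn` function and the namespace strings are hypothetical stand-ins for GOKb's real domain classes, and the package eISSN is assumed here to be 1679-9216:

```python
from dataclasses import dataclass, field


@dataclass
class Title:
    name: str
    identifiers: set = field(default_factory=set)        # (namespace, value) pairs
    review_requests: list = field(default_factory=list)  # free-text messages


def add_package_eissn(title, eissn):
    """Attach an eISSN from a package row, flagging a cross-namespace match."""
    if ("issn", eissn) in title.identifiers:
        # The same value is already known as a plain ISSN: raise a review request
        title.review_requests.append(
            f"Ingest file eissn {eissn} matched an existing issn"
        )
    # The eISSN is created regardless of whether a request was raised
    title.identifiers.add(("eissn", eissn))


# State after the issn-l load: both values sit in the "issn" namespace
title = Title(
    "Acta Scientiae Veterinariae",
    {("issnl", "1678-0345"), ("issn", "1678-0345"), ("issn", "1679-9216")},
)
add_package_eissn(title, "1679-9216")
print(title.review_requests)
```

After this call the title carries the value twice, once as `issn` and once as `eissn`, plus one open review request.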

Later, we run the housekeeping job, which should remove errant ISSN/eISSN pairings, leaving us with the situation we see on test.

The problems are: (a) this review request can still spot genuine problem scenarios, but it will also fire in every case where we import an eISSN; (b) the issn-l file does not distinguish ISSN from eISSN.

The options are: (1) drop this check altogether; (2) add a switch that disables it for the data load from live; (3) remove all matching review requests of this kind after the load from live.
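Options (2) and (3) might be sketched as follows. The `check_enabled` flag, the `kind` label `"eissn_matched_issn"` and both function names are assumptions made for illustration and do not come from the GOKb codebase:

```python
def eissn_cross_match(eissn, existing_identifiers, check_enabled=True):
    """Return True when a review request should be raised for this eISSN."""
    if not check_enabled:
        # Option 2: the check is switched off, e.g. during the load from live
        return False
    return ("issn", eissn) in existing_identifiers


def purge_cross_match_requests(review_requests, kind="eissn_matched_issn"):
    """Option 3: drop every request of the given kind once the load is done."""
    return [r for r in review_requests if r["kind"] != kind]


existing = {("issn", "1678-0345"), ("issn", "1679-9216")}
print(eissn_cross_match("1679-9216", existing))                       # True
print(eissn_cross_match("1679-9216", existing, check_enabled=False))  # False
```

Option (2) avoids creating the requests in the first place, while option (3) cleans up afterwards and would also delete any genuine mismatches of the same kind.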

This review request is never seen on live because we never loaded issn-l on live.

Let me know what you want to do.

jhsolomon commented 8 years ago

Let's discuss this one in the call.

(Replying by email to Ian Ibbotson's comment of Fri, Sep 16, 2016 at 3:04 AM.)

Jennifer Solomon, GOKb Editor, Acquisitions and Discovery, North Carolina State University Libraries, 919-515-2743

ianibo commented 8 years ago

Housekeeping should auto-resolve any review requests when it cleans up this situation.
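As a rough sketch of that behaviour, assuming the cleanup keeps the `eissn` entry and drops the duplicate `issn` entry (the thread does not say which side is removed). The `housekeep` function and the dictionary layout of a review request are hypothetical:

```python
def housekeep(identifiers, review_requests):
    """Drop 'issn' entries duplicated by an 'eissn' and close related requests.

    identifiers: set of (namespace, value) pairs for one title.
    review_requests: list of dicts with 'value' and 'status' keys.
    Returns the cleaned identifier set; requests are updated in place.
    """
    eissn_values = {value for ns, value in identifiers if ns == "eissn"}
    cleaned = {
        (ns, value)
        for ns, value in identifiers
        if not (ns == "issn" and value in eissn_values)
    }
    for request in review_requests:
        # Auto-resolve requests whose pairing has just been cleaned up
        if request["status"] == "open" and request["value"] in eissn_values:
            request["status"] = "closed"
    return cleaned


ids = {("issn", "1678-0345"), ("issn", "1679-9216"), ("eissn", "1679-9216")}
reqs = [{"value": "1679-9216", "status": "open"}]
cleaned = housekeep(ids, reqs)
print(sorted(cleaned), reqs)
```

Closing the request in the same pass as the cleanup means no stale review task is left pointing at a pairing that no longer exists.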

jhsolomon commented 8 years ago

Fix confirmed.