Closed by peetucket 6 years ago
Another example:
```
E, [2018-04-19T10:37:22.796314 #14635] ERROR -- : #<ActiveRecord::RecordNotUnique: Mysql2::Error: Duplicate entry '450125' for key 'index_web_of_science_source_records_on_publication_id': UPDATE `web_of_science_source_records` SET `publication_id` = 450125, `updated_at` = '2018-04-19 17:37:22' WHERE `web_of_science_source_records`.`id` = 2262>
E, [2018-04-19T10:37:22.796535 #14635] ERROR -- : /opt/app/pub/sul-pub/shared/bundle/ruby/2.3.0/gems/mysql2-0.4.10/lib/mysql2/client.rb:120:in `_query'
```
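One lightweight mitigation (a sketch, not the project's actual code) would be to rescue the uniqueness violation at the linking call site and treat the duplicate as a no-op. The exception class is stubbed below so the snippet runs standalone; in the app it comes from ActiveRecord, and `FakeRecord`/`safe_link` are hypothetical names:

```ruby
# Stand-in for Rails' exception so this sketch runs without ActiveRecord.
module ActiveRecord
  class RecordNotUnique < StandardError; end
end

# Hypothetical record whose update raises when the publication is already taken,
# mimicking the unique index on publication_id.
FakeRecord = Struct.new(:id, :taken) do
  def link!(pub_id)
    raise ActiveRecord::RecordNotUnique, "Duplicate entry '#{pub_id}'" if taken
    :ok
  end
end

# Rescue the collision instead of letting the harvest log an error.
def safe_link(record, pub_id)
  record.link!(pub_id)
  :linked
rescue ActiveRecord::RecordNotUnique
  :skipped # another WoS source record already links this pub
end

puts safe_link(FakeRecord.new(2131, false), 450_125) # first link succeeds
puts safe_link(FakeRecord.new(2262, true),  450_125) # duplicate is skipped
```

This keeps the database's unique index as the source of truth and simply tolerates the race, at the cost of swallowing the error rather than preventing it.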
So in the latter example, the two WoS source records competing to represent the same pub are:

- 2131 (uid "MEDLINE:28847984"): has pub 450125
- 2262 (uid "WOS:000408820300034"): raised the error trying to link to the same pub

Same DOI, but different UIDs, so different fingerprints. The upstream provider is treating these as separate records.
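To illustrate the fingerprint point, here is a minimal sketch (helper names and the DOI value are made up for illustration; this is not sul-pub's actual fingerprinting code) of why keying on the UID splits these records while keying on the DOI would collapse them:

```ruby
require 'digest'

# Hypothetical fingerprint helpers; the real app's fingerprinting differs.
def uid_fingerprint(uid)
  Digest::SHA1.hexdigest(uid)
end

def doi_fingerprint(doi)
  Digest::SHA1.hexdigest(doi.downcase)
end

medline = { uid: 'MEDLINE:28847984',    doi: '10.1000/example' } # DOI is invented
wos     = { uid: 'WOS:000408820300034', doi: '10.1000/example' }

# Different UIDs -> different fingerprints -> two source records persisted.
puts uid_fingerprint(medline[:uid]) == uid_fingerprint(wos[:uid]) # false
# Same DOI -> same fingerprint -> the pair could be deduped before linking.
puts doi_fingerprint(medline[:doi]) == doi_fingerprint(wos[:doi]) # true
```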
The pub being linked isn't actually WoS provenance: it's a PubMed record from 30 Aug 2017. So we really only need at most one WoS record linked to it (which we have).
So I interpret this as a question of duplicate detection, or just of handling the collision error.
Ok, so the WoS API returns this record twice as you note, but with different UIDs (since it comes from two different databases on their end). Ultimately, as long as we don't duplicate the publication row itself, having two different WoS source records is not a big deal (one is just orphaned). So perhaps just a guard clause around the linking logic in the `link_publication` method of `WebOfScienceSourceRecord`?
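A sketch of what such a guard clause might look like, with plain Ruby standing in for the ActiveRecord model (class, method, and attribute names are assumptions, not the actual sul-pub code):

```ruby
# Minimal stand-in for the WebOfScienceSourceRecord model.
class SourceRecord
  attr_reader :id, :uid, :publication_id

  # Simulates the DB's unique index on publication_id.
  @@links = {} # publication_id => id of the source record that owns the link

  def initialize(id:, uid:)
    @id = id
    @uid = uid
  end

  def link_publication(pub_id)
    owner = @@links[pub_id]
    # Guard clause: if another source record already links this pub,
    # leave this record orphaned instead of raising RecordNotUnique.
    return false if owner && owner != id

    @@links[pub_id] = id
    @publication_id = pub_id
    true
  end
end

a = SourceRecord.new(id: 2131, uid: 'MEDLINE:28847984')
b = SourceRecord.new(id: 2262, uid: 'WOS:000408820300034')

puts a.link_publication(450_125) # true: first link wins
puts b.link_publication(450_125) # false: guarded, no exception raised
```

In the real model the guard would be a query (e.g. checking whether any other source record already has that `publication_id`) rather than an in-memory hash, and the unique index would remain as a backstop against races.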
Errors like this keep showing up in the logs during harvesting. We should understand why, to see whether any mitigation is necessary: