Open mejackreed opened 8 years ago
From @peetucket on April 13, 2016 23:10
Example of a duplicate publication in production:
cap_profile_id='45761'
Two publications that are identical except for a trailing "." in one title: "Insights on the marine microbial nitrogen cycle from isotopic approaches to nitrification." and "Insights on the marine microbial nitrogen cycle from isotopic approaches to nitrification"
publication id = 239481 (sciencewireID=61620564, pmid=23091468)
publication id = 308362 (sciencewireID=65369473, pmid=blank)
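For illustration, a pair like this could be flagged by grouping publications on a normalized title. This is only a rough sketch in Python, not sul-pub code: `normalize_title` and `find_duplicate_titles` are hypothetical helpers, and the normalization rule (collapse whitespace, drop trailing periods, lowercase) is an assumption based on this one example.

```python
from collections import defaultdict


def normalize_title(title):
    """Collapse whitespace, drop trailing periods/spaces, and lowercase.

    Assumed rule, chosen only to match the example in this issue.
    """
    return " ".join(title.split()).rstrip(". ").lower()


def find_duplicate_titles(publications):
    """Return lists of publication ids that share a normalized title.

    `publications` is an iterable of (publication_id, title) pairs
    (a hypothetical shape, not the real Publication model).
    """
    groups = defaultdict(list)
    for pub_id, title in publications:
        groups[normalize_title(title)].append(pub_id)
    # Keep only titles that map to more than one publication.
    return [ids for ids in groups.values() if len(ids) > 1]


# The two records from the example above.
pubs = [
    (239481, "Insights on the marine microbial nitrogen cycle from isotopic approaches to nitrification."),
    (308362, "Insights on the marine microbial nitrogen cycle from isotopic approaches to nitrification"),
]
duplicates = find_duplicate_titles(pubs)
```

Matching on title alone would probably over-match (e.g. errata or reprints), so a real check would likely also compare authors, year, or identifiers before merging anything.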
Similar work on authorship duplicates was done in
This should be cleaned up by rebuilding pub_hashes after cleaning up the pub identifiers table (work in #285)
Yes, although the work in #285 and the similar identifier work in this sprint focuses only on removing empty identifiers, discarding invalid ones, and normalizing the rest. In other words, that work will only touch a subset of the PublicationIdentifiers (and some of that work has not updated the associated Publication.pub_hash data). This issue is about inspecting the entire set of Publications, so it is best done after the cleanup tasks.
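To make the scope of that identifier cleanup concrete, here is a minimal Python sketch of the three steps (drop empty, discard invalid, normalize). It is not the code from #285: the `{"type": ..., "id": ...}` record shape, the "a PMID must be all digits" validity rule, and the lowercasing of types are all assumptions for illustration. It also drops exact repeats of the same identifier, which goes slightly beyond the three steps described.

```python
def clean_identifiers(identifiers):
    """Drop empty or invalid identifiers and normalize the rest.

    `identifiers` is a list of {"type": ..., "id": ...} dicts
    (hypothetical shape, not the real PublicationIdentifier model).
    """
    cleaned = []
    seen = set()
    for ident in identifiers:
        # Normalize: trim whitespace, lowercase the type (assumed convention).
        id_type = (ident.get("type") or "").strip().lower()
        value = str(ident.get("id") or "").strip()
        if not id_type or not value:
            continue  # empty identifier
        if id_type == "pmid" and not (value.isascii() and value.isdigit()):
            continue  # invalid under the assumed "digits only" rule
        key = (id_type, value)
        if key in seen:
            continue  # exact repeat of an identifier already kept
        seen.add(key)
        cleaned.append({"type": id_type, "id": value})
    return cleaned


raw = [
    {"type": "PMID", "id": " 23091468 "},
    {"type": "pmid", "id": ""},
    {"type": "pmid", "id": "blank"},
    {"type": "SciencewireID", "id": "61620564"},
    {"type": "pmid", "id": "23091468"},
]
cleaned = clean_identifiers(raw)
```

Note that this only tidies the identifiers attached to one record; it does nothing to find two Publications that describe the same paper, which is what this issue is about.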
From @darrenleeweber on February 5, 2016 20:41
The sul-cap-dev platform has some publication data with duplicate publication identifiers in the pub_hash, e.g.
Copied from original issue: sul-dlss/sul-pub#46