This PR removes duplicate records for numerous NIST publications from allrecords.xml. The process used was:
1. Generate a list of duplicated DOIs.
2. Open the `allrecords.xml` file.
3. For each duplicated DOI:
    1. Find the first instance of the DOI.
    2. Remove the record (everything in the `<query>...</query>` block).
    3. Search for the DOI to ensure it is present in a "later" `<query>` block.
    4. Save the file.
    5. Commit the changes for the specific DOI ("atomic" commits).
4. Loop until all duplicate records have been removed.
This workflow should have preserved the latest version of each record matching a duplicated DOI. If there are errors, the atomic commits should make it easy to revert a specific change. The downside of this is the large number of commits associated with this PR. For that reason, if this PR is accepted, please use a squash merge to combine the atomic changes into a single commit representing all the changes.
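For reviewers who want to cross-check the result, the manual steps can be approximated in one pass. This is a hypothetical sketch, not the script used for this PR. It assumes that each record is a `<query>...</query>` block and that the DOI is the text of a `<doi>` child element. Adjust `DOI_RE` if `allrecords.xml` stores DOIs differently. Like the manual process, it keeps only the last occurrence of each duplicated DOI. It does not reproduce the per-DOI atomic commits.

```python
import re
from collections import Counter

# Assumption: records are <query>...</query> blocks (\b keeps <query_batch>
# from matching) and the DOI is the text of a <doi> child element.
QUERY_RE = re.compile(r"<query\b.*?</query>\s*", re.DOTALL)
DOI_RE = re.compile(r"<doi>\s*(.*?)\s*</doi>", re.DOTALL)


def query_blocks(text):
    """Return (match, doi) for every <query> block; doi is None if absent."""
    blocks = []
    for m in QUERY_RE.finditer(text):
        d = DOI_RE.search(m.group(0))
        blocks.append((m, d.group(1) if d else None))
    return blocks


def duplicated_dois(text):
    """Step 1: list the DOIs that appear in more than one <query> block."""
    counts = Counter(doi for _, doi in query_blocks(text) if doi)
    return sorted(doi for doi, n in counts.items() if n > 1)


def remove_earlier_duplicates(text):
    """Drop every <query> block whose DOI also appears in a later block.

    A block is removed only when a later block with the same DOI exists,
    which is the same check as the manual "search for a later block" step.
    """
    blocks = query_blocks(text)
    last_index = {doi: i for i, (_, doi) in enumerate(blocks) if doi}
    pieces, pos = [], 0
    for i, (m, doi) in enumerate(blocks):
        if doi and last_index[doi] != i:
            pieces.append(text[pos:m.start()])  # keep text before this block
            pos = m.end()                       # skip the block itself
    pieces.append(text[pos:])
    return "".join(pieces)


def dedupe_file(path):
    """Hypothetical helper: rewrite the file at `path` in place."""
    with open(path, encoding="utf-8") as f:
        text = f.read()
    with open(path, "w", encoding="utf-8") as f:
        f.write(remove_earlier_duplicates(text))


# Usage with made-up DOIs:
sample = """<body>
  <query key="a"><doi>10.1234/dup</doi></query>
  <query key="b"><doi>10.1234/keep</doi></query>
  <query key="c"><doi>10.1234/dup</doi></query>
</body>"""
cleaned = remove_earlier_duplicates(sample)
```

Running `duplicated_dois` on the deduplicated text should return an empty list. That makes a quick sanity check against the squashed result of this PR.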