Closed by cdrini 1 year ago
I started on this, but @hornc pointed out that updating the `source_records` field, as the script was doing, is a mistake:

> Setting `source_records` implies that a record was imported or re-imported from the archive.org MARC record or metadata. Here that data is merely being linked after being imported from elsewhere, as these items were added to Open Library before they were scanned by the Internet Archive.
One suggestion is to simply re-import the item to trigger the usual import process, which will do all the right things. For many items, this works well. For example, consider the diff for OL47730945M, where a cover and the `number_of_pages` are filled in.
However, part of the reason some of these items are not linked is because the current importer rejects them.
Consider 007exoticlocatio0000arno/OL8582004M, which does appear to be a book:
```
❯ http POST https://openlibrary.org/api/import/ia identifier==007exoticlocatio0000arno require_marc==false bulk_marc==false Cookie:$OL_PROD_COOKIE
HTTP/1.1 400 Bad Request
[...]
{
    "error": "Item rejected",
    "error_code": "item-not-book",
    "success": false
}
```
One strategy would be to go through the list, attempt to re-import every item, track anything that replies with a 400 status code, examine those cases, and figure out how best to address them as a separate matter.
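That strategy could be sketched roughly as below. This is just an illustration, not tested against the production API: the `post` callable is injected so the HTTP call (and cookie handling) can be swapped in or stubbed out, and the response shape is assumed to match the httpie example above.

```python
def reimport_items(identifiers, post):
    """Attempt a re-import for each OCAID; return (ok, rejected).

    `post(identifier)` should return an object with a `.status_code`
    attribute and a `.json()` method, like a `requests.Response`.
    """
    ok, rejected = [], []
    for ocaid in identifiers:
        resp = post(ocaid)
        if resp.status_code == 400:
            # e.g. {"error_code": "item-not-book", ...} per the example above;
            # keep the code so rejections can be triaged in bulk later.
            rejected.append((ocaid, resp.json().get("error_code")))
        else:
            ok.append(ocaid)
    return ok, rejected
```

In practice `post` would wrap something like `requests.post("https://openlibrary.org/api/import/ia", params={"identifier": ocaid, ...})` with the session cookie attached; the exact parameters would need to match whatever the import endpoint actually accepts.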
Thoughts? @cdrini
Scott notes that the sync-up is complete; we're now rerunning reconcile with Charles' suggestion to hit the import endpoint. This is working and is adding the extra data Scott noted above. It's slower, so it will take ~7 days, but since the OCAID sync is complete we can close this issue.
130,775 ocaids were synced! 🥳
You can monitor the new reconcile run here: https://openlibrary.org/people/scott365bot
Same as #7217
Q: Could we add a task to our Monthly Data Dumps cron to automate this?