internetarchive / openlibrary

One webpage for every book ever published!
https://openlibrary.org
GNU Affero General Public License v3.0

Run Standard Ebooks import #9372

Open cdrini opened 4 months ago

cdrini commented 4 months ago

Standard Ebooks recently celebrated their 1,000th ebook! We only have 491, so it's time to re-run the import!

It seems they now require authentication to access their OPDS feed, although with exceptions for open source organizations; we should contact them.

Once we get access, it's pretty easy: we just run scripts/import_standard_ebooks.py in the ol-home0 container as a non-root user.

Stakeholders

@mekarpeles @jimchamp

cdrini commented 4 months ago

I was able to run the import, but importbot was misbehaving and attaching the IDs to the wrong books! I caught it after a few edits and turned off importbot. The rest of the imports are blocked on #9387

cdrini commented 3 months ago

Ran the import, we're now up to 839! No clue why we haven't hit 1000 though :/ https://openlibrary.org/search?q=id_standard_ebooks%3A*&mode=everything
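For anyone wanting to track this count without opening the search page, here is a rough sketch. It assumes the public `search.json` endpoint accepts the same `q` and `mode` parameters as the search URL above and reports the total in a `numFound` field; the helper names (`build_count_url`, `parse_count`, `fetch_count`) are made up for illustration and are not part of the import script.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: JSON counterpart of the /search page linked above.
SEARCH_URL = "https://openlibrary.org/search.json"


def build_count_url(query: str = "id_standard_ebooks:*") -> str:
    """Build a search URL that requests as little data as possible."""
    params = {
        "q": query,            # same query as in the search link above
        "mode": "everything",  # same mode as in the search link above
        "limit": 1,            # only the total is needed, not the documents
        "fields": "key",       # keep the single returned document small
    }
    return f"{SEARCH_URL}?{urlencode(params)}"


def parse_count(payload: dict) -> int:
    """Extract the total number of matches from a search response."""
    return int(payload["numFound"])


def fetch_count(query: str = "id_standard_ebooks:*") -> int:
    """Hit the live endpoint and return the match count (needs network)."""
    with urlopen(build_count_url(query), timeout=30) as resp:
        return parse_count(json.load(resp))
```

Calling `fetch_count()` should then return the number of records matching the Standard Ebooks identifier query, which can be compared against the size of the Standard Ebooks catalogue.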

cdrini commented 3 months ago

Found out: when we ran this initially in 2022, because of #9387, they were imported to the wrong editions, and the identifiers were entirely ignored -_- sigh. Ran a bulk edit to roll back those specific edits: https://openlibrary.org/recentchanges/2024/06/13/bulk_update/131998751

Now to try to figure out how to re-import these...