internetarchive / openlibrary

One webpage for every book ever published!
https://openlibrary.org
GNU Affero General Public License v3.0

Import BWB ids for pre-isbn books #3320

Closed by cdrini 1 year ago

cdrini commented 4 years ago

I added a better_world_books field for ID numbers, e.g. https://openlibrary.org/books/OL27291054M/Dwellers_in_the_Mirage . We should import BWB IDs for pre-ISBN books so that the price shows up and these editions can be sponsorable.

Stakeholders

@hornc @mekarpeles

xayhewalo commented 4 years ago

@cdrini I'd like to tackle this.

I presume the first step would be filtering the most recent data dump for editions with no ISBN using grep.

After that, I'm not sure what the best way is to know whether an edition has a BWB ID.
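The filtering step described above could be sketched in Python rather than grep, so that the JSON is actually parsed instead of pattern-matched. This is only a minimal sketch: it assumes each dump line has five tab-separated columns with the edition's JSON record last, and that ISBNs live in the `isbn_10` and `isbn_13` fields. The function names and the sample rows are hypothetical, made up for illustration.

```python
import json


def has_isbn(edition: dict) -> bool:
    """Return True if the record lists at least one ISBN-10 or ISBN-13."""
    # Assumption: ISBNs are stored as lists under these two keys.
    return bool(edition.get("isbn_10") or edition.get("isbn_13"))


def editions_without_isbn(lines):
    """Yield parsed edition records that carry no ISBN.

    Assumption: each line is tab-separated with the JSON record in the
    fifth column; shorter lines are skipped.
    """
    for line in lines:
        parts = line.rstrip("\n").split("\t", 4)
        if len(parts) < 5:
            continue
        record = json.loads(parts[4])
        if not has_isbn(record):
            yield record


# Hypothetical sample rows, not real dump data.
sample = [
    "/type/edition\t/books/OL1M\t1\t2020-01-01T00:00:00\t"
    + json.dumps({"key": "/books/OL1M", "isbn_10": ["0340013818"]}),
    "/type/edition\t/books/OL2M\t1\t2020-01-01T00:00:00\t"
    + json.dumps({"key": "/books/OL2M", "title": "No ISBN here"}),
]

for edition in editions_without_isbn(sample):
    print(edition["key"])
```

On a real dump, the same generator could be fed an open file handle instead of the `sample` list. It does not answer the second question, whether an edition has a BWB ID; that still needs data from BWB.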

cdrini commented 4 years ago

Sounds good to me! We might be able to get a dump from BWB with that data; @hornc do you by any chance have that or know how we can get it?

LeadSongDog commented 4 years ago

Quick screen: a book published before 1966 can't have an SBN. For 1966-1970, the ISBN-10, if there is one, starts with 0. From 1970 onward, most books will have an ISBN.
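As a rough illustration, the screen above could be written as a small classifier on publication year. This is a sketch under assumptions: the function names and return labels are hypothetical, and since the comment leaves the 1970 boundary ambiguous, 1970 is treated here as the first "ISBN likely" year. The helper reflects the fact that a 9-digit SBN becomes an ISBN-10 by prefixing a 0.

```python
def isbn_expectation(publish_year: int) -> str:
    """Classify a publication year by the kind of identifier to expect."""
    if publish_year < 1966:
        return "none"          # predates the SBN, so no identifier expected
    if publish_year < 1970:    # assumption: 1970 counts as the ISBN era
        return "sbn"           # if present, the ISBN-10 form starts with 0
    return "isbn"              # most books will have an ISBN


def sbn_to_isbn10(sbn: str) -> str:
    """Convert a 9-digit SBN to ISBN-10 by prefixing a zero."""
    digits = sbn.replace("-", "")
    if len(digits) != 9:
        raise ValueError("an SBN has 9 characters once hyphens are removed")
    return "0" + digits


print(isbn_expectation(1932))
print(sbn_to_isbn10("340-01381-8"))
```

A screen like this only narrows the candidates; it says nothing about whether a given copy actually carries an identifier.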

mekarpeles commented 4 years ago

I don't want to be too contentious, but I might even suggest we close this one. There may be ~1M books they've seen that either have no ID or are pre-ISBN. The metadata will likely be sparse. Oftentimes there is just no ID evident on the book (e.g. no barcode); it's not that it's pre-ISBN. @LeadSongDog's comment is well informed. I don't think this direction has the ROI to merit its own import vector, versus e.g. the 5M+ MARC records from institutional partners that Jude has sitting on a VM somewhere, which would be much higher ROI and would include pre-ISBN books. Leaving it to @cdrini.

tfmorris commented 1 year ago

I support @mekarpeles's recommendation to close this.