internetarchive / openlibrary

One webpage for every book ever published!
https://openlibrary.org
GNU Affero General Public License v3.0

Bot suggestions #2387

Closed. BrittanyBunk closed this issue 4 years ago.

BrittanyBunk commented 5 years ago

From the Internet Archive:

From Book Finder (BF):

cdrini commented 5 years ago

@hornc @mekarpeles Should we create issues for these on the bot repo, or is here ok?

LeadSongDog commented 5 years ago

Wherever the issue is filed, the bot should also verify that there's a real, published edition, not just a number. That means checking for availability in stores and/or libraries.

BrittanyBunk commented 5 years ago

I'm not sure that's possible for the bot to do. I bet the bot can do 90% of the groundwork, and people can double-check the other 10% in stores and libraries.

seabelis commented 5 years ago

Agreed with @LeadSongDog. We don't want to import phantom editions; the same problem has been discussed with Amazon imports.

BrittanyBunk commented 5 years ago

I don't know what a phantom edition is, or how they come to exist.

LeadSongDog commented 5 years ago

@BrittanyBunk prospective publishers don't get individual ISBNs; they buy whole blocks of them (100, 1000, 10 000, etc.) from national registrars such as Bowker. Before they start to market a book, they assign it an ISBN from their block. At some point they'll submit bibliographic data for cataloguing under that number and set a nominal publication date. If for some reason the book never actually publishes, the ISBN belongs to a "phantom" edition. The number should never be reassigned to a different book, though that does sometimes happen at small publishers. We do not want these to show up in OL, as nobody can ever obtain and read them.

This is particularly a problem with a large number of Amazon imports from 2008, where bogus records described books that were clearly never published.
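Because phantom ISBNs come out of a publisher's legitimately purchased block, they are perfectly well-formed numbers. The ISBN-13 check digit (weights alternating 1 and 3, total must be a multiple of 10) only catches typos, so a bot can use it as a cheap first filter but never as evidence that an edition exists. A minimal Python sketch of that check, assuming hyphenated or plain 13-digit input:

```python
def isbn13_check_digit(first12: str) -> int:
    """Compute the ISBN-13 check digit from the first 12 digits (weights 1, 3, 1, 3, ...)."""
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(first12))
    return (10 - total % 10) % 10


def is_valid_isbn13(isbn: str) -> bool:
    """True if the string (hyphens/spaces allowed) is a checksum-valid ISBN-13."""
    digits = isbn.replace("-", "").replace(" ", "")
    if len(digits) != 13 or not digits.isdigit():
        return False
    return isbn13_check_digit(digits[:12]) == int(digits[12])


# A valid checksum is necessary but NOT sufficient: a phantom edition's ISBN validates too.
print(is_valid_isbn13("978-1-558-61138-2"))  # True
print(is_valid_isbn13("978-1-558-61138-3"))  # False (wrong check digit)
```

Passing this test only means the number is not a typo; whether a book was ever published under it still has to be confirmed against real catalogues.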

There are a great many useful search engines linked from the "find this book" links at https://en.wikipedia.org/wiki/Special:BookSources/978-1-558-61138-2 (substitute any ISBN).

If none of these can match the ISBN, the book probably doesn't exist. In practice, a KVK (Karlsruhe Virtual Catalog) search alone will usually find a catalogue that has it.
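The rule above ("no catalogue anywhere has it, so it probably doesn't exist") can be sketched as a triage step that leaves the final call to a person. In this minimal Python sketch, `triage` and the catalogue names are hypothetical; the actual lookups (KVK, library catalogues, stores) are not implemented here and are assumed to be done elsewhere, with their found/not-found results passed in. Only the `Special:BookSources` URL pattern comes from the thread.

```python
from typing import Mapping
from urllib.parse import quote

# URL pattern from the thread: append any ISBN to get Wikipedia's "find this book" links.
BOOK_SOURCES = "https://en.wikipedia.org/wiki/Special:BookSources/"


def book_sources_url(isbn: str) -> str:
    """Build the Special:BookSources link for an ISBN (hyphens are kept as-is)."""
    return BOOK_SOURCES + quote(isbn.strip())


def triage(isbn: str, catalogue_hits: Mapping[str, bool]) -> str:
    """Hypothetical triage rule: any catalogue hit confirms the edition;
    no hits at all flags a probable phantom for human review."""
    if any(catalogue_hits.values()):
        return "confirmed"
    return "probable phantom: needs human review at " + book_sources_url(isbn)


# Illustrative results only; the keys are made-up source labels.
print(triage("978-1-558-61138-2", {"KVK": True, "library": False}))  # confirmed
print(triage("978-1-558-61138-2", {"KVK": False}))
```

Keeping "not found" as a review queue rather than an automatic rejection matches the split suggested earlier in the thread: the bot does the groundwork and people double-check what it cannot confirm.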

BrittanyBunk commented 5 years ago

I get it now. I think it'll be OK to use BookFinder to find the missing complementary ISBN for books that already exist on Open Library, and then have the bot double-check afterwards on the Wikipedia page you brought up.
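If "complementary ISBN" means the ISBN-10 / ISBN-13 counterpart of an existing record (an assumption about what was intended), that number can be computed directly rather than looked up: an ISBN-13 is "978" plus the first nine ISBN-10 digits plus a new check digit, and only 978-prefixed ISBN-13s have an ISBN-10 form. A minimal Python sketch; note that converting a number says nothing about whether the edition exists, so the existence check still applies.

```python
from typing import Optional


def isbn10_to_isbn13(isbn10: str) -> str:
    """Convert an ISBN-10 to ISBN-13: prefix '978', drop the old check digit, recompute."""
    core = isbn10.replace("-", "").replace(" ", "").upper()
    if len(core) != 10:
        raise ValueError("expected 10 characters")
    first12 = "978" + core[:9]
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(first12))
    return first12 + str((10 - total % 10) % 10)


def isbn13_to_isbn10(isbn13: str) -> Optional[str]:
    """Convert a 978-prefixed ISBN-13 to ISBN-10; return None when no ISBN-10 exists."""
    digits = isbn13.replace("-", "").replace(" ", "")
    if len(digits) != 13 or not digits.isdigit() or not digits.startswith("978"):
        return None  # e.g. 979-prefixed ISBN-13s have no ISBN-10 counterpart
    core = digits[3:12]
    # ISBN-10 check digit: weights 10 down to 2, total must be a multiple of 11.
    total = sum(int(d) * w for d, w in zip(core, range(10, 1, -1)))
    check = (11 - total % 11) % 11
    return core + ("X" if check == 10 else str(check))


print(isbn13_to_isbn10("978-1-558-61138-2"))  # 155861138X
print(isbn10_to_isbn13("155861138X"))         # 9781558611382
```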

mekarpeles commented 4 years ago

I don't have the time to create new issues for these specifically. If the proposal is for a description bot and a table of contents bot, let's document them clearly as separate issues, please :)

BrittanyBunk commented 4 years ago

@tfmorris I'll delete it then. That's what I figured.