internetarchive / openlibrary

One webpage for every book ever published!
https://openlibrary.org
GNU Affero General Public License v3.0

ISBN search imports DVDs from Amazon.com #9879

Closed stopregionblocking closed 1 month ago

stopregionblocking commented 2 months ago

Problem

The feature that imports Amazon.com records into OpenLibrary when a user searches by ISBN will, unfortunately, also import DVDs that have ISBNs, even when Amazon.com clearly indicates they are not books.

This was confirmed using the example of https://www.amazon.com/gp/product/1621064298

Reproducing the bug

  1. Search OpenLibrary for the ISBN of a DVD available on Amazon.com, but not included on OpenLibrary.
  2. Search for the same ISBN again (to attempt an import of the record from Amazon.com).

Breakdown

The solution will likely involve some minor modifications to https://github.com/internetarchive/openlibrary/blob/master/openlibrary/core/vendors.py so that DVDs don't return values to get_products().

For the above DVD/item, get_products() returns:

    [{'url': 'https://www.amazon.com/dp/1621064298/?tag=internetarchi-20',
      'source_records': ['amazon:1621064298'],
      'isbn_10': ['1621064298'],
      'isbn_13': ['9781621064299'],
      'price': '$15.19',
      'price_amt': 1519,
      'title': 'Homeland Insecurity: Films by Bill Brown',
      'cover': 'https://m.media-amazon.com/images/I/41FuCUj3kUL._SL500_.jpg',
      'authors': [{'name': 'Brown, Bill'}],
      'publishers': ['Microcosm Publishing'],
      'number_of_pages': None,
      'edition_num': None,
      'publish_date': 'Aug 01, 2007',
      'product_group': 'DVD',
      'physical_format': 'dvd'}]

Of interest are product_group and physical_format. To complete this issue, one would likely want to look at https://webservices.amazon.com/paapi5/documentation/ and determine whether we should use product_group, physical_format, both, either, or something else to decide that an item is a DVD. Or maybe it's better to focus on what is allowed (e.g. books).

In any event, we'll likely want to modify serialize() or get_product() to filter out DVDs (or only allow whatever constitutes books, if the cases are clear).
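As a rough illustration of the filtering idea, here is a minimal sketch that drops DVD records from a list shaped like the get_products() output above. The helper names (is_dvd, filter_products) and the denylist sets are hypothetical and not part of vendors.py; only the product_group and physical_format keys and the DVD values come from this issue. The "Book"/"paperback" values in the second sample record are assumed for illustration, and an allowlist of book-like values may turn out to be the better approach once the Amazon documentation has been checked.

```python
# Hypothetical sketch: none of these names exist in openlibrary/core/vendors.py.
# Denylist values are taken from the DVD record quoted in this issue.
NON_BOOK_PRODUCT_GROUPS = {"dvd"}
NON_BOOK_PHYSICAL_FORMATS = {"dvd"}


def is_dvd(product: dict) -> bool:
    """Return True if either field marks the serialized product as a DVD."""
    # Either field may be missing or None, so fall back to an empty string.
    group = (product.get("product_group") or "").lower()
    fmt = (product.get("physical_format") or "").lower()
    return group in NON_BOOK_PRODUCT_GROUPS or fmt in NON_BOOK_PHYSICAL_FORMATS


def filter_products(products: list) -> list:
    """Keep only the products that are not flagged as DVDs."""
    return [p for p in products if not is_dvd(p)]


# Trimmed copy of the record from this issue.
dvd = {
    "title": "Homeland Insecurity: Films by Bill Brown",
    "isbn_10": ["1621064298"],
    "product_group": "DVD",
    "physical_format": "dvd",
}

# Assumed shape of a book record; these field values are illustrative only.
book = {
    "title": "Example Book",
    "product_group": "Book",
    "physical_format": "paperback",
}

kept = filter_products([dvd, book])
print([p["title"] for p in kept])
```

Checking both fields case-insensitively is a conservative choice here, since the issue shows them with different capitalization ('DVD' vs 'dvd').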


krthkmgndm commented 1 month ago

Hi, I'm interested in working on this issue. Can someone give me some pointers on how to get started?

scottbarnes commented 1 month ago

Hi, @krthkmgndm, it sounds as if this issue may be a bit too daunting, as the current steps outlined are the best I can do, unfortunately. It may make more sense to try a Good First Issue: https://github.com/internetarchive/openlibrary/issues?q=is%3Aissue+is%3Aopen+label%3A%22Good+First+Issue%22+-linked%3Apr.

It's also worth checking out https://github.com/internetarchive/openlibrary/tree/master/docker#welcome-to-the-docker-installation-guide-for-open-library-developers.

DebbieSan commented 1 month ago

Hi @scottbarnes. I would like to work on this. I have an idea on how to tackle this issue :) ty!