maria-antoniak / goodreads-scraper

A Python scraper for Goodreads books and reviews.
GNU General Public License v3.0
274 stars 83 forks source link

Get ids #25

Closed dddjjjbbb closed 3 years ago

dddjjjbbb commented 3 years ago

Description

Please note a summary of the changes.

I introduced a get_book_ids module for non-goodreads users to pull required ids from goodreads.

Please provide your motivation and context.

I don't use goodreads. When I come across an interesting book I just note it down in a plain txt file {title} - {author}. Perhaps I'm one of the few still doing this, who knows? :) That said, much of the data available to goodreads is interesting to me.

In short, I had a list, but no goodreads account, so thus, no ids I could use as input to gather more information on those books. For those in the same boat 🚣, this is for you :)

List any dependencies that are required for this change.

Production

Dev

Does this PR fix any existing issue?

Yup, this will hopefully make it easier for non-goodreads users to access the ids required to pull additional information from their to be read lists.

Type of change

How Has This Been Tested?

Please see the unit tests attached to the PR. I strove for 100% test coverage where possible. Tests are organised by module.

In addition, integration tests were run against a list of 1000 books. This uncovered numerous issues relating to timeouts and errors on parse. Each of those is now handled in the PR.

In these tests I tried to maintain a naming schema. I hope it helps out whomever ends up working on this next.

Schema

test_{it or ut}_{method name}_should_return_{object or return state}_where_{condition}

Example:

test_it_build_query_model_should_return_a_query_model_where_book_title_contains_a_subtitle

Test Configuration:

Run tests with pytest from root

Checklist:

maria-antoniak commented 3 years ago

Could you rename the module and related to code to make it clear which IDs are being gathered? "IDs" could refer to books, users, etc.

dddjjjbbb commented 3 years ago

Thanks for this @maria-antoniak, good suggestion, it's definitely ambiguous. Hopefully get_book_ids will avoid any confusion :)

Please see my updated PR with those changes here: https://github.com/maria-antoniak/goodreads-scraper/pull/25/commits/eec656a4f9a9035767cbddb40280abfe7bad6650

dddjjjbbb commented 3 years ago

HI @maria-antoniak, just checking in, are you planning on merging this PR? Thanks :)

maria-antoniak commented 3 years ago

Just for documentation: we've talked offline and this PR is going to become a separate project as the changes were too large for our project.