leoncvlt / blinkist-scraper

📚 Python tool to download book summaries and audio from Blinkist.com, and generate some pretty output

Fixed #40 and improved uncategorized books handling #41

FirstClassCitizenFCC closed 3 years ago

FirstClassCitizenFCC commented 3 years ago

Fixed #40 by always making a request to the book URL, which ensures we capture the request header needed for the audio endpoint. If we are already on the book URL, I don't know how to get that header without making the request again, or whether that is even possible.
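To illustrate the idea, here is a minimal sketch, not the actual code from this PR. The names `get_audio_headers`, `load_page`, and `captured_headers` are hypothetical stand-ins for whatever the scraper uses to navigate the browser and read back the intercepted request headers.

```python
from typing import Callable, Dict, Optional


def get_audio_headers(
    book_url: str,
    load_page: Callable[[str], None],
    captured_headers: Callable[[], Optional[Dict[str, str]]],
) -> Dict[str, str]:
    """Return the request headers needed for the audio endpoint.

    All three parameters are hypothetical: `load_page` navigates the browser,
    `captured_headers` returns the headers of the intercepted request (or None).
    """
    # Load the book page unconditionally, even if the browser is already on
    # it, so that a request carrying the required headers is actually made.
    load_page(book_url)

    headers = captured_headers()
    if not headers:
        raise RuntimeError(f"No request headers captured for {book_url}")
    return headers
```

The trade-off is one extra page load per book in exchange for never reaching the audio step without the header.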

I think _get_allbooks for uncategorized books should run at the end, in case the process ends prematurely. I occasionally ran into a captcha after scraping the sitemap; with _get_allbooks moved to the end, an incomplete run no longer reaches that step, so the captcha should be avoided in those runs. I also take the --match-language flag into account, so that all books can be fetched if desired.
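The reordering can be sketched roughly as follows. This is an illustration under assumptions, not the code in this PR: `scrape_in_order`, `scrape_category`, and `scrape_uncategorized` are hypothetical names, with `scrape_uncategorized` standing in for the sitemap-based step.

```python
from typing import Callable, Iterable, List


def scrape_in_order(
    categories: Iterable[str],
    scrape_category: Callable[[str], List[str]],
    scrape_uncategorized: Callable[[], List[str]],
) -> List[str]:
    """Process categorized books first and the sitemap-based step last.

    All callables are hypothetical placeholders for the scraper's own steps.
    """
    processed: List[str] = []

    # Categorized books first: if the run stops here, the sitemap is never
    # requested, so an incomplete run cannot trigger the captcha that way.
    for category in categories:
        processed.extend(scrape_category(category))

    # Uncategorized books last: this is the step that scrapes the sitemap.
    processed.extend(scrape_uncategorized())
    return processed
```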

leoncvlt commented 3 years ago

Cheers!