maria-antoniak / goodreads-scraper

A Python scraper for Goodreads books and reviews.
GNU General Public License v3.0
264 stars, 82 forks

HTTP Error 504: Gateway Time-out #26

Closed: fzanartu closed 2 years ago

fzanartu commented 3 years ago

Hi! Nice code. It has been very useful to me. I'm building a dataset to feed an ML model for an applied ML course at my Uni. I'm trying to scrape the data for 1023 books, but after the 22nd I got HTTP Error 504: Gateway Time-out. Any suggestions on how to proceed?

fzanartu commented 3 years ago

I just noticed that the code resumes where it left off, nice feature! I changed time.sleep(2) to time.sleep(randint(2,20)); maybe that will give the Goodreads server more time to handle my requests. I'll tell you how it goes.
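For anyone trying the same change, here is a minimal sketch of a randomized pause combined with a retry on 504. It is not the scraper's own code: `fetch_with_retry`, `max_attempts`, and the injectable `opener` and `sleep` parameters are hypothetical names, and it assumes the page is fetched with `urllib`, which is where the "HTTP Error 504: Gateway Time-out" message format comes from.

```python
import time
import urllib.error
import urllib.request
from random import randint


def fetch_with_retry(url, max_attempts=5,
                     opener=urllib.request.urlopen, sleep=time.sleep):
    """Fetch url, retrying on HTTP 504 with a randomized, growing pause.

    Hypothetical helper, not part of goodreads-scraper. `opener` and
    `sleep` are parameters only so the function can be tried offline.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            with opener(url) as response:
                return response.read()
        except urllib.error.HTTPError as err:
            # Retry only gateway timeouts; re-raise any other status,
            # and re-raise the 504 itself once attempts are used up.
            if err.code != 504 or attempt == max_attempts:
                raise
            # Same randint(2, 20) range as above, scaled by attempt number.
            sleep(randint(2, 20) * attempt)
```

Whether this helps depends on why the server returns 504. If one specific book page is the problem, retrying will not fix it.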

ghost commented 3 years ago

Hi, how is it going?

fzanartu commented 3 years ago

It did better, although I'm still getting caught by Goodreads' servers. Maybe I need to increase the time range a bit further. Do you have any other suggestions?

maria-antoniak commented 3 years ago

I don't think the timing is the issue, and I recommend checking the page of the book you're trying to scrape and seeing if anything is amiss. Unfortunately we can't troubleshoot each case, but I hope you figure it out!
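One way to follow that advice on a long list is to record which book IDs fail instead of stopping at the first error, then open those pages by hand. The sketch below is only an illustration: `scrape_all` is a hypothetical wrapper, and `scrape_one` stands in for whatever function fetches a single book, since the scraper's actual function names are not shown in this thread.

```python
import urllib.error


def scrape_all(book_ids, scrape_one):
    """Run scrape_one on each ID and collect failures for later inspection.

    Hypothetical wrapper: `scrape_one` is a placeholder for the function
    that scrapes a single book and may raise urllib.error.HTTPError.
    """
    results, failures = {}, {}
    for book_id in book_ids:
        try:
            results[book_id] = scrape_one(book_id)
        except urllib.error.HTTPError as err:
            # Keep the error text so each failing page can be checked manually.
            failures[book_id] = str(err)
    return results, failures
```

If the same IDs show up in `failures` on every run, that points at those specific pages rather than at request timing.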