Infinite loop when setting a limit over 150 images

pevers / images-scraper

Simple and fast scraper for Google

ISC License


Infinite loop when setting a limit over 150 images #44

Closed OlivierDa closed 3 years ago

OlivierDa commented 4 years ago

Hi, I just ran into a bug: the scraper seems to go wrong when changing pages. I added some logs and monitored the output, and the length of the results array in the while loop goes like this:

0 results
50 results
100 results
150 results
100 results
150 results
etc.

The limit is never reached and the while loop never exits. Maybe this is a bug or a missing check (for when the limit is higher than the number of available results)?

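const Scraper = require('images-scraper');

// Headless browser with the sandbox disabled.
let SEARCH_SCRAPPER = new Scraper({
  puppeteer: {
    headless: true,
    args: ['--no-sandbox', '--disable-setuid-sandbox'],
  },
});

// patternSearch is the query string; 250 is the requested image limit.
// This call runs inside an async function.
let res = await SEARCH_SCRAPPER.scrape(patternSearch, 250);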

pevers commented 4 years ago

I can't seem to reproduce it. Can you see what happens visually with the option headless: false? I check for the button "Looks like you've reached the end" to see if the end is reached. I suspect that it might be caused by an internationalization difference.

OlivierDa commented 4 years ago

Unfortunately, I'm using the scraper in a Node.js REST API on a remote Debian server, so I can't check visually. I'll try to reproduce the problem later!

steven-tel commented 4 years ago

Hi, I get the same issue when I try to get 800 images. The script tries to scroll down to get more results, but Google won't give any more, so the script gets stuck.

pevers commented 4 years ago

Hmm, it might be a better idea to just check whether we were able to fetch more results. I think the current check is flaky because of localization.

So I created this PR: https://github.com/pevers/images-scraper/pull/48/files

Hopefully this fixes it for you.
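The "check whether we were able to fetch more results" idea can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: `collectUntilStalled`, `makeFakeSource`, and the example.com URLs are hypothetical names made up for the sketch, and the fake source only imitates the 50 / 100 / 150 / 100 / 150 pattern reported above instead of driving a real browser.

```javascript
// Hypothetical helper (not the library's code): keep collecting until the
// limit is reached or a round adds no new results.
async function collectUntilStalled(fetchVisible, limit) {
  const seen = new Set();
  while (seen.size < limit) {
    const before = seen.size;
    for (const url of await fetchVisible()) {
      seen.add(url); // duplicates are ignored by the Set
    }
    // No progress this round: the source has run out, so stop
    // instead of scrolling forever.
    if (seen.size === before) break;
  }
  return [...seen].slice(0, limit);
}

// Fake source standing in for the results page. It reports 50, 100, 150,
// then alternates between 100 and 150 visible results, never more than 150.
function makeFakeSource() {
  let call = 0;
  return async () => {
    const n = call < 3 ? (call + 1) * 50 : (call % 2 === 1 ? 100 : 150);
    call += 1;
    return Array.from({ length: n }, (_, i) => `https://example.com/img/${i}.jpg`);
  };
}

// Asking for 250 results from a source that only has 150 now terminates.
collectUntilStalled(makeFakeSource(), 250).then((results) => {
  console.log(results.length); // logs 150
});
```

Because the stop condition depends only on whether the set of collected results grew, it does not rely on the text of any button, so it should behave the same regardless of the page's language.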

steven-tel commented 4 years ago

Thanks, it's working well for me now!

ctzntx commented 4 years ago

I can also confirm that trying to get a larger number of images fails. Running the example from the README with the limit set to 1000, the result always contains a smaller number of items (300 or 400, depending on the run).

pevers commented 3 years ago

I'm closing this one. If anyone is still having issues, please let me know.