pevers / images-scraper

Simple and fast scraper for Google
ISC License

I set 300 but the scraper is only getting 200 images. #46

Closed · ghost closed this issue 4 years ago

ghost commented 4 years ago
const GIScraper = require("images-scraper"); // import added so the snippet runs standalone
const gi = new GIScraper({ puppeteer: { headless: false } });
let results = await gi.scrape("melony pokemon", 300);
console.log(`${results.length} results`);

It's always 200.

diornister commented 4 years ago

Basically, in images-scraper/src/google/scraper.js at line 64 there is a mistake: results = links.slice(0, limit - results.length); should be results = links.slice(0, limit); (see the sketch below).

If the owner gives me permission, I could make a PR to fix this.
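
For context, here is a simplified stand-in for that loop, not the actual scraper.js source; collectLinksOnPage is a hypothetical placeholder for however the scraper gathers the image links currently loaded on the page:

```js
// Simplified stand-in for the loop around scraper.js line 64 (not the real source).
// collectLinksOnPage is a hypothetical placeholder.
let results = [];
while (results.length < limit) {
  const links = await collectLinksOnPage(page); // everything visible on the page so far

  // current line 64 (the cap shrinks as results accumulate):
  // results = links.slice(0, limit - results.length);

  // suggested change (always keep up to `limit` of everything found so far):
  results = links.slice(0, limit);

  // ...scroll further / stop when no new images load...
}
```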

ghost commented 4 years ago

Can't anyone open pull requests?

diornister commented 4 years ago

> Can't anyone open pull requests?

He needs to give permission for this.

diornister commented 4 years ago

But as a "fix", you can set your limit to double what you want, and it will fetch you more images (see the sketch below).
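
A minimal sketch of that workaround, reusing the gi instance and query from the snippet at the top of this issue, and trimming the extra results locally:

```js
// Workaround sketch: request double the desired limit, then trim locally.
const wanted = 300;
let results = await gi.scrape("melony pokemon", wanted * 2); // ask for 600
results = results.slice(0, wanted);                          // keep at most 300
console.log(`${results.length} results`);
```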

ghost commented 4 years ago

https://github.com/pevers/images-scraper/blob/8cef8c8b5f3459eda60558080515df84803d4143/src/google/scraper.js#L56-L71

It looks to me like results = links.slice(0, limit - results.length); should be results = results.concat(links); and results = results.slice(0, limit - results.length); should be added after the while loop.

ghost commented 4 years ago

Or wait, no, the HTML will have all the images in it. 🤔

ghost commented 4 years ago

Ah yes, I think you're right. limit - results.length would work when the number of results is greater than the limit, because then it gets a negative number to slice back from the end. But when it's less (say 100 results) and limit is 300... wait, then it won't do anything...

~~Hmm, I think either way it'll do the same thing; both .slice(0, 300 - 100) and .slice(0, 300) will not change a 100-long array, and .slice(0, 300 - 400) will do the same as .slice(0, 300) for a 400-long array. I think the issue is somewhere else, maybe the thing that decides when the end has been reached.~~

But wait: if limit is 300 and it got 200 images, 300 - 200 = 100, so it will chop it back down to 100. So I guess weird stuff starts happening once the number of images collected is more than half the limit. So yes, that must be the fix.
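
A tiny simulation of that oscillation, with hypothetical counts standing in for the images found after each scroll:

```js
// Hypothetical counts standing in for what the page has loaded after each scroll.
const limit = 300;
let results = [];

for (const found of [200, 250, 300]) {
  const links = new Array(found).fill("img");          // everything visible so far
  results = links.slice(0, limit - results.length);    // the line at scraper.js:64
  console.log(`found ${found}, kept ${results.length}`);
}
// found 200, kept 200
// found 250, kept 100  <- chopped back down once more than limit/2 was kept
// found 300, kept 200
```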

pevers commented 4 years ago

Everyone should be able to fork it and create a pull request.

But there is indeed a mistake here, @ledlamp. Sorry, I have fixed it here: https://github.com/pevers/images-scraper/pull/47