pevers / images-scraper

Simple and fast scraper for Google
ISC License
224 stars 69 forks

Scrape method always returns empty array #113

Open alperenbaskaya58 opened 2 months ago

alperenbaskaya58 commented 2 months ago

I was using this package to grab image links, but even the sample code doesn't work: it returns an empty array. What could be the problem?

const Scraper = require('images-scraper');

const google = new Scraper({
  puppeteer: {
    headless: false,
  },
  // safe: false,
  // userAgent: 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko)', // the user agent
});

(async () => {
  const results = await google.scrape('banana', 5);
  console.log('results', results);
})();
TrifiAmanallah commented 2 months ago

The HTML selectors have to be updated after the recent Google changes.

kanjieater commented 3 weeks ago

@pevers Any way you could update this for us?

Any workarounds or forks in the meantime?

pevers commented 3 weeks ago

Hey @kanjieater, I haven't looked into it yet as I've been busy with other things, but I'll try to take a look soon. In the meantime, if anyone has a solution I would be glad to merge it!

gojodev commented 1 week ago

This happened to me as well, so instead I just used Puppeteer from Node directly.

This is what I currently have:

import puppeteer from 'puppeteer';

async function imgScrape(queries) {
    const browser = await puppeteer.launch({ headless: true });
    try {
        const page = await browser.newPage();
        const results = {}; // collect the URLs for each query, keyed by query
        for (const query of queries) {
            await page.goto(`https://www.google.com/search?tbm=isch&q=${encodeURIComponent(query)}`);

            // Scroll down to trigger lazy-loading of more images
            await page.evaluate(async () => {
                for (let i = 0; i < 10; i++) {
                    window.scrollBy(0, window.innerHeight);
                    await new Promise(resolve => setTimeout(resolve, 500)); // wait for more images to load
                }
            });

            // Wait for images to be present in the DOM
            await page.waitForSelector('img');

            // Extract image URLs, skipping data: URIs and Google's own assets
            results[query] = await page.evaluate(() => {
                const urls = [];
                document.querySelectorAll('img').forEach(img => {
                    const url = img.src;
                    if (url.startsWith('http') && !url.includes('google')) {
                        urls.push(url);
                    }
                });
                return urls.slice(0, 3); // limit to the first 3 image URLs
            });
        }
        return results;
    } catch (err) {
        console.error('An error occurred:', err);
    } finally {
        await browser.close();
    }
}

imgScrape(['python programming language logo']).then((urls) => {
    console.log(urls);
});

TL;DR: don't use "images-scraper"; use "puppeteer" directly.
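The URL-filtering rule inside `page.evaluate` above can also be pulled out into a plain function so it can be unit-tested without launching a browser. A sketch (the helper name `filterImageUrls` is hypothetical, not part of any package):

```javascript
// Keep only external http(s) image URLs, dropping data: URIs and
// Google's own assets, and cap the result at `limit` entries.
function filterImageUrls(urls, limit = 3) {
    return urls
        .filter(url => url.startsWith('http') && !url.includes('google'))
        .slice(0, limit);
}
```

For example, `filterImageUrls(['data:image/png;base64,x', 'https://example.com/a.png', 'https://www.google.com/logo.png'])` returns `['https://example.com/a.png']`.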

skeddles commented 5 hours ago

Is there a way to get Puppeteer to stay open? It opens but then immediately closes, so I'm having trouble debugging.
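For debugging, Puppeteer's standard launch options can keep the window visible and slow things down; a configuration sketch (not from this thread), where `headless`, `devtools`, and `slowMo` are documented `puppeteer.launch` options:

```javascript
// Debug-friendly launch configuration (sketch): visible window,
// DevTools opened for each tab, and each operation slowed down.
const browser = await puppeteer.launch({
    headless: false, // show the browser window
    devtools: true,  // open DevTools automatically
    slowMo: 250,     // delay each Puppeteer operation by 250 ms
});
// While debugging, comment out `await browser.close()` at the end of the
// script so the window stays open after the script finishes.
```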