Closed laserman120 closed 1 year ago
Happened to me too. The accept-cookies page triggers this:

const isScrollable = await this._isScrollable(page);
if (!isScrollable) {
  console.log('No results on this page');
  return;
}
So after a bit of reading about Puppeteer, I put this together:
// Google Accept All
// Search for the button with the text "Accept all"
const [button] = await page.$x("//button[contains(., 'Accept all')]");
if (button) {
  // Press the button and wait until the page finishes loading
  await button.click();
  await page.waitForNavigation({
    waitUntil: 'networkidle0',
  });
}
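For reference, the XPath `//button[contains(., 'Accept all')]` matches any button whose full text content contains the phrase "Accept all". A tiny stand-in predicate (my own illustration, not part of Puppeteer or this library) mirrors that matching rule and can be sanity-checked without a browser:

```javascript
// Stand-in for the XPath predicate //button[contains(., 'Accept all')]:
// true when the element's text content contains the phrase anywhere.
function matchesAcceptAll(textContent) {
  return textContent.includes('Accept all');
}

// Example button labels as they might appear on the consent page.
const labels = ['Reject all', 'Accept all', 'Customise'];
const matched = labels.filter(matchesAcceptAll);
```

Note that `contains(., ...)` checks the whole text content, so a button labelled "Accept all cookies" would match too.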
I put that into the node_modules\images-scraper\src\google\scraper.js file, right below:
const page = await browser.newPage();
await page.setBypassCSP(true);
await page.goto(query, {
  waitUntil: 'networkidle0',
});
This certainly isn't the fastest approach, but with my limited knowledge it is at least a temporary fix for anyone who has the same issue. It will increase the time it takes to start the search, but I am not sure another approach would make it significantly faster, since the page now needs to load twice.
If the dev sees this: I also added a pull request with the quick fix, so it at least works until someone makes a better version.
Another approach would be to use the code I provided above to accept the cookies, then use Puppeteer to store the cookies locally and retrieve them on the next session to skip the "accept cookies" page entirely. This would probably give faster results, as it won't load the page twice like the current fix does.
This solution would require fs, as it needs to store the cookies locally. Here is my attempt, starting at line 45:
const browser = await puppeteer.launch({
  ...this.puppeteerOptions,
});
const page = await browser.newPage();
await page.setBypassCSP(true);

// Load cookies from a previous session, if present
if (fs.existsSync('./node_modules/images-scraper/src/google/cookies.json')) {
  const cookies = fs.readFileSync('./node_modules/images-scraper/src/google/cookies.json', 'utf8');
  const deserializedCookies = JSON.parse(cookies);
  await page.setCookie(...deserializedCookies);
}
// Cookies retrieved

await page.goto(query, {
  waitUntil: 'networkidle0',
});

// Google Accept All
// Search for the button with the text "Accept all"
const [button] = await page.$x("//button[contains(., 'Accept all')]");
if (button) {
  // Press the button and wait until the page finishes loading
  await button.click();
  await page.waitForNavigation({
    waitUntil: 'networkidle0',
  });

  // Save cookies for the next session
  const cookies = await page.cookies();
  const cookieJson = JSON.stringify(cookies);
  fs.writeFileSync('./node_modules/images-scraper/src/google/cookies.json', cookieJson);
}

// Back to the rest of the code
await page.setViewport({ width: 1920, height: 1080 });
await page.setUserAgent(this.userAgent);
From rough tests on my side, it improves the search time by about 1.5 seconds when searching for 25 images. (Tests were conducted with headless: false to check what is actually going on.)
As this approach would need fs, I won't make a pull request for now, but it seems to be the faster solution even if we have to store and retrieve a file locally.
@pevers, would you mind looking over this issue? As this might be caused by Google, it could have led to the scraper no longer working for anyone.
Thanks for looking into this and the fix! I'll have a look tonight and test it.
This should now be fixed in: https://github.com/pevers/images-scraper/pull/96
Thanks for the fix!
When starting the search, it starts up Chrome but gets stuck on the "Accept cookies" page in Google; it then resizes and closes, as it cannot find any images.
https://gyazo.com/13510e16f0993ee936bbe316b9cb08b4
Here is an example of the Chrome instance starting and getting stuck.