lewisdonovan / google-news-scraper

Lightweight scraper for Google News

ERROR REJECTING COOKIES: Error: No element found for selector: [aria-label="Reject all"] #38

Closed: mashegoindustries closed this issue 10 months ago

mashegoindustries commented 11 months ago

Hi,

In production, I get the output below every time I call the API. All the news is returned as expected, with no issues.

Do I need to worry about the below, or perhaps I should just ignore it?

DEFAULT 2023-12-31T17:39:19.652014Z SCRAPING NEWS FROM: https://news.google.com/search?hl=en-US&gl=US&ceid=US:en&q=mamelodi sundowns when:5d
DEFAULT 2023-12-31T17:39:33.402470Z ERROR REJECTING COOKIES: Error: No element found for selector: [aria-label="Reject all"]
DEFAULT 2023-12-31T17:39:33.402493Z at assert (/workspace/node_modules/google-news-scraper/node_modules/puppeteer/lib/cjs/puppeteer/common/assert.js:26:15)
DEFAULT 2023-12-31T17:39:33.402499Z at DOMWorld.click (/workspace/node_modules/google-news-scraper/node_modules/puppeteer/lib/cjs/puppeteer/common/DOMWorld.js:455:32)
DEFAULT 2023-12-31T17:39:33.402503Z at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
DEFAULT 2023-12-31T17:39:33.402508Z at async Promise.all (index 0)
DEFAULT 2023-12-31T17:39:33.402512Z at async googleNewsScraper (/workspace/node_modules/google-news-scraper/index.js:47:9)
DEFAULT 2023-12-31T17:39:33.402518Z at async generateGoogleNews (/workspace/lib/cron_jobs/google_news_scrapper/load_google_news_scraper.js:51:39)
DEFAULT 2023-12-31T17:39:33.402524Z at async loadGoogleNewsScraperFunction (/workspace/lib/cron_jobs/google_news_scrapper/load_google_news_scraper.js:12:9)
DEFAULT 2023-12-31T17:39:33.402530Z at async httpFunc (/workspace/node_modules/firebase-functions/lib/v2/providers/scheduler.js:67:13)

lewisdonovan commented 10 months ago

@mashegoindustries thanks for flagging. This was console output added during testing and shouldn't affect results. If you're getting the expected JSON data afterwards, this message doesn't matter.

For context, Google added a new interstitial cookie page that is shown only some of the time, so there's a try/catch block that looks for the "Reject all" cookies button and clicks it if it exists. This console logging simply means the interstitial page wasn't shown, so the button wasn't found in the DOM.
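For anyone who wants the same behaviour in their own Puppeteer code without the noisy log, here is a rough sketch of the idea. It is not the library's actual source. The helper name `rejectCookiesIfPresent` is made up for illustration, and it assumes `page` exposes Puppeteer's `page.$(selector)`, which resolves to `null` when nothing matches rather than throwing the way clicking a missing selector does in the stack trace above.

```javascript
// Hypothetical helper, NOT the code shipped in google-news-scraper.
// Assumes `page` behaves like a Puppeteer Page: page.$(selector) resolves
// to an element handle, or to null when no element matches.
const REJECT_SELECTOR = '[aria-label="Reject all"]';

async function rejectCookiesIfPresent(page) {
  try {
    const button = await page.$(REJECT_SELECTOR);
    if (button === null) {
      // The cookie interstitial was not shown, so there is nothing to click.
      return false;
    }
    await button.click();
    return true;
  } catch (err) {
    // Anything unexpected here is treated as non-fatal: scraping can
    // continue whether or not the cookie prompt was dismissed.
    return false;
  }
}
```

The return value only says whether a button was clicked, so a caller can log it at debug level or ignore it entirely.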

I'll remove the console logging in the next release, as it was only ever intended for debugging.

mashegoindustries commented 10 months ago

Awesome. Makes sense now ;)