lino-levan / astral

A high-level puppeteer/playwright-like library for Deno
https://jsr.io/@astral/astral
MIT License
215 stars 10 forks

`browser.newPage` sometimes fails to detect loaded pages #70

Open chaosharmonic opened 3 months ago

chaosharmonic commented 3 months ago

Opening a new page occasionally throws the error below, even though the page itself loads successfully. Affected sites include LinkedIn, Indeed, and Glassdoor, and the failure appears to be related to persistent network requests on page load (to tracking APIs, for instance).

error: Uncaught (in promise) RetryError: Retrying exceeded the maxAttempts (5).
        throw new RetryError(error, options.maxAttempts);
              ^
    at retry (https://jsr.io/@std/async/0.223.0/retry.ts:143:15)
    at eventLoopTick (ext:core/01_core.js:207:9)
    at async Promise.all (index 0)
    at async Page.goto (https://jsr.io/@astral/astral/0.4.0/src/page.ts:521:5)
    at async Browser.newPage (https://jsr.io/@astral/astral/0.4.0/src/browser.ts:166:7)
    at async file:///home/casval/Dev/other-projects/careerCrawler/escapeHatch/scripts/getIndeedHiringPosts.js:39:14
Caused by: DeadlineError: Deadline
    at https://jsr.io/@std/async/0.223.0/deadline.ts:60:32
    at eventLoopTick (ext:core/01_core.js:207:9)
    at async retry (https://jsr.io/@std/async/0.223.0/retry.ts:140:14)
    at async Promise.all (index 0)
    at async Page.goto (https://jsr.io/@astral/astral/0.4.0/src/page.ts:521:5)
    at async Browser.newPage (https://jsr.io/@astral/astral/0.4.0/src/browser.ts:166:7)
    at async file:///home/casval/Dev/other-projects/careerCrawler/escapeHatch/scripts/getIndeedHiringPosts.js:39:14

This started after I switched to the version on JSR, and it appears to come from a call to `Celestial.network.enable({})` in browser.ts, introduced in c13b687, just after the last version on /x/. I haven't isolated why, but when I comment that call out, the error stops happening.

chaosharmonic commented 3 months ago

After some further tinkering, this looks to stem from `page.goto`, which by default uses `networkidle2` as its waitFor option.

Setting waitFor to `load` or `networkidle0` appears to resolve the issue. However, I don't understand `page.waitForNetworkIdle` and its idle-connection listening well enough to tell whether this is a bug, or intended behavior that I should be overriding with an option.
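
For anyone following along, here is a rough model of what a "network idle" wait generally means. It is a sketch only, assuming Puppeteer-style semantics (`networkidle2` meaning no more than 2 connections in flight for at least 500 ms); it is not astral's actual implementation and does not account for the `networkidle0` result above. `NetEvent` and `firstIdleAt` are hypothetical names made up for illustration.

```typescript
// Hypothetical event shape: a request starting or finishing at a time in ms.
type NetEvent = { at: number; kind: "start" | "end" };

// Returns the time at which the network first counts as idle, or null if
// that never happens before `deadline`. "Idle" here means at most
// `idleConnections` requests in flight for a continuous `idleTime` ms.
function firstIdleAt(
  events: NetEvent[],
  idleConnections: number,
  idleTime: number,
  deadline: number,
): number | null {
  const sorted = [...events].sort((a, b) => a.at - b.at);
  let inflight = 0;
  let quietSince: number | null = 0; // nothing is in flight at t = 0

  for (const ev of sorted) {
    if (ev.at > deadline) break;
    // A full quiet window elapsed before this event arrived.
    if (quietSince !== null && ev.at - quietSince >= idleTime) {
      return quietSince + idleTime;
    }
    inflight += ev.kind === "start" ? 1 : -1;
    if (inflight <= idleConnections) {
      if (quietSince === null) quietSince = ev.at; // quiet window opens
    } else {
      quietSince = null; // too many connections, window resets
    }
  }

  // No more events: the window only counts if it closes before the deadline.
  if (quietSince !== null && quietSince + idleTime <= deadline) {
    return quietSince + idleTime;
  }
  return null;
}

// The document request finishes at 200 ms; the trackers never finish.
const documentRequest: NetEvent[] = [
  { at: 0, kind: "start" },
  { at: 200, kind: "end" },
];
const tracker = (at: number): NetEvent => ({ at, kind: "start" });

const twoTrackers = [...documentRequest, tracker(10), tracker(20)];
const threeTrackers = [...twoTrackers, tracker(30)];

const idleWithTwo = firstIdleAt(twoTrackers, 2, 500, 10_000);
const idleWithThree = firstIdleAt(threeTrackers, 2, 500, 10_000);
console.log(idleWithTwo, idleWithThree);
```

Under this model, a page that keeps three or more connections open indefinitely never satisfies a two-connection idle threshold, so the wait can only end by timing out. That would be consistent with the `DeadlineError` in the trace, though whether it is what actually happens inside astral is not confirmed here.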