ulixee / secret-agent

The web scraper that's nearly impossible to block - now called @ulixee/hero
https://secretagent.dev
MIT License
667 stars, 44 forks

Question networkidle2 #345

Open ctaity opened 3 years ago

ctaity commented 3 years ago

Is there an event to wait for the page to load completely, like networkidle2 in Puppeteer? Thanks.

I am using: await activeTab.waitForLoad(LocationStatus.AllContentLoaded); is that OK?

Please fix client plugins. I would like to add some helper methods for easy scraping, like:

agent.waitForAllLoad()

Thanks, everyone.

blakebyrnes commented 3 years ago

Your code will wait for the "load" event that triggers from the browser. That should work fine if that's what you want.

We don't have anything like networkidle2 yet. I think @calebjclark had some ideas, but we're currently working on other approaches to detecting load state.

Can you share the use case you have for networkidle2? Do you not have any idea what elements are on the page once everything is loaded?
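For context, Puppeteer documents networkidle2 as "no more than 2 network connections for at least 500 ms". Below is a minimal sketch of that rule as plain bookkeeping, independent of SecretAgent. The class name `NetworkIdleTracker` and its methods are hypothetical and are not part of SecretAgent, Hero, or Puppeteer. Timestamps are passed in explicitly so the logic can be checked without real timers.

```typescript
// Hypothetical helper, not part of SecretAgent, Hero, or Puppeteer.
// Reports "idle" once the number of in-flight requests has stayed at or
// below maxInflight for at least quietMs milliseconds.
class NetworkIdleTracker {
  private inflight = 0;
  // Time at which the count last entered the allowed range; null while above it.
  private quietSince: number | null;
  private readonly maxInflight: number;
  private readonly quietMs: number;

  constructor(maxInflight: number = 2, quietMs: number = 500, startedAt: number = Date.now()) {
    this.maxInflight = maxInflight;
    this.quietMs = quietMs;
    this.quietSince = startedAt; // zero requests in flight counts as quiet
  }

  // Call when a request is sent.
  requestStarted(): void {
    this.inflight += 1;
    if (this.inflight > this.maxInflight) this.quietSince = null;
  }

  // Call when a request completes or fails.
  requestFinished(now: number = Date.now()): void {
    this.inflight = Math.max(0, this.inflight - 1);
    if (this.inflight <= this.maxInflight && this.quietSince === null) {
      this.quietSince = now; // the quiet window starts here
    }
  }

  // True once the quiet window has lasted at least quietMs.
  isIdle(now: number = Date.now()): boolean {
    return this.quietSince !== null && now - this.quietSince >= this.quietMs;
  }
}
```

To use something like this, the two request methods would have to be called from whatever request hooks the scraping tool exposes, with `isIdle()` polled until it returns true. Whether SecretAgent exposes suitable hooks is not established in this thread.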

ctaity commented 3 years ago

Yes, but the element is not always loaded or created yet; for example, it may be created by an AJAX call or something like that. For example, the TikTok search page has 4 versions, and SecretAgent waits for CSS3 selectors serially: if I wait 5 seconds for each one, it takes 20 seconds to load the page. With network idle, once the page is loaded I take the innerHTML and check the selectors synchronously; that way it took 5 seconds.

The function I wrote to wait for a selector is this:

const waitForSelector = async (
  agent: Agent,
  selector: string,
  visible: boolean,
  timeout: number,
): Promise<any> => {
  try {
    console.log('waiting for element:' + selector);
    await agent.waitForElement(agent.document.querySelector(selector), {
      timeoutMs: timeout,
      waitForVisible: visible,
    });
    return agent.document.querySelector(selector);
  } catch (e) {
    console.log('TIMEOUT FOR SELECTOR:' + selector);
    return null;
  }
};
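
One way to avoid the serial cost described above (four 5-second waits adding up to 20 seconds) is to start all the waits at once and take the first that yields a result, so the total wait is bounded by roughly one timeout. This is a minimal sketch in plain TypeScript. `firstMatch` and `Waiter` are hypothetical names, and whether SecretAgent handles several concurrent waitForElement calls on one tab well is an assumption not confirmed in this thread.

```typescript
// Hypothetical helper: each waiter resolves to a value, or to null on timeout.
type Waiter<T> = { label: string; wait: () => Promise<T | null> };

// Runs all waiters concurrently. Resolves with the first non-null result,
// or with null once every waiter has finished without one.
function firstMatch<T>(waiters: Waiter<T>[]): Promise<{ label: string; value: T } | null> {
  return new Promise((resolve) => {
    let pending = waiters.length;
    if (pending === 0) resolve(null);
    for (const { label, wait } of waiters) {
      Promise.resolve()
        .then(wait)
        .catch(() => null) // treat a rejected waiter like a timeout
        .then((value) => {
          pending -= 1;
          if (value !== null) resolve({ label, value });
          else if (pending === 0) resolve(null);
        });
    }
  });
}
```

With the waitForSelector function above, each waiter could be built as { label: selector, wait: () => waitForSelector(agent, selector, true, 5000) }, one per page variant. Waiters that lose the race keep running until their own timeout; the sketch simply ignores their results.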

blakebyrnes commented 3 years ago

This makes sense. We're just wrapping up a feature in Hero (the next iteration of SecretAgent) that I think should help with this scenario (https://github.com/ulixee/hero/pull/26). We're not quite ready for people to switch over, but it might give you a taste of what's coming!

andynuss commented 2 years ago

I was also looking for something along the lines of Playwright's wait for 'networkidle'. It appears that the latest release does not include the function waitForPageState, either in my compiled node_modules or in the sources. Is it scheduled to be available in a future release?