ulixee / hero

The web browser built for scraping
MIT License

waitForState - timeoutMs doesn't work #214

Closed: b4shx0r closed this issue 1 year ago

b4shx0r commented 1 year ago

Hi,

I'm using the waitForState() function to wait for the correct logo on a page. Before that, I had to resolve some captchas with a flow handler.

But the timeoutMs option doesn't affect the wait time.

We always get the default wait timeout, and the process aborts with TimeoutError: Timeout waiting for DomState

The error occurs with the following code:

```
await hero.waitForState({
  name: "logoLoaded",
  options: { timeoutMs: 600000 },
  all(assert) {
    assert(hero.document.querySelector("img.brand-logo").$isVisible);
  },
});
```

What's wrong with this?

Using Hero 2.0.0-alpha.18.

blakebyrnes commented 1 year ago

Thanks for reporting. Do you have a simple example you can share? (You're welcome to DM me on Discord if you'd prefer.)

blakebyrnes commented 1 year ago

I think I actually see what's going on. The options currently need to be passed as the second argument:

```
await hero.waitForState(
  {
    name: "logoLoaded",
    all(assert) {
      assert(hero.document.querySelector("img.brand-logo").$isVisible);
    },
  },
  { timeoutMs: 600000 }
);
```
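To illustrate why the nested options were ignored, here is a minimal sketch of a function with a `(state, options)` signature. `resolveTimeoutMs` and `DEFAULT_TIMEOUT_MS` are hypothetical names, and 30000 ms is an arbitrary placeholder, not Hero's actual default. The point is only that an `options` key placed inside the first argument is never read.

```javascript
// Hypothetical default, for illustration only (not Hero's real value).
const DEFAULT_TIMEOUT_MS = 30000;

// Models a `(state, options)` signature: only the second argument is consulted.
function resolveTimeoutMs(state, options = {}) {
  // An `options` key nested inside `state` is never looked at here.
  return options.timeoutMs ?? DEFAULT_TIMEOUT_MS;
}

// Original call shape: options nested inside the state object, so they are ignored.
const nested = resolveTimeoutMs({ name: "logoLoaded", options: { timeoutMs: 600000 } });

// Fixed call shape: options passed as the second argument, so they are honored.
const separate = resolveTimeoutMs({ name: "logoLoaded" }, { timeoutMs: 600000 });

console.log(nested, separate); // 30000 600000
```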
b4shx0r commented 1 year ago

Thank you! That works for the total timeout before 'Timeout waiting for DomState' is triggered.

But now the flow handler is triggered only after that timeout is over.

The docs say a flow handler is triggered if any query selector fails. So shouldn't the flow handler be triggered the first time the querySelector inside the wait fails? Then we could solve the captcha challenge with the handler, and the resource we're waiting for would become available.

Currently, the following occurs:

waitForState -> waits for the defined timeout -> flow handler is triggered -> code solves the captcha -> waitForState then throws 'Timeout waiting for DomState'

btw: great tool anyway ;)

blakebyrnes commented 1 year ago

Glad that helped. Thanks!

Can you give me some pseudocode or a fuller example? I'm having a somewhat hard time following. Is the brand-logo visible after the captcha is resolved, but it still times out?

b4shx0r commented 1 year ago

Here's an example:

```
exports.loadPage = async function (hero) {
  // solveCaptcha() and location are assumed to be defined elsewhere (not shown in the issue).
  await hero.registerFlowHandler(
    "SolveCaptcha",
    // Trigger: a reCAPTCHA widget is visible inside #main-iframe.
    (assert) => {
      assert(
        hero
          .getFrameEnvironment(hero.document.querySelector("#main-iframe"))
          .querySelector("div.g-recaptcha").$isVisible
      );
    },
    // Handler: solve the captcha so the real page can load.
    async (error) => {
      console.log("found blockpage - start solving captcha");
      await solveCaptcha(hero);
    }
  );

  await hero.goto(location);
  await hero.activeTab.waitForPaintingStable();

  await hero.waitForState(
    {
      name: "logoLoaded",
      all(assert) {
        assert(hero.document.querySelector("img.brand-logo").$isVisible);
      },
    },
    { timeoutMs: 10000 }
  );

  console.log("brand logo found!");
  // do something...
};
```

I never reach the console output, because every time, 'Timeout waiting for DomState' is thrown after the flow handler finishes solving the captcha.

blakebyrnes commented 1 year ago

Ok, I understand the scenario now. Do you know for sure that hero.document.querySelector("img.brand-logo").$isVisible is true afterwards?

FWIW, you could simplify this code to:

```
exports.loadPage = async function (hero) {
  await hero.registerFlowHandler(
    "SolveCaptcha",
    (assert) => {
      assert(
        hero
          .getFrameEnvironment(hero.document.querySelector("#main-iframe"))
          .querySelector("div.g-recaptcha").$isVisible
      );
    },
    async (error) => {
      console.log("found blockpage - start solving captcha");
      await solveCaptcha(hero);
    }
  );

  await hero.goto(location);
  await hero.activeTab.waitForPaintingStable();

  // A shorter way to express the waitForState() call above.
  await hero.document.querySelector("img.brand-logo").$waitForVisible({ timeoutMs: 10000 });

  console.log("brand logo found!");
  // do something...
};
```
b4shx0r commented 1 year ago

Hmm, unfortunately this doesn't work for me.

The example above waits 10 seconds, then triggers the flow handler. The handler takes maybe 100 seconds to solve the captcha.

After that, it aborts with the DomState timeout exception.

Is the timeoutMs parameter the initial wait for the element to become visible, after which the flow handler is triggered, or is it the total timeout for the whole wait? Or maybe the flow handler's trigger mechanism doesn't work when the querySelector can't find the logo element.

thanks anyway ;)

blakebyrnes commented 1 year ago

I didn't change what your code does ;) I was just showing a simpler way to call your waitForState. The main purpose of my comment was to ask whether the logo should be visible once your FlowHandler has completed.

The timeout value is how long it will wait for that brand logo to be visible whenever it reaches that part of your code. So it will encounter that condition and wait for the brand logo to be visible; if it can't see it, it will time out and trigger your FlowHandler. Once your flow handler completes, it will retry the img.brand-logo query.
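The sequence described here (wait, run flow handlers on timeout, then retry) can be modeled with a toy, synchronous simulation. This is a hypothetical sketch, not Hero's implementation: `waitWithFlowHandlers`, the simulated `clock`, `pollMs`, and `maxRetries` are invented for illustration, and time is faked rather than real. Giving each attempt a fresh deadline is also an assumption of the model.

```javascript
// Toy model of "wait for a condition, run matching flow handlers on timeout, then retry".
// Hypothetical names throughout; this is NOT Hero's implementation.
function waitWithFlowHandlers({ condition, handlers, timeoutMs, clock, pollMs = 100, maxRetries = 1 }) {
  for (let attempt = 0; attempt <= maxRetries; attempt += 1) {
    // Assumption: each attempt gets a fresh deadline.
    const deadline = clock.now + timeoutMs;
    while (clock.now < deadline) {
      if (condition()) return { ok: true, attempt, elapsedMs: clock.now };
      clock.now += pollMs; // simulate waiting one poll interval
    }
    if (attempt === maxRetries) break; // no retries left
    // Timed out: run every flow handler whose trigger currently matches.
    const matched = handlers.filter((handler) => handler.trigger());
    if (matched.length === 0) break; // nothing can recover the page
    for (const handler of matched) handler.handle(clock);
  }
  throw new Error("Timeout waiting for DomState");
}

// Scenario from the thread: captcha shown first, solving it takes ~100 s, then the logo appears.
const page = { captchaVisible: true, logoVisible: false };
const clock = { now: 0 };
const result = waitWithFlowHandlers({
  condition: () => page.logoVisible,
  handlers: [
    {
      trigger: () => page.captchaVisible,
      handle: (c) => {
        c.now += 100000; // simulated captcha-solving time
        page.captchaVisible = false;
        page.logoVisible = true;
      },
    },
  ],
  timeoutMs: 10000,
  clock,
});

console.log(result); // { ok: true, attempt: 1, elapsedMs: 110000 }
```

In this model the 100-second solve does not count against the 10-second logo wait, because the deadline is reset after the handlers run; the thread does not confirm whether Hero resets its timer the same way.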

b4shx0r commented 1 year ago

> it will timeout and trigger your FlowHandler. Once your flow handler completes, it will retry the img.brand-logo query.

So that's exactly how I understand it should work, but should I catch the timeout and retry the query myself? This automatic "it will retry checking for the img.brand-logo" behavior doesn't seem to work for me.

blakebyrnes commented 1 year ago

Gotcha. Maybe you're hitting a bug. Could you send me a session database showing it failing? That would be helpful.

blakebyrnes commented 1 year ago

(You should not need to catch and retry yourself.) If you don't want to wait for those first 10 seconds, you can manually trigger the flow handlers using triggerFlowHandlers (https://ulixee.org/docs/hero/advanced-client/tab#trigger-flow-handler)

b4shx0r commented 1 year ago

Ah, cool. Could you send me a link with details on how to get the session database? I couldn't find it with the docs search.

blakebyrnes commented 1 year ago

Sure: https://ulixee.org/docs/hero/advanced-concepts/sessions

b4shx0r commented 1 year ago

Good news: it seems an initial call to await hero.triggerFlowHandlers() fixed my issue.

thank you ;)

blakebyrnes commented 1 year ago

Great. If you think it's still worth investigating the timeout issue, send me that db. Otherwise, feel free to close this one.