ulixee / hero

The web browser built for scraping
MIT License

Allow scrolling/clicking inside scrollable divs (edited title) #160

Open fcpunk opened 2 years ago

fcpunk commented 2 years ago

Hi,

I'm trying to execute a simple script:

await agent.goto('https://www.glassdoor.com/Job/jobs.htm?suggestCount=0&suggestChosen=false&clickSource=searchBtn&typedKeyword=backend+developer&sc.keyword=backend+developer&locT=N&locId=96&fromAge=3&countryRedirect=false&jobType=');
await agent.waitForPaintingStable();
const nextPageButton = await agent.document.querySelector('a[data-test="pagination-next"]');
await agent.interact({ scroll: nextPageButton });
await agent.click(nextPageButton);

and for some reason, no scrolling happens. Is it a bug, or am I missing something?

Also, I noticed that await agent.waitForPaintingStable(); resolves too early.

blakebyrnes commented 2 years ago

Thanks for logging this. At first glance, it looks like we don't know how to properly scroll within a scrollable box inside the page (i.e., not scrolling a frame or the whole page).

waitForPaintingStable will trigger based on the Chrome Lighthouse performance metric - which determines that the largest content above the fold has rendered. What are you seeing that looks off?

fcpunk commented 2 years ago

> Thanks for logging this. At first glance, it looks like we don't know how to properly scroll within a scrollable box inside the page (i.e., not scrolling a frame or the whole page).

Could you suggest a possible workaround? A bunch of job boards work like this :/

> waitForPaintingStable will trigger based on the Chrome Lighthouse performance metric - which determines that the largest content above the fold has rendered. What are you seeing that looks off?

When I replay the session in SecretAgent (amazing feature, thumbs up, and respect for this!), I see that only the left side of the page is actually loaded when await agent.waitForPaintingStable(); resolves. The right side loads a couple of seconds after that, and it takes about 4-5 seconds more for the entire page to finish rendering (this is when the "accept cookies" banner appears at the bottom).

blakebyrnes commented 2 years ago

SecretAgent and Replay are way off on this right now. You can try out the built-in commands:

  await nextPageButton.scrollIntoView();
  await nextPageButton.click();

The downside is they're detectable, but only if the site is looking.

fcpunk commented 2 years ago

> SecretAgent and Replay are way off on this right now. You can try out the built-in commands:
>
>   await nextPageButton.scrollIntoView();
>   await nextPageButton.click();
>
> The downside is they're detectable, but only if the site is looking.

Thanks, I tried and failed.

Sample:

  for (let pageNumber = 2; pageNumber <= 5; pageNumber++) {
    // Record progress so it shows up in the session output
    agent.output = { currentPageNumber: pageNumber };
    // Re-query the button on every iteration, since the page content changes
    const nextPageButton = await agent.document.querySelector('a[data-test="pagination-next"]');
    await nextPageButton.scrollIntoView();
    await nextPageButton.click();
    await agent.waitForPaintingStable();
  }

The 2nd page loads fine, but for the rest of the iterations await agent.waitForPaintingStable() returns in 2-3 ms, which can't be right. Initially, I thought it might be related to the lastCommandId that waitForPaintingStable uses, but scrollIntoView() and click() on the element updated lastCommandId correctly.
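
One way to sidestep a wait that resolves too early is to poll for a page-specific signal instead, such as text that changes once the next page of results has rendered. The sketch below is a generic helper and not part of SecretAgent; `waitUntil`, its options, and the `readPageMarker` function mentioned in the usage comment are hypothetical names introduced for illustration.

```javascript
// Hypothetical helper (not a SecretAgent API): poll an async predicate until
// it returns a truthy value, or throw once the timeout has elapsed.
async function waitUntil(predicate, { timeoutMs = 10000, intervalMs = 100 } = {}) {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    const result = await predicate();
    if (result) return result;
    if (Date.now() >= deadline) {
      throw new Error(`Condition not met within ${timeoutMs} ms`);
    }
    // Sleep before the next attempt so we don't spin
    await new Promise(resolve => setTimeout(resolve, intervalMs));
  }
}

// Usage sketch inside the pagination loop. readPageMarker is an assumed
// function you would write to return some text that differs between pages:
//   const before = await readPageMarker(agent);
//   await nextPageButton.click();
//   await waitUntil(async () => (await readPageMarker(agent)) !== before);
```

The trade-off is that the signal has to be chosen per site, but it does not depend on how the paint-stable event is computed.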

blakebyrnes commented 1 year ago

This is a bug in the Unblocked/Agent project. It will involve determining if a containing scroll element needs to be scrolled vs the main page, and then creating a "phased" approach to scroll the cascade of scroll containers into view.

NOTE: this might also need to be applied to the Unblocked /unblocked/plugins/Default Human Emulator, depending on implementation.
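
The "phased" idea described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the Unblocked/Agent implementation: the function names are hypothetical, and the code works on plain objects shaped like DOM elements (`parentElement`, `style.overflowY`, `scrollHeight`, `clientHeight`). In a real page you would read `overflowY` from `getComputedStyle(el)` rather than from the inline style.

```javascript
// An element counts as a scroll container if its overflow allows scrolling
// and its content is taller than its visible box.
function isScrollContainer(el) {
  const overflowY = el.style ? el.style.overflowY : undefined;
  const canScroll = overflowY === 'auto' || overflowY === 'scroll';
  return canScroll && el.scrollHeight > el.clientHeight;
}

// Collect the scrollable ancestors of `target`, outermost first.
function findScrollContainers(target) {
  const containers = [];
  for (let el = target.parentElement; el; el = el.parentElement) {
    if (isScrollContainer(el)) containers.unshift(el);
  }
  return containers;
}

// Build the phases: the page brings the outermost container into view, each
// container brings the next one into view, and the innermost container
// finally brings the target itself into view.
function planScrollPhases(target) {
  const containers = findScrollContainers(target);
  const chain = [...containers, target];
  const phases = [{ scroller: 'page', bringIntoView: chain[0] }];
  for (let i = 0; i < containers.length; i++) {
    phases.push({ scroller: containers[i], bringIntoView: chain[i + 1] });
  }
  return phases;
}
```

With no scrollable ancestors the plan collapses to a single page-level scroll, which matches the case that already works. For the pagination link inside a scrollable job list, the plan would contain one extra phase that scrolls the list itself.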