ulixee / hero

The web browser built for scraping
MIT License

How to open a second tab? #268

Closed (JortsEnjoyer0 closed this 2 weeks ago)

JortsEnjoyer0 commented 3 weeks ago

There appears to be no way to open a new tab through Hero's API.

My workflow uses one tab for a Google Images search and a second tab that sends each result URL through another website and scrapes that website's output.

If I'm not supposed to create a second tab for this, what is the correct paradigm?

blakebyrnes commented 3 weeks ago

Hero was designed with the idea that every instance is a lightweight incognito window on top of shared Chrome instances. We were imagining lambda for simple tasks, with an easy ability to shell off new activities. That vision has only been partially brought to life so far: the sessions are light, and you can copy user profiles to other instances to resume activities in a use case like yours.

You could grab the links in one session and create a PQueue of link processors to read them (kinda like this example: https://github.com/ulixee/hero/blob/fa241bd77fc182e576a49a416482dd003db2541e/examples/ulixee.org.ts).
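As a rough sketch of the "queue of link processors" idea: the helper below is a hand-rolled stand-in for a PQueue, not the linked example. The names `processLinks`, `LinkResult`, and `worker` are hypothetical; in practice `worker` would be whatever opens a separate Hero session for one link.

```typescript
// Hypothetical stand-in for a PQueue: runs `worker` over every URL with at
// most `concurrency` calls in flight, and records which URLs failed.
type LinkResult = { url: string; ok: boolean };

async function processLinks(
  urls: string[],
  concurrency: number,
  worker: (url: string) => Promise<void>,
): Promise<LinkResult[]> {
  const results: LinkResult[] = new Array(urls.length);
  let next = 0; // index of the next URL that has not been claimed yet

  // Each lane keeps claiming the next unclaimed URL until none are left.
  async function runLane(): Promise<void> {
    while (next < urls.length) {
      const index = next++;
      const url = urls[index];
      try {
        await worker(url);
        results[index] = { url, ok: true };
      } catch {
        results[index] = { url, ok: false };
      }
    }
  }

  const laneCount = Math.max(1, Math.min(concurrency, urls.length));
  const lanes = Array.from({ length: laneCount }, () => runLane());
  await Promise.all(lanes);
  return results;
}
```

Results come back in input order, so a caller can match each outcome to the link it came from.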

That said, I think we probably are just overdue to add this feature. It's already in the backend, just not the client. Happy to take a PR if you're in dire need of it or feeling adventurous!

JortsEnjoyer0 commented 3 weeks ago

Prior code (originally using Puppeteer) did scrape the links and create a queue to be processed by a later "pass", similar to what you described in your suggestion. The issue is that the first "Google Images" pass needs feedback on whether the second pass succeeded or failed for each URL. There's a quota on the number of URLs scraped, so the Google Images search tab has to scrape an extra URL for every failure. So I'm stuck with this process.

For now I've been using the ExecuteJs plugin, which lets me run JS in the browser to open a new tab, and then I just call hero.waitForNewTab(). But yes, I think having this natively supported by Hero would be pretty useful.
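The feedback loop described here can be sketched independently of any browser library. This is only an illustration under assumed names: `fillQuota`, `nextCandidate` (pulls the next URL from the search tab), and `processUrl` (runs the second pass and reports success) are all hypothetical.

```typescript
// Hypothetical sketch: keep pulling candidate URLs until `quota` of them have
// been processed successfully, so every failure costs one extra candidate.
async function fillQuota(
  quota: number,
  nextCandidate: () => Promise<string | undefined>,
  processUrl: (url: string) => Promise<boolean>,
): Promise<{ succeeded: string[]; failed: string[] }> {
  const succeeded: string[] = [];
  const failed: string[] = [];

  while (succeeded.length < quota) {
    const url = await nextCandidate();
    if (url === undefined) break; // search results exhausted before quota was met

    if (await processUrl(url)) {
      succeeded.push(url);
    } else {
      failed.push(url);
    }
  }

  return { succeeded, failed };
}
```

Because the first pass only advances when the second pass reports back, the two steps stay in lockstep, which is the part a plain "scrape everything, then process" queue does not give you.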
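A minimal sketch of that workaround, written against a hypothetical `NewTabCapable` interface rather than Hero's real types. It assumes the plugin exposes `executeJs(fn, ...args)` on the Hero instance and that the page is allowed to call `window.open` without being blocked as a popup; `openInNewTab` is an illustrative name.

```typescript
// Hypothetical minimal shape of what the workaround needs from Hero.
// `waitForNewTab` is the call mentioned above; the `executeJs` signature is
// an assumption about the plugin.
interface NewTabCapable<TTab> {
  executeJs(fn: (target: string) => void, target: string): Promise<unknown>;
  waitForNewTab(): Promise<TTab>;
}

async function openInNewTab<TTab>(
  hero: NewTabCapable<TTab>,
  url: string,
): Promise<TTab> {
  // The callback is meant to run inside the page, not in Node, so `open`
  // here refers to the browser's window.open.
  await hero.executeJs((target: string) => {
    (globalThis as any).open(target, '_blank');
  }, url);

  // Resolves once the browser reports the newly opened tab.
  return hero.waitForNewTab();
}
```

The returned tab can then be driven separately while the original tab stays on the search results.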

When I get some time I could put together a PR, sure.

NN-Binary commented 3 weeks ago

Workaround:

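Host an HTML page that contains a link with the id open-link, start the instance with this URL, always keep this tab active, then trigger a click on that link using ExecuteJs:

    // Tabs[0] is the always-open launcher page; the callback runs inside it.
    await Tabs[0].tab.executeJs(() => {
        // @ts-ignore -- `document` only exists in the page context
        const link = document.getElementById('open-link');
        // Point the launcher link at the target URL, then click it.
        link?.setAttribute('href', 'https://website.com');
        link?.click();
    });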