Closed: JortsEnjoyer0 closed this issue 2 weeks ago
Hero was designed with the idea that every instance is a lightweight incognito window on top of shared Chrome instances. We were imagining Lambda-style simple tasks, with an easy way to spin off new activities. That vision has only been partially realized so far: sessions are light, and you can copy user profiles to other instances to resume activities in a use case like yours.
You could grab the links in one session and create a PQueue of link processors to work through them (similar to this example: https://github.com/ulixee/hero/blob/fa241bd77fc182e576a49a416482dd003db2541e/examples/ulixee.org.ts)
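The suggested pattern can be sketched with a minimal promise pool (a stand-in for p-queue, so this runs without any extra dependencies): collect the links in one pass, then fan them out to workers with bounded concurrency. `worker` here is a hypothetical placeholder for whatever per-link Hero session work you'd do.

```typescript
// Minimal sketch of the "grab links, then process via a queue" pattern.
// p-queue would normally fill this role; this inline pool keeps the
// example self-contained. `worker` stands in for per-link Hero work.
async function processWithConcurrency<T, R>(
  items: T[],
  worker: (item: T) => Promise<R>,
  concurrency: number,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;
  // Each runner pulls the next unclaimed index until the list is drained.
  async function run(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index]);
    }
  }
  const runners = Array.from(
    { length: Math.min(concurrency, items.length) },
    () => run(),
  );
  await Promise.all(runners);
  return results;
}
```

With p-queue the same shape becomes `queue.add(() => worker(link))` per link; the pool above just makes the concurrency bound explicit.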
That said, I think we probably are just overdue to add this feature. It's already in the backend, just not the client. Happy to take a PR if you're in dire need of it or feeling adventurous!
Prior code (originally using Puppeteer) did scrape the links and create a queue to be processed by a later "pass", similar to what you described in your suggestion. The issue with that is the first "Google Images" pass needs feedback on whether each URL succeeded or failed in the second pass: there's a quota on the number of URLs scraped, so the Google Images search tab has to scrape an extra URL for every failure. So I'm stuck with this process.
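The feedback loop described above can be sketched as a single loop that interleaves the two passes: keep going until the quota of successes is met, scraping one replacement URL for every failure. `scrapeNextUrl` and `processUrl` are hypothetical stand-ins for the Google Images tab and the second-pass website.

```typescript
// Sketch of the quota-with-feedback loop described above.
// scrapeNextUrl: hypothetical stand-in for pulling the next result
//   from the Google Images tab.
// processUrl: hypothetical stand-in for the second-pass website;
//   resolves true on success, false on failure.
async function collectWithQuota(
  quota: number,
  scrapeNextUrl: () => Promise<string>,
  processUrl: (url: string) => Promise<boolean>,
): Promise<string[]> {
  const succeeded: string[] = [];
  while (succeeded.length < quota) {
    const url = await scrapeNextUrl();
    if (await processUrl(url)) {
      succeeded.push(url);
    }
    // On failure the loop simply comes around again, which is what
    // forces the first pass to scrape an extra URL per failure.
  }
  return succeeded;
}
```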
For now I've been using the ExecuteJs plugin, which lets me run JS in the browser to open a new tab, and then I just hero.waitForNewTab(). But yes, having this natively supported by Hero would be pretty useful functionality.
When I get some time I could put together a PR, sure.
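The ExecuteJs-plus-waitForNewTab combination described above can be sketched roughly as follows. This is unverified against current Hero: it assumes a Hero instance with the ExecuteJs plugin registered (via `hero.use(...)`) and that `executeJs` forwards extra arguments to the page function; `hero` is typed loosely so the file compiles without Hero installed.

```typescript
// Sketch (assumption, not verified) of opening a tab via the ExecuteJs
// plugin and picking it up with waitForNewTab. `hero` is any-typed so
// this compiles without @ulixee/hero present.
async function openUrlInNewTab(hero: any, url: string): Promise<any> {
  // The callback runs in page context, where globalThis is the window
  // object, so this is effectively window.open(url, '_blank').
  await hero.executeJs((target: string) => {
    (globalThis as any).open(target, '_blank');
  }, url);
  // Hero should detect the popup; waitForNewTab resolves with the new tab.
  return hero.waitForNewTab();
}
```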
Workaround:
Host an HTML page, start the instance with that URL, keep this tab active, and then trigger clicks from it using ExecuteJS:
```ts
await Tabs[0].tab.executeJs(() => {
  const link = document.getElementById('open-link') as HTMLAnchorElement;
  link.setAttribute('href', 'https://website.com');
  link.click();
});
```
There appears to be no way to open a new tab through Hero's API.
My workflow involves one tab for a Google Images search and another tab for sending the URLs from each result through another website and scraping that website's output.
If I'm not supposed to create a second tab there, what is the correct paradigm?