Open mrxdev-git opened 1 month ago
It seems like either you are not closing Hero sessions, or there's some kind of leak. Can you share a simple reproducible example?
@blakebyrnes Here is a simplified version of my code. It processes around 600 links and then crashes with the error above. It starts out fast, but gets slower and slower as it goes on, until it crashes.
import fs from 'fs';
import HeroCore from '@ulixee/hero-core';
import {TransportBridge} from "@ulixee/net";
import Hero, {ConnectionToHeroCore} from "@ulixee/hero";

function readUrlsFromFile(filePath) {
  try {
    const fileContent = fs.readFileSync(filePath, 'utf-8');
    return fileContent.split('\n').map(line => line.trim()).filter(line => line.length > 0);
  } catch (error) {
    console.error('Error reading the file:', error);
    throw error;
  }
}

(async () => {
  const links = readUrlsFromFile('links.txt')
  const bridge = new TransportBridge();
  const connectionToCore = new ConnectionToHeroCore(bridge.transportToCore);
  const heroCore = new HeroCore();
  heroCore.addConnection(bridge.transportToClient);
  const options = {
    connectionToCore,
    blockedResourceTypes: [
      'BlockImages',
      'BlockCssAssets',
      'BlockFonts',
      'BlockMedia',
      'BlockIcons'
    ],
    viewport: {
      width: 1280,
      height: 1024
    },
    showChromeInteractions: false,
    showChrome: false,
    sessionPersistence: false
  };
  const browser = new Hero(options);
  try {
    for await (const link of links) {
      try {
        await browser.goto(link);
        // await browser.waitForPaintingStable();
        const price_tag = await browser.waitForElement(browser.xpathSelector(
          "//span[text()[contains(.,'Some text')]]"
        ), {
          timeoutMs: 10e3
        })
        if (price_tag) {
          const price = await price_tag.parentNode.querySelector('div > span').textContent;
          console.log(price)
        } else {
          throw new Error('No card price tag')
        }
      } catch (er) {
        console.log(er.message)
      }
      await browser.waitForMillis(2e3)
    }
  } catch (err) {
    console.log(err.message)
  } finally {
    await browser.close();
  }
})()
Got it. Thanks.
This approach won't work well with Hero. You're unintentionally creating a single Hero session for all of your activity. Hero is built to handle each of your links in its own session (or some small subset of links that a user might consider a single "action"). You will have better luck working the way it's designed if you can break things up into smaller chunks (for example, batches of 100).
Every time you "close" a session, Hero can clean up all the resources, navigation history, etc. it has collected. Until then, Hero acts as if you might still want to use that information and keeps it around, because it is built on the assumption that you are reacting to items created during the "session".
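To illustrate the batching idea described above, here is a minimal sketch. It is not taken from Hero's documentation. The names `chunk`, `processInBatches`, `createSession` and `handleLink` are hypothetical helpers introduced only for this example; the only thing assumed about a session is that it has an async `close()` method, as `browser.close()` does in the code above.

```javascript
// Split an array into consecutive chunks of at most `size` items.
function chunk(items, size) {
  const chunks = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

// Process links in batches, opening a fresh session for each batch and
// closing it when the batch is done so collected resources can be released.
// `createSession` and `handleLink` are hypothetical callbacks supplied by the caller.
async function processInBatches(links, batchSize, createSession, handleLink) {
  for (const batch of chunk(links, batchSize)) {
    const session = await createSession();
    try {
      for (const link of batch) {
        try {
          await handleLink(session, link);
        } catch (err) {
          // A failure on one link should not abort the rest of the batch.
          console.log(err.message);
        }
      }
    } finally {
      // Always close the session, even if something unexpected was thrown.
      await session.close();
    }
  }
}
```

With the code from this thread, `createSession` could be `() => new Hero(options)` and `handleLink` could contain the `goto` / `waitForElement` logic from the loop body. A batch size of 1 gives one session per link; 100 matches the batch size suggested above.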
@blakebyrnes Thank you so much for the explanation, everything is working fine now.
Apparently this fixes the issue:
await connectionToCore.disconnect();
await heroCore.close();
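One way to make sure teardown calls like these always run, even when the scraping loop throws, is to wrap the job in a small helper. This is a sketch under assumptions: `withCleanup` is a hypothetical helper, not part of Hero, and each entry in `closers` is assumed to be a zero-argument async function.

```javascript
// Run an async job, then always run the cleanup steps in the given order.
// `withCleanup` is a hypothetical helper; `closers` is an array of
// zero-argument async functions supplied by the caller.
async function withCleanup(job, closers) {
  try {
    return await job();
  } finally {
    for (const close of closers) {
      try {
        await close();
      } catch (err) {
        // Log and continue so one failed step does not skip the others.
        console.error('Cleanup step failed:', err.message);
      }
    }
  }
}
```

Using the names from the code in this thread, the closers could be `[() => browser.close(), () => connectionToCore.disconnect(), () => heroCore.close()]`. That ordering (session first, then connection, then core) is an assumption, not something confirmed in the thread.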
Hi, I'm trying to parse around 5000 links from a site that is JS-rendered, and every several hundred requests I get this error:
Any idea what could be causing this error and how to solve it? I tested on macOS Ventura 13.5.2 (M2) and Debian 11.