jonstuebe / scraper

Node.js based scraper using headless chrome
MIT License
46 stars, 18 forks

Chromium instance not closing #2

Open greghesp opened 5 years ago

greghesp commented 5 years ago

I've got a custom scraper that pulls more Amazon data, running inside an async.eachLimit loop.

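// Process one ASIN at a time (concurrency limit of 1).
asynclib.eachLimit(asins, 1, function(asin, callback) {
  // do stuff here
  startScrape(asin, function(cb) {
    // (body elided in the original post; assumed to signal that this ASIN is done)
    callback();
  });
}, function(e) {
  console.log('All Done');
});
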
async function startScrape(asin, callback) {
  const site = {
    name: "amazon",
    hosts: ["", ""],
    scrape: async page => {
      const title = await getText("#productTitle", page);
      const brand = await getText("#bylineInfo_feature_div", page);
      const bullets = await getText("#feature-bullets ul", page);
      const price = await getText("#priceblock_ourprice", page);
      const description = await getText("#productDescription", page);
      const type = await getText("body", page);

      // (returned object elided in the original post; presumably the fields above)
      return { title, brand, bullets, price, description, type };
    }
  };

  try {
    console.log('Fetching ' + asin);
    const data = await Scraper.scrape(`${asin}/`, site);
    // do stuff with data
  } catch (e) {
    // (error handling elided in the original post)
  }

  // Assumed: notify the loop so the next ASIN can start.
  callback();
}

What I've found, though, is that after going through a list of thousands of ASINs, it eventually brings my PC to a halt. Looking at Task Manager, it seems that a new instance of Chromium is created for every scrape but never closed, so the leftover processes keep eating RAM.

Apologies for the poor image, my PC locked up! [image]
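
If the leak does come from a browser being launched per scrape and never closed, one possible shape for a fix is to wrap each scrape in a try/finally that always closes the browser. The sketch below is not this library's actual code, and I have not confirmed how it launches Chromium internally. `withBrowser`, `scrapeAll`, `launch` and `scrapeOne` are hypothetical names, and `launch` is assumed to be something like `() => puppeteer.launch()` that resolves to an object with a `close()` method.

```javascript
// Sketch only: `launch` is a hypothetical parameter, any function that
// resolves to a browser-like object exposing close(), for example
// () => puppeteer.launch() when Puppeteer is installed.
async function withBrowser(launch, work) {
  const browser = await launch();
  try {
    // Hand the browser to the caller's scraping logic.
    return await work(browser);
  } finally {
    // Runs on success and on failure, so the Chromium process is released.
    await browser.close();
  }
}

// Process ASINs one at a time, closing each browser before the next starts.
// `scrapeOne(browser, asin)` is a hypothetical per-item scraping function.
async function scrapeAll(asins, launch, scrapeOne) {
  const results = [];
  for (const asin of asins) {
    results.push(await withBrowser(launch, browser => scrapeOne(browser, asin)));
  }
  return results;
}
```

Because the close happens in `finally`, a scrape that throws still releases its browser. The plain `for...of` loop with `await` plays the same role as `eachLimit` with a limit of 1. Launching once and reusing a single browser across all ASINs would be another option, at the cost of one long-lived process.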

jonstuebe commented 5 years ago

Looks great. Open up a PR and I'll merge it in, that is, if you figure out the memory issue.