segment-boneyard / nightmare

A high-level browser automation library.
https://open.segment.com

How to perform multiple nightmare function without it hanging up #557

Closed AaronTrazona closed 8 years ago

AaronTrazona commented 8 years ago

I'm trying to scrape a webpage with NightmareJS and got stuck. My program passes the function an array of links, and I need to collect the same data from each of them. The list can be very long (over 60 links), and if I try to do

async.each(Links, function (url, callback) { var nightmare = Nightmare(size); ... }

only the first few instances actually return a value; the others hang and won't load (blank page). When I try only three, it works perfectly. How can I fix this? How can I redistribute the work, for example run three in parallel and start the next set only when all of them are done? One more thought: maybe use the same instance and repeat the steps for all the links?

rosshinkley commented 8 years ago

I suspect you're hitting a device limitation. Electron is fairly resource-intensive, and even on a powerful machine the number of concurrent instances you can run is probably low.

Judging from your source, it looks like you're using caolan/async, yes? Why not use eachSeries instead of each? Do your operations need to run in parallel?
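
To make the "one at a time" idea concrete, here is a minimal dependency-free sketch that uses plain promises instead of async's eachSeries. The names scrapeInSeries and fetchTitle are hypothetical, and fetchTitle assumes the nightmare package is installed and that only the page title is wanted:

```javascript
// Hypothetical worker: opens one Nightmare instance, reads the page title,
// then shuts the instance down. Assumes `nightmare` is installed.
function fetchTitle(link) {
  var Nightmare = require('nightmare');
  var nightmare = Nightmare();
  return nightmare
    .goto(link)
    .wait('body')
    .title()
    .end()
    .then(function (title) { return title; });
}

// Runs `worker` on each link strictly one after another and resolves
// with the collected values in input order.
function scrapeInSeries(links, worker) {
  var results = [];
  return links.reduce(function (chain, link) {
    return chain
      .then(function () { return worker(link); })
      .then(function (value) { results.push(value); });
  }, Promise.resolve()).then(function () {
    return results;
  });
}
```

Something like scrapeInSeries(links, fetchTitle).then(console.dir) would then process the list with only one Electron process alive at any time, trading speed for stability.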

If you're looking at trying to throttle to a specified number of instances, would async.cargo get you close to what you're after? An off-the-cuff example:

var Nightmare = require('nightmare'),
  async = require('async');

var results = [];
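// Process queued links in batches of 3: each batch runs in parallel,
// and the next batch starts only after the whole batch has finished.
var cargo = async.cargo(function(tasks, batchDone) {
  async.each(tasks, function(url, taskDone) {
    // One Nightmare (Electron) instance per link.
    var nightmare = Nightmare();
    nightmare.goto(url.link)
      .wait('body')
      .title()
      .then(function(title) {
        results.push(title);
        return nightmare.end();
      })
      .then(function() {
        taskDone();
      }, function(err) {
        // Report the failure so the batch does not hang on a rejected promise.
        taskDone(err);
      });
  }, function(err) {
    batchDone(err);
  });
}, 3);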

cargo.drain = function() {
  console.dir(results);
};

//build a fake cargo load
for (var i = 0; i < 100; i++) {
  cargo.push({
    link: 'http://localhost:7500/' + i
  }, function(err) {
    //done with the specified link
  });
}
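
The same "three at a time, then the next three" behavior can also be sketched without the async library, using Promise.all per batch. This is only an illustration under assumptions: scrapeInBatches is a hypothetical helper, and worker stands for any function that takes a link and returns a promise (for example, one that drives a fresh Nightmare instance and calls .end() when done):

```javascript
// Runs `worker` over `links` in fixed-size batches. All links in a batch
// run in parallel; the next batch starts only after every promise in the
// current batch has resolved. Resolves with the values in input order.
function scrapeInBatches(links, batchSize, worker) {
  if (!(batchSize > 0)) {
    return Promise.reject(new Error('batchSize must be a positive number'));
  }
  var results = [];

  function runFrom(start) {
    if (start >= links.length) {
      return Promise.resolve(results);
    }
    var batch = links.slice(start, start + batchSize);
    var pending = batch.map(function (link) {
      return worker(link);
    });
    return Promise.all(pending).then(function (values) {
      results = results.concat(values);
      return runFrom(start + batchSize);
    });
  }

  return runFrom(0);
}

// Hypothetical usage, where scrapeOne(link) returns a promise:
//   scrapeInBatches(links, 3, scrapeOne).then(function (titles) {
//     console.dir(titles);
//   });
```

Note that Promise.all rejects as soon as one worker fails, so a single bad link stops the remaining batches. If partial results matter, the worker could catch its own errors and resolve with a placeholder instead.
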

rosshinkley commented 8 years ago

@AaronTrazona Are you still having this problem?

AaronTrazona commented 8 years ago

@rosshinkley Sorry, I've been busy lately. Thanks, I'm good.

LM1LC3N7 commented 7 years ago

@AaronTrazona: how did you get it to work?

LM1LC3N7 commented 7 years ago

Sorry, problem solved, mostly thanks to #104: "new Nightmare()".

I was calling a function located in another file (exported with module.exports), and my Nightmare variable was defined at the very beginning of that file rather than inside the function.

So on each loop iteration, when my master function called my Nightmare function, the same instance was reused.

Now I create the Nightmare variable inside the function, so each call creates a new instance.
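
For anyone landing here later, a rough sketch of that fix might look like the following. The function names are hypothetical, the real module would export scrapeTitle via module.exports, and the optional createInstance parameter is only an assumption added so the function can be exercised without launching Electron:

```javascript
// Problematic pattern (shown only as a comment): the instance is created
// once when the file is loaded, so every call shares it.
//   var nightmare = Nightmare();
//   module.exports = function (url) { return nightmare.goto(url) /* ... */; };

// Default factory; assumes the `nightmare` package is installed.
function defaultFactory() {
  var Nightmare = require('nightmare');
  return Nightmare();
}

// Fixed pattern: a fresh instance is created on every call and ended
// once the title has been read.
function scrapeTitle(url, createInstance) {
  var nightmare = (createInstance || defaultFactory)();
  return nightmare
    .goto(url)
    .wait('body')
    .title()
    .end()
    .then(function (title) { return title; });
}
```

Calling scrapeTitle(url) inside a loop then gives each iteration its own instance, at the cost of starting one Electron process per call.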