matthewmueller / x-ray

The next web scraper. See through the <html> noise.
MIT License

x-ray gets the same html when run in a for loop? #219

Closed SandeshSarfare closed 7 years ago

SandeshSarfare commented 8 years ago

I have written the code below to get all the links from a set of pages. However, when the loop runs, I only get the links from the first page, repeated multiple times, so I think the HTML (the "links" argument of the callback function) is not changing between iterations. I'm not sure whether I'm missing something in my code (I think it should work as written) or whether cheerio is not loading the new HTML. Please suggest a fix.

for (var i = 1; i < 3; i++) {
  x("http://getThoseLinks?pager=true&paged=" + i, '.qualification-grid@html')(function(err, links) {
    count = 1;
    var $ = cheerio.load(links); // the links HTML is not changing between iterations
    cheerioTableparser($);
    var data = $(".Q-grid").parsetable(false, false, false);
    for (var k = 0; k < data[0].length; k = k + 4) {
      var regex = /<a href="(.*?)"/g;
      while (match = regex.exec(data[0][k])) {
        LinksArr.push(match[1]);
        count++; // Every page has 10 links
      }
    }
    for (var m = 0; m < LinksArr.length; m++) {
      console.log(LinksArr[m]);
    }
  })
}

dangnhdev commented 7 years ago

I'm currently running into this issue too. I'm loading links from a file and looping through them.

Janghou commented 7 years ago

This is basic JavaScript behavior, not so much an x-ray issue:

Try let instead of var in the loop.

See:
http://stackoverflow.com/questions/750486/javascript-closure-inside-loops-simple-practical-example
https://developer.mozilla.org/en/docs/Web/JavaScript/Reference/Statements/let
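
For readers who don't want to follow the links, here is a minimal sketch of the general `var` versus `let` closure behavior they describe. It does not use x-ray at all. The `makeQueue` helper and the two `pagesSeenWith...` functions are hypothetical names made up for this illustration. The queue simply delays callbacks until after the loop has finished, which is roughly what happens with asynchronous callbacks.

```javascript
// Hypothetical helper (not part of x-ray): stores callbacks and runs them
// only when flush() is called, i.e. after the loop has already finished.
function makeQueue() {
  const pending = [];
  return {
    later: function (cb) { pending.push(cb); },
    flush: function () { pending.forEach(function (cb) { cb(); }); }
  };
}

// With `var`, there is a single `i` shared by every callback.
function pagesSeenWithVar() {
  const queue = makeQueue();
  const seen = [];
  for (var i = 1; i < 3; i++) {
    queue.later(function () { seen.push(i); });
  }
  queue.flush(); // the loop is done, so the shared `i` is now 3
  return seen;
}

// With `let`, each iteration gets its own binding of `i`.
function pagesSeenWithLet() {
  const queue = makeQueue();
  const seen = [];
  for (let i = 1; i < 3; i++) {
    queue.later(function () { seen.push(i); });
  }
  queue.flush(); // each callback still sees the `i` of its own iteration
  return seen;
}

console.log(pagesSeenWithVar()); // [ 3, 3 ]
console.log(pagesSeenWithLet()); // [ 1, 2 ]
```

Note that this only matters when the loop variable is read inside the callback. A value read while the loop body runs, such as when building a URL string before the request is made, is the same with either keyword.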