ruipgil / scraperjs

A complete and versatile web scraper.
MIT License

Arrays are null if assigned more than once #44

Open chmac opened 8 years ago

chmac commented 8 years ago

Here's a simple reproduction:

scraperjs = require 'scraperjs'

scraperjs.DynamicScraper.create()
.request
  url: 'http://google.com'
.scrape ->
  a = ['a', 'b', 'c']
  a2 = [a]
  a2.push a
  a2.push a
  a2.push a

  # Return a2
  a2

, (obj, scraperObj) ->
  console.log obj

Or the meaningful part in javascript for those who prefer:

scraperjs.DynamicScraper.create().request({
  url: 'http://google.com'
}).scrape(function() {
  var a, a2;
  a = ['a', 'b', 'c'];
  a2 = [a];
  a2.push(a);
  a2.push(a);
  a2.push(a);
  return a2;
}, function(obj, scraperObj) {
  return console.log(obj);
});

I would expect to see this:

[ [ 'a', 'b', 'c' ], [ 'a', 'b', 'c' ], [ 'a', 'b', 'c' ] ]

But I see this:

[ [ 'a', 'b', 'c' ], null, null ]

Is it my mistake? Is it a bug? In this package? PhantomJS?

chmac commented 8 years ago

Oh, and I forgot to say, thanks for this awesome tool @ruipgil :-)

ruipgil commented 8 years ago

This is very strange behaviour. It seems to be a node-phantom problem, since the error disappears when the result is converted to a string with JSON.stringify before it is returned.
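The JSON.stringify observation suggests a possible workaround: serialize the result inside the scrape function and parse it again in the callback. Below is a minimal sketch of that idea, not a confirmed fix. The names scrapeInPage and handleResult are hypothetical, and the commented-out wiring assumes the same scrape(fn, callback) call shape used in the reproduction above. The local check at the bottom only exercises the JSON round trip in plain JavaScript; it does not go through PhantomJS.

```javascript
// Hypothetical helper: meant to run inside the page (PhantomJS) context,
// so it is self-contained and does not reference anything from Node.
function scrapeInPage() {
  var a = ['a', 'b', 'c'];
  var a2 = [a];
  a2.push(a);
  a2.push(a);
  a2.push(a);
  // Return a string instead of the nested array. JSON.stringify writes
  // each repeated reference out in full, so only a plain string has to
  // cross the bridge between the page and Node.
  return JSON.stringify(a2);
}

// Hypothetical helper: runs on the Node side and restores the array.
function handleResult(serialized) {
  return JSON.parse(serialized);
}

// Local check of the round trip, without scraperjs or PhantomJS.
var result = handleResult(scrapeInPage());
console.log(result);

// Assumed wiring, mirroring the reproduction's call shape:
// scraperjs.DynamicScraper.create().request({
//   url: 'http://google.com'
// }).scrape(scrapeInPage, function(serialized, scraperObj) {
//   console.log(handleResult(serialized));
// });
```

If the problem really is in how node-phantom transfers repeated references, passing a string should sidestep it, at the cost of one extra parse step in every callback.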

Running your example all I get is an empty array. What version of phantomjs and scraperjs are you running?

Also, in your example, since you start the array with one entry (a) and then push a three more times, you should expect the output to have four entries: [ [ 'a', 'b', 'c' ], [ 'a', 'b', 'c' ], [ 'a', 'b', 'c' ], [ 'a', 'b', 'c' ] ]

Thank you.

chmac commented 8 years ago

scraperjs v 0.3.4 and phantomjs 1.9.8 on Ubuntu 14.04.

Haha, you're right, I missed one of the arrays. :-)