matthewmueller / x-ray

The next web scraper. See through the <html> noise.
MIT License

Cannot Read property 'html' of null #125

Open christiansaiki opened 8 years ago

christiansaiki commented 8 years ago

Sometimes I get this strange error:

var $ = html.html ? html : cheerio.load(html);
            ^
TypeError: Cannot read property 'html' of null
    at load (/Users/christian/development/beacon/beaconJobs/node_modules/x-ray/index.js:166:19)
    at /Users/christian/development/beacon/beaconJobs/node_modules/x-ray/index.js:85:19
    at /Users/christian/development/beacon/beaconJobs/node_modules/x-ray/index.js:248:14
    at _done (/Users/christian/development/beacon/beaconJobs/node_modules/x-ray/node_modules/x-ray-crawler/node_modules/enqueue/index.js:78:20)
    at _once (/Users/christian/development/beacon/beaconJobs/node_modules/x-ray/node_modules/x-ray-crawler/node_modules/enqueue/index.js:93:15)
    at result (/Users/christian/development/beacon/beaconJobs/node_modules/x-ray/node_modules/x-ray-crawler/lib/index.js:107:7)
    at /Users/christian/development/beacon/beaconJobs/node_modules/x-ray/node_modules/x-ray-crawler/node_modules/wrap-fn/index.js:121:18
    at /Users/christian/development/beacon/beaconJobs/init/xrayDriver.js:28:16
    at Request.callback (/Users/christian/development/beacon/beaconJobs/node_modules/superagent/lib/node/index.js:788:12)
    at Stream.<anonymous> (/Users/christian/development/beacon/beaconJobs/node_modules/superagent/lib/node/index.js:997:12)
    at emitNone (events.js:72:20)
    at Stream.emit (events.js:166:7)
    at Unzip.<anonymous> (/Users/christian/development/beacon/beaconJobs/node_modules/superagent/lib/node/utils.js:108:12)
    at emitNone (events.js:72:20)
    at Unzip.emit (events.js:166:7)
    at endReadableNT (_stream_readable.js:905:12)
    at doNTCallback2 (node.js:441:9)
    at process._tickDomainCallback (node.js:396:17)

I checked index.js and found something odd in the load function:

function load(html, url) {
  var $ = html.html ? html : cheerio.load(html);
  if (url) $ = absolutes(url, $);
  return $;
}

There is no check on the html variable, so whenever html is null, evaluating html.html throws the TypeError above.
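To make the failure mode concrete, here is a minimal standalone sketch. It assumes nothing about x-ray beyond the ternary quoted above: `fakeLoad` is a hypothetical stand-in for cheerio.load so the snippet runs without dependencies, and `tryLoad` is a helper invented only for this illustration.

```javascript
// Hypothetical stand-in for cheerio.load (not the real API), so this runs standalone.
function fakeLoad(html) {
  return { html: function () { return html; } };
}

// Same shape as the unguarded ternary in load().
function load(html) {
  return html.html ? html : fakeLoad(html);
}

// Illustration-only helper: reports the error name instead of crashing.
function tryLoad(input) {
  try {
    load(input);
    return 'ok';
  } catch (err) {
    return err.name;
  }
}

console.log(tryLoad('<p>hi</p>')); // string body: property lookup is safe
console.log(tryLoad(null));        // null body: property lookup throws
```

The property lookup happens before the ternary can pick a branch, so a null (or undefined) body throws immediately. An empty string does not throw, because property access on a string is allowed.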

I haven't read all of the code, but I made a quick fix that prevents the crash:

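function load(html, url) {
  // If html is missing, return null instead of dereferencing it.
  var $ = html ? (html.html ? html : cheerio.load(html)) : null;
  // Only resolve URLs when there is a document to work on.
  if (url && $) $ = absolutes(url, $);
  return $;
}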

Do you think this is the right fix for the issue? If so, I can open a PR. Best regards

rentrop commented 8 years ago

+1