matthewmueller / x-ray

The next web scraper. See through the <html> noise.
MIT License
5.88k stars 350 forks source link

fix error when html is null #163

Closed nicobevilacqua closed 8 years ago

nicobevilacqua commented 8 years ago

Description

html to empty before cheerio.load.

coveralls commented 8 years ago

Coverage Status

Coverage increased (+0.01%) to 96.238% when pulling f73990feeb2725568b244331345ecac87f8296a3 on nicobevilacqua:master into 1caf24f979328162b713825998fff6cd508df4a3 on lapwinglabs:master.

Kikobeats commented 8 years ago

hmmmm it sounds familiar for me, what happens that currently this case is not handle?

Can we write a test to verify it?

Kikobeats commented 8 years ago

https://github.com/lapwinglabs/x-ray/blob/master/index.js#L73 https://github.com/lapwinglabs/x-ray/blob/master/index.js#L87

Can we centralize this cases too?

nicobevilacqua commented 8 years ago

The situation is this. I'm use "xray" encapsulated on a es6 promise, and when the html is null in the load function i get an error that is not caught for the promise. Only with this url the error happens.

I tried es6 promises, q module, and a try catch block and nothing works.

const xray = new (require('x-ray'))();
const xrayQuery = xray('http://www.viagrupo.com/santo-domingo/rss/deal-of-day.xml', 'body');

// TRY - CATCH
try {
    xrayQuery(error => {
        if (error) {
            console.error(error);
            return;
        }
        console.log('success');
    });
} catch (e) {
    console.log(e);
}

// ES6 PROMISE
(new Promise((resolve, reject) => {
    xrayQuery((error, response) => {
        if (error) {
            reject(error);
            return;
        }
        resolve(response);
    });
}))
    .then(() => console.log('success'))
    .catch(error => console.error(error.stack));

// Q PROMISE
const Q = require('q');
(new Q.Promise((resolve, reject) => {
    xrayQuery((error, response) => {
        if (error) {
            reject(error);
            return;
        }
        resolve(response);
    });
}))
    .then(() => console.log('success'))
    .catch(error => console.error(error.stack));
/*

    THIS IS THE ERROR

    var $ = html.html ? html : cheerio.load(html)
             ^
    TypeError: Cannot read property 'html' of null
    at load (../node_modules/x-ray/index.js:196:15)
    at ../node_modules/x-ray/index.js:63:19
    at ../node_modules/x-ray/index.js:189:14
    at _done (../node_modules/enqueue/index.js:78:20)
    at _once (../node_modules/enqueue/index.js:93:15)
    at result (../node_modules/x-ray-crawler/lib/index.js:107:7)
    at ../node_modules/wrap-fn/index.js:121:18
    at ../node_modules/x-ray-crawler/lib/http-driver.js:42:16
    at Request.callback (../node_modules/superagent/lib/node/index.js:691:12)
    at ClientRequest.<anonymous> (../node_modules/superagent/lib/node/index.js:901:12)

*/
Kikobeats commented 8 years ago

@nicobevilacqua just one question, can you write a unit test that verify that now is working fine?