When reading certain URLs, the response body comes back empty, which I believe is because the request is being blocked by the provider. When this happens, instead of an error being returned through the callback, jsdom throws an exception, because the empty body is passed straight into it.
Simple STR:
var readability = require('node-readability');
var url = 'http://dotearth.blogs.nytimes.com/2013/11/21/did-90-companies-cause-the-climate-crisis-of-the-21st-century/';
readability.read(url, { timeout: 5000 }, function(err, article) {
  // It never reaches this point
  console.log(err, article);
});
Adding these lines at line 94 of readability.js solves the issue (although it doesn't fix not being able to read the URL):
if (typeof body !== 'string') body = body.toString();
if (!body) return callback('No Body Found');
I can make this into a pull request if needed, but I'm not sure what the deeper issue is where these URLs aren't readable.