luin / readability

📚 Turn any web page into a clean view

Call Stack Size Exceeded #104

Open brandonparsons opened 6 years ago

brandonparsons commented 6 years ago

Hi there,

I'm getting the following error when trying to use this library:

~/code/my_project/node_modules/node-readability/node_modules/parse5/lib/tree_construction/parser.js:870
Parser.prototype._isSpecialElement = function (element) {
                                              ^

RangeError: Maximum call stack size exceeded
    at module.exports.Parser._isSpecialElement (~/code/my_project/node_modules/node-readability/node_modules/parse5/lib/tree_construction/parser.js:870:47)
    at genericEndTagInBody (~/code/my_project/node_modules/node-readability/node_modules/parse5/lib/tree_construction/parser.js:1899:15)
    at Object.endTagInBody [as END_TAG_TOKEN] (~/code/my_project/node_modules/node-readability/node_modules/parse5/lib/tree_construction/parser.js:2002:17)
    at module.exports.Parser._processToken (~/code/my_project/node_modules/node-readability/node_modules/parse5/lib/tree_construction/parser.js:619:38)
    at module.exports.parser._processToken (~/code/my_project/node_modules/node-readability/node_modules/parse5/lib/tree_construction/location_info_mixin.js:89:35)
    at module.exports.Parser._processFakeEndTag (~/code/my_project/node_modules/node-readability/node_modules/parse5/lib/tree_construction/parser.js:663:10)
    at buttonStartTagInBody (~/code/my_project/node_modules/node-readability/node_modules/parse5/lib/tree_construction/parser.js:1351:11)
    at buttonStartTagInBody (~/code/my_project/node_modules/node-readability/node_modules/parse5/lib/tree_construction/parser.js:1352:9)
    at buttonStartTagInBody (~/code/my_project/node_modules/node-readability/node_modules/parse5/lib/tree_construction/parser.js:1352:9)
    at buttonStartTagInBody (~/code/my_project/node_modules/node-readability/node_modules/parse5/lib/tree_construction/parser.js:1352:9)

Do you know why I might be getting this error? Is there a way to tell this library to stop attempting to parse the file, rather than blowing up? I can't seem to catch this error and let my script keep running.
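
In Node.js, a stack overflow surfaces as an ordinary RangeError. It can be caught, but only by a try/catch around the synchronous call that actually overflows. When the parse runs inside an HTTP or stream callback, a try/catch around the outer call never sees the error, which may be why it can't be caught here. The sketch below assumes the parse step can be called synchronously. `parseOrSkip` and `runawayParse` are hypothetical helpers for illustration, not part of node-readability:

```javascript
// Run a synchronous parse step and turn a stack overflow into a null result
// instead of letting it propagate. `parse` is any caller-supplied function
// (hypothetical here; not a node-readability API).
function parseOrSkip(parse, input) {
  try {
    return parse(input);
  } catch (err) {
    // V8 reports stack overflow as a RangeError whose message mentions the
    // call stack; only that case is swallowed.
    if (err instanceof RangeError && /call stack/i.test(err.message)) {
      return null;
    }
    throw err; // unrelated errors are still surfaced
  }
}

// A deliberately runaway parser, used only to demonstrate the guard.
function runawayParse(input) {
  return runawayParse(input) + 1;
}

console.log(parseOrSkip(runawayParse, "<button>")); // null
console.log(parseOrSkip((s) => s.length, "<p>ok</p>")); // 9
```

Rethrowing anything other than a stack overflow keeps genuine bugs visible instead of silently skipping them.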

Thanks!

haroldtreen commented 6 years ago

Looks like it's coming from parse5, but the stack trace doesn't show where in readability it's being called from. 🤔

Do you have the html you're trying to parse?

brandonparsons commented 6 years ago

Unfortunately not. I was going through thousands of URLs, and everything is async, so I can't tell which one it was :)
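
When many URLs are processed concurrently, keeping each URL attached to its outcome makes the failing page easy to find afterwards. The sketch below assumes the per-URL work is wrapped in a promise-returning function. `processAll` and `handle` are hypothetical names. This only catches failures that surface as rejected promises. An exception thrown inside a library's own event callback (as in the traces in this thread) bypasses it and, by default, crashes the process. In that case, logging each URL just before starting it is still the simplest way to see which one was in flight.

```javascript
// Process URLs with bounded concurrency and record which URL each failure
// came from. `handle` is a caller-supplied async function (hypothetical),
// e.g. a wrapper around whatever fetch-and-parse call the script uses.
async function processAll(urls, handle, concurrency = 4) {
  const results = new Map();
  const failures = [];
  let next = 0;

  async function worker() {
    // The check and the increment run synchronously, so two workers
    // never pick up the same URL.
    while (next < urls.length) {
      const url = urls[next++];
      try {
        results.set(url, await handle(url));
      } catch (err) {
        // Keep the URL next to the error so the culprit can be found later.
        failures.push({ url, message: err.message });
      }
    }
  }

  const workers = [];
  for (let i = 0; i < Math.min(concurrency, urls.length); i++) {
    workers.push(worker());
  }
  await Promise.all(workers);
  return { results, failures };
}
```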

vladat commented 4 years ago

I'm experiencing the same issue while trying to parse this URL: https://www.salewa.com/men-trekking-hiking-shoes

RangeError: Maximum call stack size exceeded
    at get (internal/bootstrap/pre_execution.js:295:8)
    at debug (/Users/backup/Projects/sampleproject/server/node_modules/node-readability/node_modules/request/lib/debug.js:5:26)
    at IncomingMessage.<anonymous> (/Users/backup/Projects/sampleproject/server/node_modules/node-readability/node_modules/request/request.js:745:5)
    at IncomingMessage.emit (events.js:215:7)
    at IncomingMessage.EventEmitter.emit (domain.js:476:20)
    at IncomingMessage.<anonymous> (/Users/backup/Projects/sampleproject/server/node_modules/node-readability/node_modules/request/request.js:950:39)
    at IncomingMessage.emit (events.js:210:5)
    at IncomingMessage.EventEmitter.emit (domain.js:476:20)
    at IncomingMessage.<anonymous> (_http_client.js:368:14)
    at IncomingMessage.emit (events.js:215:7)
    at IncomingMessage.EventEmitter.emit (domain.js:476:20)
    at IncomingMessage.<anonymous> (/Users/backup/Projects/sampleproject/server/node_modules/node-readability/node_modules/request/request.js:950:39)
    at IncomingMessage.emit (events.js:210:5)
    at IncomingMessage.EventEmitter.emit (domain.js:476:20)
    at IncomingMessage.<anonymous> (_http_client.js:368:14)
    at IncomingMessage.emit (events.js:215:7)
    at IncomingMessage.EventEmitter.emit (domain.js:476:20)
    at IncomingMessage.<anonymous> (/Users/backup/Projects/sampleproject/server/node_modules/node-readability/node_modules/request/request.js:950:39)
    at IncomingMessage.emit (events.js:210:5)
    at IncomingMessage.EventEmitter.emit (domain.js:476:20)
    at IncomingMessage.<anonymous> (_http_client.js:368:14)
    at IncomingMessage.emit (events.js:215:7)
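
The frames above repeat in a fixed cycle: a listener at request.js:950 calls emit, which runs a listener at _http_client.js:368, which calls emit and runs the request.js listener again. That is what happens when event listeners synchronously re-emit events on the same object and never return. Below is a minimal reproduction of that general pattern with Node's EventEmitter. It is not the actual code path inside request, and the event names are arbitrary:

```javascript
const { EventEmitter } = require("events");

// Two listeners that each emit the event the other listens for. Because
// emit() calls listeners synchronously, every emit nests another one and
// the stack never unwinds until V8 throws.
function reproduceEmitLoop() {
  const stream = new EventEmitter();
  stream.on("ping", () => stream.emit("pong"));
  stream.on("pong", () => stream.emit("ping"));
  try {
    stream.emit("ping");
    return "finished";
  } catch (err) {
    return err instanceof RangeError ? err.message : "other error";
  }
}

console.log(reproduceEmitLoop()); // Maximum call stack size exceeded
```
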

vladat commented 4 years ago

I did some investigation, and it seems to be a known issue with the request package: https://github.com/request/request/issues/2008. The bugfix proposed in https://github.com/yarnpkg/yarn/issues/7542 seems to work in my case.

However, the request package is now fully deprecated, so we shouldn't expect this to be fixed on their side.
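
Since request will not be patched, one possible workaround (not something proposed in this thread) is to download the page with a different HTTP client and pass the HTML string to node-readability. Its read() is documented as accepting HTML as well as a URL, which is worth confirming for the installed version. This should keep the deprecated request code path out of the fetch. The sketch below assumes Node 18+ for the built-in fetch and AbortSignal.timeout. `fetchHtml` and its `fetchImpl` option are hypothetical names. `fetchImpl` exists only so the function can be exercised without network access.

```javascript
// Fetch a page with Node's built-in fetch (Node 18+) instead of the
// deprecated `request` package. The timeout aborts a stalled request so one
// bad URL cannot hang a whole batch. `fetchImpl` defaults to the global fetch.
async function fetchHtml(url, { timeoutMs = 10000, fetchImpl = fetch } = {}) {
  const res = await fetchImpl(url, {
    redirect: "follow",
    signal: AbortSignal.timeout(timeoutMs),
  });
  if (!res.ok) {
    throw new Error(`HTTP ${res.status} for ${url}`);
  }
  // Body as a string, ready to hand to whichever parser the script uses.
  return res.text();
}
```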