samclarke / robots-parser

NodeJS robots.txt parser with support for wildcard (*) matching.
MIT License

Throws error with a robots.txt with a colon at the end or beginning of a line #4

Closed: ghost closed this issue 8 years ago

ghost commented 8 years ago
TypeError: Cannot read property 'indexOf' of null
    at parseRobots (/home/cat/Projects/firm-email-crawler/node_modules/robots-parser/Robots.js:102:23)
    at new Robots (/home/cat/Projects/firm-email-crawler/node_modules/robots-parser/Robots.js:181:2)
    at module.exports (/home/cat/Projects/firm-email-crawler/node_modules/robots-parser/index.js:4:9)
    at /home/cat/Projects/firm-email-crawler/node_modules/simplecrawler/lib/crawler.js:1479:50
    at decodeAndReturnResponse (/home/cat/Projects/firm-email-crawler/node_modules/simplecrawler/lib/crawler.js:496:17)
    at IncomingMessage.<anonymous> (/home/cat/Projects/firm-email-crawler/node_modules/simplecrawler/lib/crawler.js:505:21)
    at emitNone (events.js:91:20)
    at IncomingMessage.emit (events.js:185:7)
    at endReadableNT (_stream_readable.js:926:12)
    at _combinedTickCallback (internal/process/next_tick.js:74:11)
    at process._tickCallback (internal/process/next_tick.js:98:9)

I fixed it by removing the following check from the trimLine function:

if (!line)
    return null;
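
For anyone hitting the same trace, here is a minimal sketch of how a falsy check like that one can end up producing the `indexOf` of null error. The helper functions below are hypothetical stand-ins written for illustration and are not copied from Robots.js. The assumption is that a line such as "Disallow:" is split at the colon into a field and an empty value, and that the empty value is then turned into null before something calls indexOf on it. The `trimLineSafe` variant is likewise only an assumption about one possible alternative to deleting the check outright: it still maps null and undefined to null but leaves empty strings alone.

```javascript
// Hypothetical stand-ins for illustration; NOT the actual robots-parser source.

// Splits "Field: value" at the first colon; returns null when there is no colon.
function splitLine(line) {
  const idx = line.indexOf(":");
  return idx < 0 ? null : [line.slice(0, idx), line.slice(idx + 1)];
}

// Variant with the falsy guard: an empty string is falsy, so it becomes null.
function trimLineWithGuard(line) {
  if (!line) return null;
  if (Array.isArray(line)) return line.map((part) => trimLineWithGuard(part));
  return String(line).trim();
}

// Variant that only maps null/undefined to null and keeps "" as "".
function trimLineSafe(line) {
  if (line == null) return null;
  if (Array.isArray(line)) return line.map((part) => trimLineSafe(part));
  return String(line).trim();
}

// A rule with a colon at the end of the line yields an empty value part.
const guarded = trimLineWithGuard(splitLine("Disallow:"));
const safe = trimLineSafe(splitLine("Disallow:"));

// Calling a string method on the null value throws a TypeError.
let threw = false;
try {
  guarded[1].indexOf("*");
} catch (err) {
  threw = err instanceof TypeError;
}

console.log(guarded); // the value part is null
console.log(safe);    // the value part is an empty string
console.log(threw);   // true
```

A colon at the beginning of a line has the same effect on the other side of the split: the field name is the empty string, which the guarded variant also converts to null.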