This script uses CasperJS to crawl your site and log all URLs, response codes, errors and warnings to a JSON file for parsing.
Make sure you have CasperJS and PhantomJS installed.
Configure the script by setting your options in config.js or by passing arguments on the command line.
In your terminal, navigate to the folder containing the spider.js file.
casperjs spider.js
casperjs --start-url=http://example.com --required-values=example.com spider.js
Casper arguments go in the middle, between casperjs and the script name, and they override the config options set in the script.
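The override behaviour described above can be pictured with a minimal sketch. It is not the code spider.js actually uses: applyCliOptions and toCamelCase are hypothetical helpers, and the mapping from a dashed flag such as start-url to a camelCase key such as startUrl is an assumption based on the option names in this README. In a real CasperJS run the named options would come from casper.cli.options; they are hard-coded here so the example runs on its own.

```javascript
// Hypothetical helper: turn a dashed flag name ("start-url") into a config key ("startUrl").
function toCamelCase(name) {
  return name.replace(/-([a-z])/g, function (match, letter) {
    return letter.toUpperCase();
  });
}

// Hypothetical helper: copy the config defaults, then let command-line options override them.
function applyCliOptions(defaults, cliOptions) {
  var merged = {};
  var key;
  for (key in defaults) {
    if (Object.prototype.hasOwnProperty.call(defaults, key)) {
      merged[key] = defaults[key];
    }
  }
  for (key in cliOptions) {
    if (Object.prototype.hasOwnProperty.call(cliOptions, key)) {
      merged[toCamelCase(key)] = cliOptions[key];
    }
  }
  return merged;
}

// Defaults as they might appear in the config (values taken from the examples in this README).
var config = { startUrl: 'http://example.com', limit: 25, verbose: false };

// Stand-in for casper.cli.options after running:
//   casperjs --start-url=http://example.org --limit=5 spider.js
var cliOptions = { 'start-url': 'http://example.org', 'limit': 5 };

var effective = applyCliOptions(config, cliOptions);
console.log(effective.startUrl); // http://example.org (from the command line)
console.log(effective.limit);    // 5 (from the command line)
console.log(effective.verbose);  // false (default kept)
```

Options that are not passed on the command line keep whatever value the config gives them.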
There are several configuration options in casperjs-spider. You can set them individually on the command line, or by editing the config portion of spider.js.
It might help to refer to the default config options in config.js for examples.
start-url *required
--start-url=http://example.com
config.startUrl = 'http://example.com';
required-values *required
--required-values=example.com
config.requiredValues = 'example.com';
skipped-values
--skipped-values=mailto,install,\#,blog/,comment
config.skippedValues = 'mailto,install,#,blog/,comment';
limit
--limit=25
config.limit = 25;
user-agent
--user-agent="Mozilla/5.0 (iPad; CPU OS 6_0 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/6.0 Mobile/10A5355d Safari/8536.25"
config.userAgent = 'Mozilla/5.0 (iPad; CPU OS 6_0 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/6.0 Mobile/10A5355d Safari/8536.25';
file-location default=./logs/
--file-location=./logs/
config.fileLocation = './logs/';
date-file-name default=false
--date-file-name=false
config.dateFileName = false;
verbose default=false
--verbose=false
config.verbose = false;
log-level default=error
--log-level=error
config.logLevel = 'error';
load-images default=false
--load-images=false
config.loadImages = false;
load-plugins default=false
--load-plugins=false
config.loadPlugins = false;
cb default=null
config.cb = function(data) {return data};
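The required-values and skipped-values options above are comma-separated strings. This README does not spell out how they are applied, so the sketch below shows only one plausible reading: a URL is crawled when it contains at least one required value and none of the skipped values. The helpers splitValues, containsAny and shouldCrawl are hypothetical and are not taken from spider.js.

```javascript
// Hypothetical helper: split "a,b,c" into ['a', 'b', 'c'], dropping empty entries.
function splitValues(list) {
  if (!list) {
    return [];
  }
  return list.split(',').map(function (value) {
    return value.trim();
  }).filter(function (value) {
    return value.length > 0;
  });
}

// Hypothetical helper: true if the URL contains any of the given substrings.
function containsAny(url, values) {
  return values.some(function (value) {
    return url.indexOf(value) !== -1;
  });
}

// Assumed rule: the URL must match a required value and must not match a skipped value.
function shouldCrawl(url, config) {
  var required = splitValues(config.requiredValues);
  var skipped = splitValues(config.skippedValues);
  if (required.length > 0 && !containsAny(url, required)) {
    return false;
  }
  return !containsAny(url, skipped);
}

// Values copied from the option examples in this README.
var config = {
  requiredValues: 'example.com',
  skippedValues: 'mailto,install,#,blog/,comment'
};

console.log(shouldCrawl('http://example.com/about', config));      // true
console.log(shouldCrawl('http://example.com/blog/post-1', config)); // false (contains "blog/")
console.log(shouldCrawl('http://other.org/about', config));         // false (no required value)
```

Under this reading, required-values keeps the crawl on your own domain, while skipped-values excludes links such as mailto addresses and in-page anchors.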
Feel free to edit the script for yourself, or send a pull request with any improvements.
Pull requests should be based on master and sent to a separate branch prefixed with incoming-.
This script wouldn't be possible without PlanZero, whose script I started with in the very beginning. I still highly recommend checking it out for a bare-bones version.