seethroughdev / status-crawler

A fully configurable crawler to check your website's status codes, JavaScript errors, and anything else you want.

Spider for CasperJS

This script uses CasperJS to crawl your site and log all URLs, response codes, errors, and warnings to a JSON file for parsing.
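The exact shape of the log depends on your configuration; as an illustration only (these field names are hypothetical, not the script's actual schema), a single logged entry might look something like:

```javascript
// Hypothetical shape of one crawled-page entry; the real field names
// come from spider.js and may differ.
var entry = {
  url: "http://example.com/about",
  status: 200,   // HTTP response code returned for this URL
  errors: [],    // JavaScript errors raised while loading the page
  warnings: []   // warnings (e.g. failed resources) collected on the page
};

console.log(JSON.stringify(entry));
```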


What casperjs-spider does

- Crawls your site page by page, starting from a URL you configure
- Records each URL visited along with its response code
- Captures JavaScript errors and warnings along the way
- Writes everything to a JSON file for parsing

Getting Started

Make sure you have CasperJS and PhantomJS installed.

Configure the script by setting your config options in config.js or passing arguments in the command line.

In your terminal, navigate to the folder containing the spider.js file, then run:

```
casperjs spider.js
```

To pass options on the command line instead:

```
casperjs --start-url=http://example.com --required-values=example.com spider.js
```

Casper arguments go in the middle, between `casperjs` and `spider.js`, and they will override the config options set in the script.

Config Options

There are several configuration options in casperjs-spider. You can set them individually on the command line, or by editing the config portion of spider.js.

It might help to refer to the default config options in config.js for examples.

- `start-url` *required
- `required-values` *required
- `skipped-values`
- `limit`
- `user-agent`
- `file-location` (default: `./logs/`)
- `date-file-name` (default: `false`)
- `verbose` (default: `false`)
- `log-level` (default: `error`)
- `load-images` (default: `false`)
- `load-plugins` (default: `false`)
- `cb` (default: `null`)
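As a sketch only, here is what a config object covering these options might look like. The key names are assumed to mirror the CLI flags in camelCase, and the comments are my reading of what each flag likely does — check the shipped config.js for the actual names and semantics.

```javascript
// Hypothetical config.js sketch; key names and meanings are assumptions
// based on the flag names -- verify against the project's own config.js.
var config = {
  startUrl: "http://example.com",  // where the crawl begins (required)
  requiredValues: "example.com",   // only follow URLs containing this (required)
  skippedValues: "logout,.pdf",    // skip URLs matching these values
  limit: 100,                      // stop crawling after this many pages
  userAgent: "casperjs-spider",    // user-agent string sent with requests
  fileLocation: "./logs/",         // where the JSON log file is written
  dateFileName: false,             // presumably adds the date to the log filename
  verbose: false,                  // CasperJS verbose output
  logLevel: "error",               // CasperJS log level
  loadImages: false,               // let PhantomJS load images
  loadPlugins: false,              // let PhantomJS load plugins
  cb: null                         // optional callback, presumably run after the crawl
};

if (typeof module !== "undefined") {
  module.exports = config;
}
```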

Contributing

Feel free to edit it for your own use, or send a pull request with any improvements.

Pull requests should branch from master and be sent to a separate branch prefixed with `incoming-`.

This script wouldn't be possible without PlanZero, whose script I started from in the very beginning. I still highly recommend checking it out for a bare-bones version.