seethroughdev / status-crawler

A fully configurable crawler to check your website status codes, javascript errors and anything you want.

--required-values spiders ALL #16

Open yarekc opened 7 years ago

yarekc commented 7 years ago

casperjs --start-url=http://www.proxymis.com --required-values=proxymis.com spider.js

This command also spiders URLs that do not contain the required value, such as:

200 http://www.google-analytics.com/ga.js
200 http://fonts.gstatic.com/s/economica/v4/UK4l2VEpwjv3gdcwbwXE9InF5uFdDttMLvmWuJdhhgs.ttf
200 http://fonts.gstatic.com/s/economica/v4/jObgDQiPUtmACAaaK3pMG6CWcynf_cDxXwCLxiixG1c.ttf
200 http://fonts.gstatic.com/s/lato/v11/v0SdcGFAl2aezM9Vq_aFTQ.ttf
200 http://fonts.gstatic.com/s/lato/v11/nj47mAZe0mYUIySgfn0wpQ.ttf
200 http://connect.facebook.net/fr_FR/all.js#xfbml=1

Shouldn't it spider ONLY resources that contain the required-values parameter?

doxakis commented 7 years ago

Hi,

The required-values parameter is used to determine which URLs to open. In your case, it will follow all links containing "proxymis.com".

To simulate a real user, it loads all the resources related to each page (CSS, JS, AJAX calls on page load, etc.). That way you can see file-not-found responses, errors, and so on.

For your issue, one solution could be to add a new command-line option that specifies which resources to skip. When the page requests a resource, we could do something similar to http://stackoverflow.com/a/22274345/4463145 to abort the request.

It would also prevent sending data to Google Analytics.
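
A minimal sketch of what such an option could look like, assuming CasperJS's `resource.requested` event and the `abort()` method of the request object it receives. The option name `--skip-resources` and the helpers `shouldSkip` and `installSkipFilter` are hypothetical and are not part of spider.js today:

```javascript
// Hypothetical helper: true when the URL contains any of the skip values.
// Empty values are ignored so that an unset option skips nothing.
function shouldSkip(url, skipValues) {
  return skipValues.some(function (value) {
    return value.length > 0 && url.indexOf(value) !== -1;
  });
}

// Hypothetical helper: aborts every resource request whose URL matches
// one of the skip values, before the request leaves the browser.
function installSkipFilter(casper, skipValues) {
  casper.on('resource.requested', function (requestData, networkRequest) {
    if (shouldSkip(requestData.url, skipValues)) {
      networkRequest.abort();
    }
  });
}

// Possible wiring inside spider.js (the option name is an assumption):
// var skip = String(casper.cli.get('skip-resources') || '').split(',');
// installSkipFilter(casper, skip);
```

With something like this in place, the command could be run as `casperjs --start-url=http://www.proxymis.com --required-values=proxymis.com --skip-resources=google-analytics.com,facebook.net spider.js`. Matching is a plain substring test here, mirroring how required-values is described above; a stricter host-based match may be preferable.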