zrashwani / arachnid

Crawl all unique internal links found on a given website, and extract SEO related information - supports javascript based sites
MIT License
253 stars, 60 forks

'Don't crawl external links' option #9

Closed by ollietreend 9 years ago

ollietreend commented 9 years ago

It would be great to have the option to disable the crawling of external links.

I'd like to crawl an entire website, but am not interested in external links. By setting the 'depth' option to something suitably high in order to capture the entire website, I also end up doing a deep crawl of external websites.

ollietreend commented 9 years ago

Actually, my mistake – I misunderstood how the crawler handles external links. My understanding now is that external links are recorded but do not have their children traversed – which is what I was asking for.

Please ignore. Nothing to see here. :blush:
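
For anyone landing here with the same question, the behaviour described above can be illustrated with a minimal sketch. This is not arachnid's actual code (the library is written in PHP); the `crawl` function, the `PAGES` dictionary, and the URLs are all hypothetical and exist only to show the idea: an external link is recorded when it is found, but the links on that external page are never followed, no matter how high the depth is set.

```python
from collections import deque
from urllib.parse import urlparse

# Hypothetical in-memory "web": page URL -> links found on that page.
# This stands in for real HTTP requests so the sketch runs offline.
PAGES = {
    "https://example.com/": ["https://example.com/about", "https://other.org/"],
    "https://example.com/about": ["https://example.com/team"],
    "https://example.com/team": [],
    "https://other.org/": ["https://other.org/hidden"],
    "https://other.org/hidden": [],
}


def crawl(start, max_depth, pages=PAGES):
    """Breadth-first crawl that records external links without traversing them."""
    host = urlparse(start).netloc
    seen = {}  # url -> {"external": bool, "depth": int}
    queue = deque([(start, 0)])

    while queue:
        url, depth = queue.popleft()
        if url in seen:
            continue  # only visit each unique URL once

        external = urlparse(url).netloc != host
        seen[url] = {"external": external, "depth": depth}

        # External pages are recorded above, but their children are skipped.
        if external or depth >= max_depth:
            continue

        for link in pages.get(url, []):
            queue.append((link, depth + 1))

    return seen


if __name__ == "__main__":
    for url, info in crawl("https://example.com/", max_depth=10).items():
        print(url, info)
```

In this sketch, `https://other.org/` appears in the results flagged as external, while `https://other.org/hidden` never appears, because it is only reachable through an external page.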