CrawlerX - An extensible, distributed, scalable crawler system: a web platform for crawling URLs over different kinds of protocols in a distributed way.
Apache License 2.0
22 stars, 16 forks
Feature: Add some new crawlers for popular web pages #41
Add crawl spiders for the following, or other popular, websites.
The currently implemented spiders can be found at: https://github.com/leopardslab/CrawlerX/tree/master/scrapy_app/scrapy_app/spiders