leopardslab / CrawlerX

CrawlerX - An extensible, distributed, scalable crawler system: a web platform that can be used to crawl URLs over different kinds of protocols in a distributed way.
Apache License 2.0
22 stars 16 forks

Feature: Add some new crawlers for popular web pages #41

Open sajithaliyanage opened 4 years ago

sajithaliyanage commented 4 years ago

Add crawl spiders for the following or other popular websites.

The currently implemented spiders can be found at: https://github.com/leopardslab/CrawlerX/tree/master/scrapy_app/scrapy_app/spiders

ffalpha commented 3 years ago

@sajithaliyanage Is this issue still open? I'd like to work on it.