Closed by py-in-the-sky 7 years ago
This is a good proof of concept for a custom project; however, this project tries to stay as minimal as possible so that others can use and extend the framework for their own crawling.
Do you intend the keyword spider to be an example? If so, can you suggest which parts of the docs need to be updated to include this new spider?
The unit tests failed: it appears that KeywordsItem does not exist. Did you forget to check it in?
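For reference, a minimal sketch of the fields such a KeywordsItem might carry. It is written as a plain dataclass so it runs standalone; in the actual code it would presumably subclass scrapy.Item. All field names here are hypothetical, since the class is missing from the PR.

```python
from dataclasses import dataclass

@dataclass
class KeywordsItem:
    """Hypothetical item holding one keyword match from a crawled page."""
    url: str      # page the keyword was found on
    keyword: str  # the keyword that matched
    count: int    # number of occurrences on the page

# Example of constructing and reading an item
item = KeywordsItem(url="http://example.com", keyword="python", count=3)
```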
Overall, I think this still goes against the contributing guidelines here, which state "We are trying to build a generic framework for large scale, distributed web crawling." This appears to be a custom implementation for the specific task of finding particular keywords within pages.
If you think of this more as a tutorial, please add unit tests, documentation, and clarification of the points above.
Sorry about this. This was supposed to go to my fork, and I'm not sure how this pull request ended up here.
The database that the spider sends keyword counts to is assumed to be external to the Scrapy-Cluster crawling system.
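Since the target database is external to Scrapy-Cluster, the handoff could be sketched as an item pipeline like the one below. This is only an illustration of the assumed design, not the PR's actual code: sqlite3 stands in for whatever external store the spider uses, and the class, table, and column names are all hypothetical. A real Scrapy pipeline's process_item would also take a spider argument.

```python
import sqlite3

class KeywordCountPipeline:
    """Hypothetical pipeline writing keyword counts to an external database."""

    def __init__(self, db_path=":memory:"):
        # sqlite3 stands in for the external database assumed above
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS keyword_counts "
            "(url TEXT, keyword TEXT, count INTEGER)"
        )

    def process_item(self, item):
        # Persist one keyword-count record outside the crawling system
        self.conn.execute(
            "INSERT INTO keyword_counts VALUES (?, ?, ?)",
            (item["url"], item["keyword"], item["count"]),
        )
        self.conn.commit()
        return item

# Usage: feed one item through the pipeline and read it back
pipeline = KeywordCountPipeline()
pipeline.process_item({"url": "http://example.com", "keyword": "python", "count": 3})
row = pipeline.conn.execute("SELECT count FROM keyword_counts").fetchone()
```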