kingRodian / SpideR

MIT License

Policies #10

Open kingRodian opened 7 years ago

kingRodian commented 7 years ago

Implement policies that govern the spider's behaviour. They must be able to interface with the planned Python scripting.
Policies:
Selection policy - which sites to download from.
Re-visit policy - if a crawl runs over a long period, whether (and when) it re-checks earlier URLs to see if they have been updated.
Politeness policy - how often to send requests, whether to obey robots.txt, etc.
Parallelization policy - how the spider makes use of the available threads.
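A minimal sketch of how the four policies might be exposed as pluggable Python objects that the planned scripting layer could configure or subclass. Every class and method name here (`SelectionPolicy`, `should_fetch`, `due_for_revisit`, etc.) is hypothetical and not part of SpideR. robots.txt handling uses the standard library's `urllib.robotparser`, and timestamps are passed in explicitly so the logic can be exercised without a clock or network access.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser


class SelectionPolicy:
    """Hypothetical: decides which sites the spider may download from."""

    def __init__(self, allowed_domains):
        self.allowed_domains = set(allowed_domains)

    def should_fetch(self, url):
        # Only crawl hosts on the allow-list.
        return urlparse(url).hostname in self.allowed_domains


class RevisitPolicy:
    """Hypothetical: re-queue a URL once max_age seconds have passed."""

    def __init__(self, max_age):
        self.max_age = max_age
        self.last_fetched = {}  # url -> timestamp of last fetch

    def record_fetch(self, url, now):
        self.last_fetched[url] = now

    def due_for_revisit(self, url, now):
        last = self.last_fetched.get(url)
        # Never-fetched URLs are always due; others once they are old enough.
        return last is None or now - last >= self.max_age


class PolitenessPolicy:
    """Hypothetical: obey robots.txt and a minimum delay per host."""

    def __init__(self, user_agent, robots_lines, min_delay):
        self.user_agent = user_agent
        self.robots = RobotFileParser()
        self.robots.parse(robots_lines)  # robots.txt content, already split into lines
        self.min_delay = min_delay
        self.last_request = {}  # host -> timestamp of last request

    def allowed(self, url):
        return self.robots.can_fetch(self.user_agent, url)

    def wait_time(self, url, now):
        # Seconds to wait before this host may be contacted again.
        last = self.last_request.get(urlparse(url).hostname)
        if last is None:
            return 0.0
        return max(0.0, self.min_delay - (now - last))

    def record_request(self, url, now):
        self.last_request[urlparse(url).hostname] = now


class ParallelizationPolicy:
    """Hypothetical: spread fetches across a fixed-size thread pool."""

    def __init__(self, max_workers):
        self.max_workers = max_workers

    def run(self, fetch, urls):
        # pool.map returns results in the same order as urls.
        with ThreadPoolExecutor(max_workers=self.max_workers) as pool:
            return list(pool.map(fetch, urls))


if __name__ == "__main__":
    robots = ["User-agent: *", "Disallow: /private"]
    selection = SelectionPolicy({"example.com"})
    politeness = PolitenessPolicy("SpideR", robots, min_delay=1.0)
    urls = [
        "http://example.com/index.html",
        "http://example.com/private/a.html",
        "http://other.org/",
    ]
    to_fetch = [u for u in urls if selection.should_fetch(u) and politeness.allowed(u)]
    print(to_fetch)  # ['http://example.com/index.html']
    # A stand-in fetch function; a real spider would download the page here.
    sizes = ParallelizationPolicy(max_workers=4).run(len, to_fetch)
    print(sizes)
```

Passing `now` in rather than reading the clock inside each policy keeps the policies deterministic, which would also make it easier for user scripts to swap in their own implementations and test them in isolation.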