pythonhacker / harvestman-crawler

Automatically exported from code.google.com/p/harvestman-crawler

Design the crawler to run non-stop #17

GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Current behaviour:

- Projects are executed sequentially.
- Once all projects have been executed, crawling stops.

Desired behaviour:

The crawler must be able to run non-stop:
1. Low fluctuation in memory consumption.
2. Run multiple projects in parallel.
3. Re-read the configuration file at regular intervals so that settings can
be changed dynamically: adding/removing projects, and other settings such as
bandwidth, depth, etc.
4. Trigger re-crawling of pages/projects based on RSS triggers.
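
A rough sketch of how points 2 and 3 might fit together. This is not HarvestMan's actual code: `Supervisor`, `diff_projects`, `load_config` and `crawl` are hypothetical names used only for illustration, and the configuration is assumed to be a plain dict mapping project names to settings. Each project gets its own worker thread, and the configuration is re-read on a fixed interval to decide which projects to start or stop.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def diff_projects(running, configured):
    """Return (to_start, to_stop) as sorted lists of project names."""
    running, configured = set(running), set(configured)
    return sorted(configured - running), sorted(running - configured)


class Supervisor:
    """Hypothetical supervisor: one worker per configured project."""

    def __init__(self, load_config, crawl, max_workers=4):
        self.load_config = load_config  # assumed: callable returning {name: settings}
        self.crawl = crawl              # assumed: callable(name, settings, stop_event)
        # Projects beyond max_workers wait in the pool's queue until a slot frees up.
        self.pool = ThreadPoolExecutor(max_workers=max_workers)
        self.stops = {}                 # project name -> threading.Event

    def reconcile(self):
        """Re-read the configuration once and start/stop projects to match it."""
        config = self.load_config()
        to_start, to_stop = diff_projects(self.stops, config)
        for name in to_stop:
            self.stops.pop(name).set()  # ask that project's worker to finish
        for name in to_start:
            stop = self.stops[name] = threading.Event()
            self.pool.submit(self.crawl, name, config[name], stop)
        return to_start, to_stop

    def run_forever(self, interval=60.0, shutdown=None):
        """Reconcile every `interval` seconds until `shutdown` is set."""
        shutdown = shutdown or threading.Event()
        while not shutdown.is_set():
            self.reconcile()
            shutdown.wait(interval)     # sleeps, but wakes early on shutdown
        for stop in self.stops.values():
            stop.set()
        self.pool.shutdown(wait=True)
```

The sketch only handles projects being added or removed. Changed settings (bandwidth, depth) for a project that is already running would need extra handling, for example stopping and restarting that project. Memory behaviour and RSS triggers (points 1 and 4) are not covered here.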

Original issue reported on code.google.com by andrei.p...@gmail.com on 17 Jul 2008 at 7:55

GoogleCodeExporter commented 9 years ago
Duplicate of issue 16; please remove.

Original comment by andrei.p...@gmail.com on 22 Jul 2008 at 5:36

GoogleCodeExporter commented 9 years ago
Duplicate of issue #16. 

Original comment by abpil...@gmail.com on 6 Oct 2008 at 11:35