pythonhacker / harvestman-crawler

Automatically exported from code.google.com/p/harvestman-crawler

Parallel crawl of projects #14

Status: Open. GoogleCodeExporter opened this issue 9 years ago.

GoogleCodeExporter commented 9 years ago
Current behaviour:

Currently, the crawling of the projects is done sequentially: one project
finishes before the next one starts.

Desired behaviour:

All the projects should start in parallel, without having to wait for each
other, ideally saving the logs separately for each of them.

Original issue reported on code.google.com by andrei.p...@gmail.com on 17 Jul 2008 at 7:28
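A minimal sketch of the requested behaviour, assuming each project can be handed to an independent worker process that writes its own log file. The `crawl_project` body is a placeholder, not HarvestMan's actual crawl entry point, and the project URLs are made up for illustration:

```python
# Sketch: start all project crawls in parallel, one process per project,
# each writing to its own log file so outputs do not interleave.
import logging
import multiprocessing

PROJECTS = [
    "http://example.org/project-a",
    "http://example.org/project-b",
    "http://example.org/project-c",
]

def crawl_project(start_url):
    # Give each project its own log file.
    name = start_url.rstrip("/").rsplit("/", 1)[-1]
    logging.basicConfig(
        filename="%s.log" % name,
        level=logging.INFO,
        format="%(asctime)s %(message)s",
    )
    logging.info("starting crawl of %s", start_url)
    # Placeholder: run the actual HarvestMan crawl for start_url here.
    logging.info("finished crawl of %s", start_url)

if __name__ == "__main__":
    # Launch every project at once instead of one after the other.
    workers = [multiprocessing.Process(target=crawl_project, args=(url,))
               for url in PROJECTS]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
```

Using separate processes (rather than threads sharing one queue) is one way to get the per-project isolation the maintainer describes in the comment below.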

GoogleCodeExporter commented 9 years ago
This has to wait for a later release, since the focus now is on getting the
other tasks done. Parallel crawl of projects will need a very good, barricaded
thread design which does not mix up the child URLs of one project with those
of another. It is possible in the current design by modifying the way URLs are
pushed to the queue etc., but the focus is not there right now.

Original comment by abpil...@gmail.com on 6 Oct 2008 at 11:31
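A minimal sketch of the "barricaded" design the comment describes, assuming each project owns a private URL queue so child URLs discovered while crawling one project can never be pushed onto another project's queue. The `fetch_links` function is a placeholder for the real fetch-and-parse step, not part of HarvestMan's API:

```python
# Sketch: each project gets its own URL queue and its own worker threads,
# so child URLs stay inside the project that discovered them.
import queue
import threading

def fetch_links(url):
    # Placeholder: download url and return the child URLs found in it.
    return []

def crawl_project(start_url, num_workers=4):
    url_queue = queue.Queue()        # private to this project
    seen = {start_url}
    seen_lock = threading.Lock()
    url_queue.put(start_url)

    def worker():
        while True:
            try:
                url = url_queue.get(timeout=2)
            except queue.Empty:
                return               # no more work for this project
            for child in fetch_links(url):
                with seen_lock:
                    new = child not in seen
                    if new:
                        seen.add(child)
                if new:
                    url_queue.put(child)   # stays in this project's queue
            url_queue.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

Because the queue and the visited-URL set are created inside `crawl_project`, several projects can run side by side without sharing any crawl state, which is the isolation the comment says the current single-queue design would need to be modified to provide.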