joeyAghion / spidey

A loose framework for crawling and scraping web sites.
MIT License
184 stars, 11 forks

Multi-threaded crawling? #4

Open: ylluminate opened this issue 11 years ago

ylluminate commented 11 years ago

Can spidey be made multi-threaded so that 10-20 download threads run concurrently and speed up the crawl?

joeyAghion commented 11 years ago

This is possible, although I haven't confirmed that spidey's dependencies are thread-safe. I'd be happy to take a look if you're interested in contributing code for this.

In my use cases this hasn't been an obstacle, because each spider class targets a different source, and we wouldn't want to inundate any one source with requests. We do run multiple spiders, but we parallelize them with separate tools built for that purpose, such as resque.
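For anyone weighing the request for 10-20 concurrent downloads, here is a minimal sketch of bounded concurrency using only Ruby's standard library (`Thread`, `Queue`, `Mutex`). It does not use spidey's API, and per the reply above, the thread-safety of spidey's dependencies is unconfirmed. The method name `crawl_concurrently`, its `pool_size:` keyword, and the block that performs each fetch are hypothetical. The pool size caps how many requests are in flight at once, which also limits how hard a single site is hit.

```ruby
require 'thread'

# Hypothetical helper, not part of spidey: a fixed-size pool of worker
# threads drains a shared queue of URLs. The caller's block does the
# actual download/parse for one URL and returns its result.
def crawl_concurrently(urls, pool_size: 10, &fetch)
  queue = Queue.new
  urls.each { |url| queue << url }

  results = {}
  lock = Mutex.new # guards the shared results hash

  workers = Array.new(pool_size) do
    Thread.new do
      loop do
        url = begin
          queue.pop(true) # non-blocking pop; raises ThreadError once empty
        rescue ThreadError
          break # queue drained, so this worker exits
        end
        body = fetch.call(url)
        lock.synchronize { results[url] = body }
      end
    end
  end

  workers.each(&:join) # wait for every worker; re-raises any worker error
  results
end
```

For example, after `require 'net/http'`, calling `crawl_concurrently(urls, pool_size: 10) { |url| Net::HTTP.get(URI(url)) }` would download with at most ten threads at a time. Pre-filling the queue and using a non-blocking pop keeps shutdown simple: each worker exits as soon as no work is left.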