scrapy / scrapy

Scrapy, a fast high-level web crawling & scraping framework for Python.
https://scrapy.org
BSD 3-Clause "New" or "Revised" License

(Feature Request) Throttling number of spiders running concurrently in the same process #869

Open madvas opened 9 years ago

madvas commented 9 years ago

It would be nice to be able to throttle the number of spiders running concurrently from a script, as in the "Common Practices" docs (http://doc.scrapy.org/en/latest/topics/practices.html#running-multiple-spiders-in-the-same-process).

I had a script that started a few hundred spiders at once. It didn't work; logging didn't even start.
It would be nice if we could easily set this number.

Thank you very much, you're doing a great job, guys!

madvas commented 9 years ago

Btw, I needed hundreds of different spiders because I needed to use hundreds of different cookies; it wasn't for different domains as in the docs example. If there's a better way to approach this, please feel free to tell me :)

kmike commented 9 years ago

Hi @madvas,

Maybe you can use the cookiejar meta key?

But I'm curious why starting a few hundred at once didn't work.
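A minimal sketch of the cookiejar idea, for readers who want to see its shape. The helper names (`login_request_kwargs`, `follow_up_meta`) and the login URL are hypothetical and are not part of Scrapy; the only Scrapy feature assumed is the `cookiejar` key in `Request.meta`, which keeps a separate cookie session for each distinct value.

```python
def login_request_kwargs(account_index, login_url):
    """Build keyword arguments for one login request per account.

    Hypothetical helper: the returned dict is meant to be unpacked into
    scrapy.Request(...) or scrapy.FormRequest(...) inside a single spider.
    """
    return {
        "url": login_url,
        # Each distinct "cookiejar" value gets its own cookie session,
        # so one spider can hold many logged-in accounts at once.
        "meta": {"cookiejar": account_index},
    }


def follow_up_meta(response_meta):
    """Carry the cookie session over to the next request.

    The "cookiejar" key is not sticky: it has to be passed along
    explicitly on every subsequent request.
    """
    return {"cookiejar": response_meta["cookiejar"]}


if __name__ == "__main__":
    # One set of request arguments per account, all usable from one spider.
    for i in range(3):
        print(login_request_kwargs(i, "https://example.com/login"))
```

In a spider callback this would look roughly like `yield scrapy.Request(next_url, meta=follow_up_meta(response.meta))`, so a single spider can replace the hundreds of per-account spiders.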

madvas commented 9 years ago

Hello @kmike,

Well, I don't know either; I didn't dig too much into it because I didn't want to overload the server. Maybe it would also be possible with cookiejar. Basically, what I needed was to log in with around a hundred different accounts and then do the scraping. I found this to be quite a clean solution for that.

jmaynier commented 7 years ago

@kmike I use cookiejar for this kind of situation, but it seems that you cannot specify CONCURRENT_REQUESTS and DOWNLOAD_DELAY per cookiejar.

kmike commented 7 years ago

@jmaynier you can set a custom request.meta['download_slot'] for requests, with a unique value per cookiejar. This way the concurrency settings will work per cookiejar.
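A rough sketch of that suggestion, assuming one integer ID per account. The helper name `per_account_meta` and the `account-<id>` slot naming are made up for illustration; `cookiejar` and `download_slot` are the `Request.meta` keys discussed above. As far as I know, the downloader applies `DOWNLOAD_DELAY` and `CONCURRENT_REQUESTS_PER_DOMAIN` per download slot, while `CONCURRENT_REQUESTS` stays a global limit.

```python
def per_account_meta(account_id):
    """Return request meta that isolates both cookies and throttling.

    Hypothetical helper: pass the result as meta=... to scrapy.Request.
    """
    return {
        # Separate cookie session for this account.
        "cookiejar": account_id,
        # Separate downloader slot for this account, so per-slot delay and
        # concurrency limits are tracked per account instead of per domain.
        "download_slot": f"account-{account_id}",
    }


if __name__ == "__main__":
    # Two accounts hitting the same domain end up in two different slots.
    print(per_account_meta(1))
    print(per_account_meta(2))
```

Like `cookiejar`, the slot has to be passed on every request for the account, e.g. `scrapy.Request(url, meta=per_account_meta(account_id))`.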

jmaynier commented 7 years ago

Thanks @kmike, it seems to be exactly what I was looking for! I will try it right away :-) It should be part of the Scrapy documentation.

felipeboffnunes commented 1 year ago

Can #4363 be considered a fix for this one?