scrapy / scrapyd

A service daemon to run Scrapy spiders
https://scrapyd.readthedocs.io/en/stable/
BSD 3-Clause "New" or "Revised" License

How to correctly configure CONCURRENT_REQUESTS in a project with multiple spiders #463

Closed. jxlil closed this issue 1 year ago.

jxlil commented 1 year ago

Sorry for asking here; I tried to get an answer on Stack Overflow, but had no luck. If I get an answer, I'll update the Stack Overflow question. Thank you!

I have a project with ~10 spiders, and I run some of them simultaneously using scrapyd. However, I have doubts about whether my CONCURRENT_REQUESTS configuration is correct.

Currently, my CONCURRENT_REQUESTS is 32, but I have seen it recommended that this value be much higher (>= 100). My question is: is this the total number of simultaneous requests that all running spiders can make combined, or the number of simultaneous requests that a single spider can make?

I assume it is the total that all the spiders can make combined, which would explain why it is recommended to be as high as possible. I also see that I can regulate the number of requests each spider makes to a given site using CONCURRENT_REQUESTS_PER_DOMAIN.
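For reference, here is a minimal sketch of how these two settings appear in a project's settings.py (the values are illustrative, not recommendations):

```python
# settings.py -- illustrative values only

# Cap on simultaneous requests performed by the Scrapy downloader.
CONCURRENT_REQUESTS = 32

# Cap on simultaneous requests to any single domain; this bounds
# per-site load independently of CONCURRENT_REQUESTS.
CONCURRENT_REQUESTS_PER_DOMAIN = 8
```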

jpmckinney commented 1 year ago

Scrapyd can manage multiple projects, each of which can contain multiple spiders. Scrapyd runs each spider job (crawl) in its own Scrapy process, and CONCURRENT_REQUESTS is applied per process, so the limit governs each running spider individually rather than all running spiders combined.
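As a hedged illustration of that scoping (the spider name and numbers below are hypothetical), a spider can override the project-wide limits for its own process via Scrapy's custom_settings class attribute, so each running spider can be tuned independently:

```python
import scrapy


class ExampleSpider(scrapy.Spider):
    # Hypothetical spider used only to illustrate per-spider overrides.
    name = "example"

    # custom_settings overrides the project settings for this spider only.
    # Because Scrapyd launches each job as its own Scrapy process,
    # these limits apply to this spider's crawl and nothing else.
    custom_settings = {
        "CONCURRENT_REQUESTS": 100,
        "CONCURRENT_REQUESTS_PER_DOMAIN": 16,
    }

    def parse(self, response):
        # Parsing logic omitted; not relevant to the settings example.
        pass
```

With settings like these, two spiders scheduled at the same time via Scrapyd each get their own request budget rather than sharing one.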