scrapinghub / scrapyrt

HTTP API for Scrapy spiders
BSD 3-Clause "New" or "Revised" License

Concurrent scrapyrt requests #141

Open BalzySte opened 2 years ago

BalzySte commented 2 years ago

Hello, I'm using scrapyrt to provide an HTTP interface to a big Scrapy project. I'm running a single scrapyrt instance in a Docker container; some spiders take ~60-120 seconds to complete, and I've noticed that requests are handled sequentially, causing substantial delays.

Is that the expected behavior? I know scrapyrt is not suitable for long-running spiders, but I'm wondering if there exists a quick fix, for example running multiple workers/threads. I'm asking here because I'm not really familiar with the Twisted framework.
Another solution would be running multiple scrapyrt instances behind a load balancer, but I'd rather not go down that path.
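For reference, the load-balancer approach could look something like this minimal nginx sketch (all names and ports here are hypothetical, not from the scrapyrt docs): several scrapyrt instances listening on consecutive ports on the same host, with nginx round-robining requests across them.

```nginx
# Hypothetical nginx config: round-robin across four scrapyrt
# instances listening on ports 9080-9083 on localhost.
upstream scrapyrt_pool {
    server 127.0.0.1:9080;
    server 127.0.0.1:9081;
    server 127.0.0.1:9082;
    server 127.0.0.1:9083;
}

server {
    listen 80;

    location / {
        proxy_pass http://scrapyrt_pool;
        # Spiders can take 60-120s, so the proxy timeout
        # needs to be raised above nginx's 60s default.
        proxy_read_timeout 180s;
    }
}
```

With default round-robin, each slow spider run only blocks the one instance serving it, while the other instances keep accepting requests.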

doverradio commented 1 year ago

I'm facing this same issue right now.

Previously, I had solved it by just creating more instances of the scrapy scripts on various ports.
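For anyone landing here, that multi-port workaround can be sketched as a small launcher script. This is an assumption-laden sketch, not an official scrapyrt deployment recipe: it assumes scrapyrt is installed, that you run it from the Scrapy project root (next to scrapy.cfg), and it uses scrapyrt's `-p` port option. It only prints the commands; append `&` to each and a final `wait` to actually launch the instances.

```shell
#!/bin/sh
# Sketch: generate the commands to start N scrapyrt instances on
# consecutive ports (BASE_PORT .. BASE_PORT + N - 1).
N=4
BASE_PORT=9080
i=0
while [ "$i" -lt "$N" ]; do
    # One scrapyrt process per port; a reverse proxy or the client
    # then spreads requests across these ports.
    echo "scrapyrt -p $((BASE_PORT + i))"
    i=$((i + 1))
done
```

Each process is a fully independent scrapyrt instance, so slow spider runs on one port don't block requests sent to the others.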

Did you solve it?