Open fritz-0 opened 3 years ago
Is ScrapySplash capable of handling simultaneous requests coming from multiple Spiders?
You mean just "splash", not "scrapy-splash" library, right?
If yes, then it would work, but there is a limit on how many requests a single Splash instance can handle in parallel. IIRC, usually at most 10 requests are processed at a time and the rest are queued, so if you throw too many requests at once you'll start getting errors from Splash. If you face this problem, there are two possible solutions: reduce concurrency on the spider side, or deploy more Splash processes, e.g. with https://github.com/TeamHG-Memex/aquarium/ or using some hosted Splash service which already handles this.
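To illustrate the first option, reducing concurrency on the spider side is just a matter of Scrapy settings. The setting names below are real Scrapy/scrapy-splash settings; the specific values are illustrative assumptions, not recommendations:

```python
# settings.py (sketch) -- cap spider-side concurrency so a single Splash
# instance (which renders roughly 10 pages in parallel by default) is not
# overwhelmed. Values are examples only; tune them for your workload.
CONCURRENT_REQUESTS = 8               # total in-flight requests for the spider
CONCURRENT_REQUESTS_PER_DOMAIN = 8    # every SplashRequest targets the one Splash host
DOWNLOAD_DELAY = 0.25                 # small delay to smooth out bursts

SPLASH_URL = "http://localhost:8050"  # the single Splash instance in question
```

With several spiders sharing one Splash, the sum of their concurrency limits is what matters, so divide the budget across them.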
@lopuhin Actually, I was referring to "scrapy-splash"; I thought they were the same. Apologies. But anyway, thank you for clarifying the concurrency behavior of "Splash" and that there is a limit to it, especially when there are too many requests.
In relation to what you said about having too many requests, I am looking into running more Docker Splash instances, one per Spider, in this case.
Actually, I was referring to "scrapy-splash"
I really think you meant Splash, that’s what runs in Docker. But it’s common to mix it up with scrapy-splash, which is just a Scrapy plugin/client for Splash.
As @Gallaecio said, scrapy-splash is a scrapy client for splash, and does not impose any extra restrictions on the number of concurrent requests - but splash considerations still apply.
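To make the client/server split concrete: scrapy-splash is wired into a project purely through settings, per its README. A minimal setup looks roughly like this (middleware names and priorities are taken from the scrapy-splash README; the Splash URL is the one from this thread):

```python
# settings.py (sketch) -- enable scrapy-splash as a client for a Splash
# instance running at localhost:8050. scrapy-splash adds no concurrency
# limits of its own; it only routes requests through Splash.
SPLASH_URL = "http://localhost:8050"

DOWNLOADER_MIDDLEWARES = {
    "scrapy_splash.SplashCookiesMiddleware": 723,
    "scrapy_splash.SplashMiddleware": 725,
    "scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware": 810,
}

SPIDER_MIDDLEWARES = {
    "scrapy_splash.SplashDeduplicateArgsMiddleware": 100,
}

DUPEFILTER_CLASS = "scrapy_splash.SplashAwareDupeFilter"
```

Everything about parallelism (the ~10-slot limit, queueing) happens inside the Splash process itself, not in this plugin.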
I have a Spider that crawls multiple URLs (100+), and I plan on using both `Request` and `SplashRequest`, as the website needs JavaScript rendering. I am also thinking of running Spiders simultaneously for the URLs. Please note that my Spider performs a long series of callbacks.
Is ScrapySplash capable of handling simultaneous requests coming from multiple Spiders?
That is, even if there is only one Splash instance running (via Docker) on localhost:8050? If yes, how can I implement this, or are there better alternatives?
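As an alternative to Aquarium, one simple way to go beyond a single instance is to run several Splash containers on different ports and rotate between them per request. A minimal round-robin sketch (the second port is hypothetical; it assumes you started an extra container with something like `docker run -p 8051:8050 scrapinghub/splash`):

```python
import itertools

# Assumed endpoints: one Splash container per port. Adjust to however many
# containers you actually run.
SPLASH_INSTANCES = [
    "http://localhost:8050",
    "http://localhost:8051",
]

_cycle = itertools.cycle(SPLASH_INSTANCES)


def next_splash_url():
    """Return the next Splash endpoint, rotating through the list."""
    return next(_cycle)


# In a spider you would then pass the chosen endpoint per request, since
# SplashRequest accepts a splash_url argument that overrides SPLASH_URL, e.g.:
#   yield SplashRequest(url, callback=self.parse, splash_url=next_splash_url())
```

This keeps each spider's view simple while spreading the rendering load across containers; Aquarium does essentially the same thing behind a single load-balanced endpoint.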