Open farazirfan47 opened 6 years ago
Have you looked into Crawlera throttling?
Yes, I disabled ads and some unessential resources, but it did not help much.
Hi @farazirfan47, getting good performance from Splash + Crawlera is tricky indeed. HAR (http://splash.readthedocs.io/en/stable/scripting-ref.html#splash-har) can help with diagnosing the issue. One problem I have seen is that our current example script re-uses sessions, which adds a 12 second delay between subsequent requests when rendering one page. If this is indeed the issue for you (it will be clear from the HAR output), there are two ways to solve it: (1) don't use sessions; (2) make only the first request via Crawlera (the rest are usually static).
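To check for the session delay described above, here is a minimal Python sketch that reads a HAR document and reports the gap between consecutive request start times. It assumes the HAR has already been loaded into a dict (for example with `json.load`) and that it follows the standard HAR layout (`log.entries[].startedDateTime`, `request.url`). The function names and the 10 second threshold are illustrative choices, not part of Splash or Crawlera.

```python
from datetime import datetime


def parse_started(value):
    # HAR timestamps are ISO 8601; older Python versions reject a trailing "Z",
    # so it is rewritten as an explicit UTC offset (a no-op if there is no "Z").
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def request_gaps(har):
    """Return (url, gap_seconds) pairs ordered by start time.

    gap_seconds is the time between the start of a request and the start
    of the request before it (0.0 for the first request).
    """
    entries = sorted(
        har["log"]["entries"],
        key=lambda e: parse_started(e["startedDateTime"]),
    )
    gaps = []
    previous = None
    for entry in entries:
        started = parse_started(entry["startedDateTime"])
        gap = 0.0 if previous is None else (started - previous).total_seconds()
        gaps.append((entry["request"]["url"], gap))
        previous = started
    return gaps


def suspicious_gaps(har, threshold=10.0):
    # threshold is an assumed cut-off, chosen to sit just below the
    # 12 second delay mentioned above; adjust as needed.
    return [(url, gap) for url, gap in request_gaps(har) if gap >= threshold]
```

If `suspicious_gaps` returns several entries with gaps of roughly 12 seconds, that would be consistent with the session re-use problem; a list with no such gaps points to some other cause.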
I have analysed the HAR output and it clearly shows some of the web page resources taking too long. Can I stop Splash from using Crawlera when it downloads web page resources?
Can I stop Splash from using Crawlera when it downloads web page resources?
@farazirfan47 yes, please see this example (code is commented out): https://github.com/scrapinghub/sample-projects/blob/0a9779cac4564d24c082e4973534f36f33eb75d3/splash_crawlera_example/splash_crawlera_example/scripts/crawlera.lua#L18-L31 - this is from the guide https://support.scrapinghub.com/support/solutions/articles/22000188428-using-crawlera-with-splash
I tried disabling the unessential resources, but performance is still not good. First of all, it is hard to find the links of unessential resources and then apply a filter. I am dealing with 40+ sites, and I have to write separate rules for each of them, which is a time-consuming task.
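One way to reduce the manual work of finding filter candidates is to rank hosts by the total time their requests took in the HAR output. The following is a rough Python sketch under the assumption that the HAR is already loaded as a dict in the standard layout (`log.entries[].time` in milliseconds, `request.url`); `time_by_host` is a hypothetical helper name.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def time_by_host(har):
    """Return (host, request_count, total_ms) tuples, slowest host first."""
    totals = defaultdict(lambda: {"requests": 0, "total_ms": 0.0})
    for entry in har["log"]["entries"]:
        host = urlsplit(entry["request"]["url"]).hostname or "(unknown)"
        # Treat a missing or negative "time" as zero rather than failing.
        elapsed = max(entry.get("time") or 0, 0)
        totals[host]["requests"] += 1
        totals[host]["total_ms"] += elapsed
    rows = [(h, s["requests"], s["total_ms"]) for h, s in totals.items()]
    return sorted(rows, key=lambda row: row[2], reverse=True)
```

Running this over HAR files from several sites and looking at the hosts that appear near the top repeatedly may give a shorter, shared list of third-party domains to filter, instead of separate rules per site.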
@farazirfan47 are resources for one page downloaded in parallel or sequentially?
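The HAR output can answer this question as well. Below is a small Python sketch that computes the peak number of requests in flight at the same time; a result of 1 suggests sequential downloads, while a higher value suggests parallel ones. It assumes a HAR dict in the standard layout (`startedDateTime` as ISO 8601, `time` in milliseconds), and `max_concurrency` is an illustrative name.

```python
from datetime import datetime, timedelta


def parse_started(value):
    # Rewrite a trailing "Z" as an explicit UTC offset for older Pythons.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def max_concurrency(har):
    """Return the largest number of requests that overlapped in time."""
    events = []
    for entry in har["log"]["entries"]:
        start = parse_started(entry["startedDateTime"])
        duration_ms = max(entry.get("time") or 0, 0)
        end = start + timedelta(milliseconds=duration_ms)
        events.append((start, 1))   # request begins
        events.append((end, -1))    # request ends
    # At identical timestamps, process ends (-1) before starts (+1) so that
    # back-to-back requests are not counted as overlapping.
    events.sort(key=lambda event: (event[0], event[1]))
    current = peak = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak
```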
Bump. Has anyone figured out a good solution for this? I've integrated Crawlera + Splash but it's incredibly slow; it takes more than a few minutes to load a web page. I've limited concurrent requests to 10 in Scrapy and Splash due to Crawlera basic plan limits.
Hi, I have integrated Crawlera with Splash and now the response is really slow; I have increased the timeout limit. Please let me know how I can improve my request speed when using Crawlera as a proxy with Splash. Is using Crawlera with Splash a good choice?