Closed — odmaaa closed this issue 8 years ago
Hey @omkaaa,
Please check http://splash.readthedocs.org/en/stable/faq.html - does it help?
Hi @kmike ,
Yes, it helped, thank you. I disabled images and set the timeout to 720, and it all worked great. Thank you!
@omkaaa glad to hear that!
Following @omkaaa, I changed the args to:
yield SplashRequest(url, self.parse, args={'wait': 0.5, 'viewport': '1024x2480', 'timeout': 90, 'images': 0})
It works!
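One detail worth noting with a value like 'timeout': 90: Splash caps the per-request timeout argument on the server side (the default cap is 60 seconds), so higher values are only honored if the Splash server itself was started with a larger --max-timeout. A sketch of the Docker invocation, assuming the standard scrapinghub/splash image (300 is an illustrative value, not a recommendation):

```shell
# Raise Splash's server-side timeout cap so args={'timeout': 90} is accepted.
# Pick a --max-timeout value above the largest per-request timeout you use.
docker run -p 8050:8050 scrapinghub/splash --max-timeout 300
```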
Besides, some websites respond very quickly when you use curl or a browser, but very slowly in Splash, because Splash cannot download some of their resources correctly. This can also lead to a 504 Gateway Time-out. The right fix is to stop downloading the slow resources; in Splash, you can set resource_timeout in args:
def start_requests(self):
    for url in self.start_urls:
        yield SplashRequest(url,
                            self.parse,
                            args={'wait': 0.5, 'viewport': '1024x2480', 'timeout': 90, 'images': 0, 'resource_timeout': 10},
                            )
Thanks @yeszao, it works!
Hello, I am crawling a website with 10K pages. When I start crawling, every response is 200 and everything is OK, but after a few minutes a 504 Gateway Time-out appears, and after retrying 3 times Scrapy gives up. I set:
and the endpoint is render.html.
I am using: Scrapy 1.0.3, Python 2.7, Splash in a Docker server.
How can I optimize my crawler and avoid the 504 errors?
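On the retry limit mentioned above: Scrapy's built-in RetryMiddleware already retries 504 responses, and its limits can be raised in settings.py if the site only recovers slowly. A sketch, with illustrative values rather than a recommendation:

```python
# Illustrative settings.py fragment tuning Scrapy's RetryMiddleware.
RETRY_ENABLED = True
RETRY_TIMES = 5  # default is 2; retry each failed request up to 5 times
# 504 is already in Scrapy's default RETRY_HTTP_CODES; listed explicitly here
RETRY_HTTP_CODES = [500, 502, 503, 504, 522, 524, 408]
DOWNLOAD_TIMEOUT = 180  # give slow Splash renders more time before Scrapy gives up
```

Combined with resource_timeout and disabled images on the Splash side, this usually reduces how often a slow render turns into a hard failure.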