scrapinghub / splash

Lightweight, scriptable browser as a service with an HTTP API
BSD 3-Clause "New" or "Revised" License
4.06k stars 513 forks

Splash 3.4 not responding and hanging for some URLs #978

Closed: Techyuvi closed this issue 4 years ago

Techyuvi commented 4 years ago

Hi team, I am crawling a TikTok-like website that is written in AngularJS. I wrote a scraper for it and it ran perfectly fine, but since yesterday I have been getting an error and cannot fetch data from users' video pages.

To debug it, I ran the Splash Docker image with this command:

    docker run -it -p 8050:8050 scrapinghub/splash --max-timeout 7200 --disable-private-mode -v2

When I give it the page https://www.example.com/@user/video/6767235575733947649, the container hangs and logs this error:

    2019-12-11 07:33:43.712428 [render] JsConsole(https://s16.examplecdn.com/example/falcon/_next/static/chunks/commons.787eed06a7e37b82e3d0.js:1): TypeError: undefined is not an object (evaluating 'h.$languageList')

Please help me with this. I have already asked about it on Stack Overflow and followed the suggested steps, but they did not work. The Lua script I am using is:

    function main(splash, args)
      assert(splash:go(args.url))
      assert(splash:wait(0.5))
      return {
        html = splash:html()
      }
    end

Scrapy version: 1.7.3, Python 3.6
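For debugging a hang like this, one option is a variant of the script above that records every URL the page requests and caps each network request with `splash.resource_timeout`. This is a sketch, not a fix: the 20-second value is an arbitrary assumption, and the setting only bounds individual network requests. If the page's JavaScript itself never finishes, the render is still bounded only by the `timeout` argument of the HTTP request, which can be at most the value passed to `--max-timeout` (7200 seconds in the command above).

```lua
-- Debugging sketch: record every URL the page requests and cap each
-- network request, so one stuck resource is aborted instead of blocking
-- the whole render. 20 s is an arbitrary example value.
function main(splash, args)
  splash.resource_timeout = 20

  local requested = {}
  splash:on_request(function(request)
    table.insert(requested, request.url)
  end)

  -- splash:go returns ok, reason; on failure, still return the URL list
  -- so you can see which resources were fetched before things went wrong.
  local ok, reason = splash:go(args.url)
  if not ok then
    return {error = reason, requested = requested}
  end
  splash:wait(0.5)

  return {
    html = splash:html(),
    requested = requested,
  }
end
```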

Gallaecio commented 4 years ago

Can you share a link to your StackOverflow question?

Techyuvi commented 4 years ago

Yes, here it is: https://stackoverflow.com/questions/59266898/scrapy-splash-stop-responding-for-some-links?answertab=oldest#tab-top

Gallaecio commented 4 years ago

I see there is more information here than in your StackOverflow question (the TypeError line). You might want to include it in your StackOverflow question as well.

Techyuvi commented 4 years ago

Yes, I will add all of this to that question now.

Techyuvi commented 4 years ago

@Gallaecio I have now mentioned that in my Stack Overflow question as well. Do you have any idea what is going wrong?

Gallaecio commented 4 years ago

None. I’m just hoping that, with enough information, someone else will be able to help you.

Techyuvi commented 4 years ago

Hi all, today I looked into my Splash hang problem some more, and I think the website has JavaScript code that detects whether a request comes from a real browser or from Splash. When I open the link (https://www.example.com/@user/video/6767235575733947649) in a browser, it works fine, but when I open it in Splash with the default headers, it redirects me to 'https://s16.examplecdn.com/example/falcon/_next/static/1.0.1.309/pages/_error.js'. After that, Splash hangs and continuously logs:

    2019-12-11 07:33:43.712428 [render] JsConsole(https://s16.examplecdn.com/example/falcon/_next/static/chunks/commons.787eed06a7e37b82e3d0.js:1): TypeError: undefined is not an object (evaluating 'h.$languageList')

I think this could help you improve Splash, and I hope you can help me figure out how to handle it.
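If the site really does treat Splash differently based on request headers, a first experiment is to send browser-like headers. The sketch below uses `splash:set_user_agent` and `splash:set_custom_headers`. The Chrome User-Agent string and the `Accept-Language` value are example assumptions, and the language header is only a guess prompted by the failing `h.$languageList` expression, not a confirmed cause.

```lua
-- Sketch: make Splash look more like a desktop browser. Both header
-- values are example assumptions, not something the site is known to check.
function main(splash, args)
  -- Replace Splash's default User-Agent with a typical Chrome one.
  splash:set_user_agent(
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 " ..
    "(KHTML, like Gecko) Chrome/79.0.3945.88 Safari/537.36"
  )
  -- Guess only: the failing expression is h.$languageList, so also send
  -- a language header in case the site uses it to pick a language.
  splash:set_custom_headers({
    ["Accept-Language"] = "en-US,en;q=0.9",
  })

  assert(splash:go(args.url))
  assert(splash:wait(0.5))
  return {html = splash:html()}
end
```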

Techyuvi commented 4 years ago

Hi team, I was going through the Splash documentation (https://readthedocs.org/projects/splash/downloads/pdf/latest/), and the 3.0 release notes mention some handling in the network manager: 'Splash shouldn’t crash if an exception happens while creating a request in network manager'. In my case, though, Splash hangs in the network manager and does not respond. Please help me with this error. It gets stuck here:

    2019-12-11 07:33:43.712428 [render] JsConsole(https://s16.examplecdn.com/example/falcon/_next/static/chunks/commons.787eed06a7e37b82e3d0.js:1): TypeError: undefined is not an object (evaluating 'h.$languageList')

Techyuvi commented 4 years ago

Hi team, I figured it out by monitoring the Splash logs: the website serves me error.js, Splash then starts rendering that page, and that is what makes Splash hang and stop responding. To work around it, I wrote a Lua snippet that bypasses these links:

    splash:on_request(function(request)
      -- plain find (4th argument true): '.' is literal, not a pattern wildcard
      if string.find(request.url, 'error.js', 1, true) then
        print("## get error while page rendering ###")
        request.abort()
      end
    end)

I hope you will improve Splash to handle these kinds of pages on its own. Thanks.
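A complete script built around this workaround might look like the sketch below. It registers the `on_request` callback before `splash:go`, so the filter applies to every request the page makes, and it uses `string.find` with plain matching (fourth argument `true`), because in a Lua pattern the `.` in `'error.js'` matches any character. Instead of `print`, it collects the blocked URLs and returns them with the HTML. The `"error.js"` match is taken from the snippet above; whether that is the right URL to block for a given site is an assumption.

```lua
-- Sketch of a full script using the error.js workaround: abort matching
-- requests before Splash loads them, then render the page as usual.
function main(splash, args)
  local blocked = {}  -- URLs we refused to load, returned for debugging

  -- Register the callback BEFORE splash:go so it sees the page's requests.
  splash:on_request(function(request)
    -- Plain substring search: "error.js" is matched literally.
    if string.find(request.url, "error.js", 1, true) then
      table.insert(blocked, request.url)
      request.abort()
    end
  end)

  assert(splash:go(args.url))
  assert(splash:wait(0.5))

  return {
    html = splash:html(),
    blocked = blocked,
  }
end
```

With scrapy-splash, a script like this is passed as the `lua_source` argument to the `execute` endpoint, e.g. `SplashRequest(url, self.parse, endpoint='execute', args={'lua_source': script})`.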

imethanlee commented 1 year ago


Hi, could you share your complete Lua script? I tried to use your code, but it still does not seem to work.