scrapinghub / splash

Lightweight, scriptable browser as a service with an HTTP API
BSD 3-Clause "New" or "Revised" License

Splash is no longer able to render a page with JavaScript #1153

Open utilitylens opened 2 years ago

utilitylens commented 2 years ago

Hi there!

Kudos to you guys for making some amazing software!

Up until recently, I've been able to successfully parse this website 'https://www.eversource.com/security/account/login' (amongst many others).

Unfortunately, I believe the website maintainers recently changed something on the back end, and the site no longer renders correctly.

The expected result would be a typical login screen where it asks for User and Password. Instead, it essentially only shows the navbar and the footer.

I've reviewed and tried all of the suggestions made in https://splash.readthedocs.io/en/stable/faq.html#website-is-not-rendered-correctly. Most notably:

  1. Waiting to ensure the site renders completely using `splash:wait`
  2. Specifying different user agents using `splash:set_user_agent`
  3. Disabling private mode (using `--disable-private-mode` or `splash.private_mode_enabled = false`)

I normally run Splash with the following command (on Ubuntu Linux 20): `sudo docker run -p 8050:8050 --memory=1G --restart=always scrapinghub/splash --disable-private-mode --max-timeout 3600 --maxrss 1024 -v3`

Currently, I'm running the following versions:

[-] Splash version: 3.5

[-] Qt 5.14.1, PyQt 5.14.2, WebKit 602.1, Chromium 77.0.3865.129, sip 4.19.22, Twisted 19.7.0, Lua 5.2

[-] Python 3.6.9 (default, Jul 17 2020, 12:50:27) [GCC 8.4.0]

The easiest way to reproduce this issue is to run the following in the Splash UI (http://localhost:8050):

```lua
function main(splash, args)
  splash.resource_timeout = 0
  splash.private_mode_enabled = false
  splash:set_user_agent('Mozilla/5.0 (Windows NT 6.1; rv:51.0) Gecko/20100101 Firefox/51.0')

  local login_url = 'https://www.eversource.com/security/account/login'

  assert(splash:go(login_url))
  assert(splash:wait(10))

  return {
    html = splash:html(),
    png = splash:png(),
    har = splash:har(),
  }
end
```
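For anyone who wants to reproduce this outside the Splash UI, here is a minimal Python sketch that posts an equivalent script to Splash's `/execute` endpoint. It assumes a Splash instance listening on http://localhost:8050 as in the docker command above. The helper names `build_execute_request` and `run_script`, the 90 second `timeout`, and passing the URL, wait time and user agent through `args` are illustrative choices, not part of the original report.

```python
import json
import urllib.request

# Assumption: Splash is reachable locally, as started by the docker command above.
SPLASH_URL = "http://localhost:8050"

# Same steps as the script in the report, with the URL, wait time and
# user agent read from the request arguments instead of being hard-coded.
LUA_SOURCE = """
function main(splash, args)
  splash.resource_timeout = 0
  splash.private_mode_enabled = false
  splash:set_user_agent(args.user_agent)
  assert(splash:go(args.url))
  assert(splash:wait(args.wait))
  return {html = splash:html(), har = splash:har()}
end
"""

DEFAULT_UA = "Mozilla/5.0 (Windows NT 6.1; rv:51.0) Gecko/20100101 Firefox/51.0"


def build_execute_request(url, wait=10.0, user_agent=DEFAULT_UA, timeout=90):
    """Build (but do not send) a POST request for Splash's /execute endpoint."""
    payload = {
        "lua_source": LUA_SOURCE,
        "url": url,
        "wait": wait,
        "user_agent": user_agent,
        "timeout": timeout,  # render timeout in seconds, enforced by Splash
    }
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        SPLASH_URL + "/execute",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_script(url):
    """Send the request and return the table returned by the Lua script as a dict."""
    request = build_execute_request(url)
    with urllib.request.urlopen(request, timeout=120) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `run_script('https://www.eversource.com/security/account/login')` should return a dict with `html` and `har` keys, which makes it easy to compare the rendered HTML before and after trying a different setting.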

The only clues I have seen are a few errors in the verbose output of splash when run with '-v3'. Specifically, I see the following:

'[render] JsConsole(https://www.eversource.com/content/UserControls/PrimaryNavNew/PrimaryNavNew.ascx.js:69): TypeError: item of items is not a function. (In 'item of items', 'item of items' is undefined)
[render] JsConsole(https://www.eversource.com/content/WebsiteTemplates/NU/js/AppD/jsagent/adrum/adrum.js:27): TypeError: |this| is not a object
[render] JsConsole(https://cdn.eversource.com/prod/ms-login/2022.2.2.13/static/js/main.bundle.js:2): TypeError: |this| is not a object '
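To see at a glance which scripts are failing, the `-v3` output can be grouped by script URL. This is a rough sketch that assumes the log lines keep the `[render] JsConsole(<url>:<line>): <message>` shape shown above; the function name `group_js_errors` is made up for illustration.

```python
import re
from collections import defaultdict

# Matches lines shaped like: [render] JsConsole(<script url>:<line>): <message>
JS_CONSOLE_RE = re.compile(
    r"\[render\] JsConsole\((?P<url>\S+?):(?P<line>\d+)\): (?P<message>.*)"
)


def group_js_errors(log_text):
    """Return {script_url: [(line_number, message), ...]} for JsConsole log lines."""
    errors = defaultdict(list)
    for raw_line in log_text.splitlines():
        match = JS_CONSOLE_RE.search(raw_line)
        if match:  # lines that are not JsConsole messages are skipped
            entry = (int(match.group("line")), match.group("message").strip())
            errors[match.group("url")].append(entry)
    return dict(errors)
```

Run on the three lines above, this would yield one entry per script, which shows that the errors come from three different files, including the login bundle itself.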

Note that I'm able to access this page (and see the login page) using a normal browser (I've used both Safari and Firefox).

I guess my main question is: is there something I can do to get this page to render again, or is the WebKit version bundled with Splash simply incompatible?

I currently have a webapp where I'm using scrapy combined with splash to parse a number of utility sites. If splash is no longer capable of rendering websites using modern javascript, then I may need to move to some other solution. This is a bummer to me, because so far I've been happy with the performance and capabilities.

Thanks in advance for any assistance you could provide.

P.S. If there's any other supporting information that I could give, please let me know!

vezuras commented 2 years ago

Same thing :(

Have you fixed this?!

MADDY312 commented 2 years ago

Yes, it has become almost useless. Can you suggest some better options that are faster and more reliable?

Or will Splash get an update any time soon?