scrapinghub / splash

Lightweight, scriptable browser as a service with an HTTP API
BSD 3-Clause "New" or "Revised" License
4.07k stars, 513 forks

Splash instances getting restarted suddenly while rendering. #956

Open Mideen opened 4 years ago

Mideen commented 4 years ago

While rendering the following set of URLs, the Splash instances restarted suddenly.

http://www.stgeorgescarehomes.co.uk
https://grinders-social-club.site123.me
https://www.academycapitalmgmt.com
https://www.popularbathroom.com
https://grinders-social-club.site123.me
https://www.carwashpatrol.com
https://www.westbengalcharity.org
https://www.homebuyersunite.com
https://www.bodysculpttherapies.com
https://www.2020-hi.com
https://www.hypnotechinst.com
http://www.karriers.com
http://www.benefitbrokersonline.com
https://www.divinedesignshihtzu.com
http://www.stgeorgescarehomes.co.uk
https://www.imagellcnj.com
https://www.samsembroidery.com
http://www.crystalpoolsofpetersburg.com
http://www.audiovoyage.com
https://www.richsremodeling.net
https://www.brucefischman.com
https://www.refreshpowerwashing.com

Please help me solve this.
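One way to narrow a list like this down is to send the URLs to Splash one at a time and note which requests get no answer at all. The following is only a minimal sketch: it assumes a Splash instance reachable at `http://localhost:8050` and uses the `render.html` endpoint. The helper names (`render_url`, `probe`, `find_suspects`) and the `pause` value are made up for illustration.

```python
import time
from urllib.error import HTTPError
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: a single Splash instance reachable on the default port.
SPLASH = "http://localhost:8050"


def render_url(target, splash=SPLASH, timeout=30):
    """Build a render.html request URL for one target page."""
    query = urlencode({"url": target, "timeout": timeout})
    return f"{splash}/render.html?{query}"


def default_fetch(url, client_timeout=60):
    """Send the request and return the HTTP status code."""
    with urlopen(url, timeout=client_timeout) as resp:
        return resp.status


def probe(target, fetch=default_fetch):
    """Classify one target as 'ok', 'http-<code>' or 'no-response'."""
    try:
        status = fetch(render_url(target))
    except HTTPError as exc:
        # An HTTP error response came back, so something is still answering.
        return f"http-{exc.code}"
    except OSError:
        # Refused, reset or timed-out connection: the instance may have died.
        return "no-response"
    return "ok" if status == 200 else f"http-{status}"


def find_suspects(urls, fetch=default_fetch, pause=10):
    """Probe each distinct URL in order; return those with no response."""
    suspects = []
    for target in dict.fromkeys(urls):  # drop duplicates, keep order
        if probe(target, fetch) == "no-response":
            suspects.append(target)
            time.sleep(pause)  # hypothetical grace period for a restart
    return suspects
```

With a load balancer in front of Splash (as in an Aquarium setup), a dead instance may show up as a 502/503/504 response instead of a dropped connection, so those `http-<code>` results are worth checking too.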

lucywang000 commented 4 years ago

@Mideen It would be very helpful if you provided the running environment of Splash, e.g.

@kmike maybe we can make a GitHub issue template that asks people to fill in these details when creating an issue?

Mideen commented 4 years ago

We are using docker and docker-compose to run Splash (Aquarium setup only).

Affected Splash versions: 3.2 and 3.3.1 (I have tested only these two versions)
Docker version: Docker version 19.03.2, build 6a30dfc
docker-compose version: docker-compose version 1.23.2, build 1110ad01
OS: tried with Ubuntu 14.04, kernel 4.2.0-42-generic (physical machine), and CentOS Linux 7, kernel 3.10.0-693.21.1.el7.x86_64 (virtual machine)

@lucywang000

My docker-compose yml file configuration:

image: scrapinghub/splash:3.3.1
command: --max-timeout 3600 --slots 5 --maxrss 1500 --verbosity 3 --disable-browser-caches --disable-private-mode
expose:
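Because the command above sets `--maxrss 1500`, a restart can also mean that Splash exited on reaching its memory limit, which is a different situation from a crash. A rough way to tell the two apart is to watch memory use through the `/_ping` endpoint. This is a sketch under assumptions: the instance is reachable at `http://localhost:8050`, the `/_ping` response is JSON with a `maxrss` field, and that value is in kilobytes. The helper names are hypothetical.

```python
import json
from urllib.request import urlopen


def read_ping(splash="http://localhost:8050", timeout=5):
    """Fetch and decode the /_ping response (assumed to be JSON)."""
    with urlopen(f"{splash}/_ping", timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def rss_headroom_mb(ping, limit_mb=1500):
    """Return how many MB remain before the --maxrss limit is reached.

    Assumption: ping["maxrss"] is reported in kilobytes.
    """
    used_mb = ping["maxrss"] / 1024
    return limit_mb - used_mb


# Example use against a live instance (not executed here):
#   print(rss_headroom_mb(read_ping()))
```

If the headroom stays large right up to a restart, the memory limit is unlikely to be the trigger and a crash in the rendering engine becomes the more plausible explanation.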

lucywang000 commented 4 years ago

@Mideen Have you tried the latest beta version (3.4b1)?

Mideen commented 4 years ago

I haven't tried it yet. I will try it and let you know, @lucywang000.

kmike commented 4 years ago

I can reproduce the crash in 3.4b1 as well, on all URLs from the list which I tried. Disabling JavaScript or filtering out fonts doesn't fix it; it looks like a problem with CSS processing in the old webkit engine, though I'm not completely sure.

Mideen commented 4 years ago

Thank you @kmike

Mideen commented 4 years ago

Do you have any idea how to overcome this issue? @lucywang000 @kmike

kmike commented 4 years ago

@annulen do you have an easy way to check whether these URLs crash the qtwebkit test browser, and if so, why?

annulen commented 4 years ago

@kmike Is your question about these particular cases, or about getting meaningful crash reports from users?

For the former, it seems that at least a few URLs from the list result in a CSS-related crash, maybe because they use a particular CSS construct which is handled improperly. I'll investigate possible ways to solve it.

For the latter, generic instructions like http://qutebrowser.org/doc/stacktrace.html would fit. However, you may also consider embedding a crash reporting solution (e.g. Google's breakpad or crashpad) into Splash itself.

kmike commented 4 years ago

> Is your question about these particular cases, or about getting meaningful crash reports from users?

About the particular cases, though any feedback on getting meaningful crash reports is, of course, appreciated.

> For the former, it seems that at least a few URLs from the list result in a CSS-related crash, maybe because they use a particular CSS construct which is handled improperly. I'll investigate possible ways to solve it.

Thank you! Do you think it is going to be solved automatically after upgrading the webkit engine?

> For the latter, generic instructions like http://qutebrowser.org/doc/stacktrace.html would fit. However, you may also consider embedding a crash reporting solution (e.g. Google's breakpad or crashpad) into Splash itself.

Interesting, thanks! I didn't know about breakpad and crashpad. It seems Dropbox is using a similar solution (https://blogs.dropbox.com/tech/2018/11/crash-reporting-in-desktop-python-applications/).
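For a Python application, a lighter first step than breakpad or crashpad is the standard library's `faulthandler` module, which writes the Python tracebacks of all threads when the process receives a fatal signal such as SIGSEGV. The sketch below is not something Splash is known to do; the function name and log path are hypothetical. It records only the Python-level stack, so for a crash inside the rendering engine it shows which Python call was active, not the C++ frames.

```python
import faulthandler


def enable_crash_log(path):
    """Write Python tracebacks of all threads to `path` on a fatal signal."""
    # The file must stay open for as long as the handler is enabled.
    log = open(path, "a")
    faulthandler.enable(file=log, all_threads=True)
    return log


# Example use at application start-up (hypothetical path):
#   crash_log = enable_crash_log("/tmp/splash-crash.log")
```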

annulen commented 4 years ago

> Do you think it is going to be solved automatically after upgrading the webkit engine?

Yes, the issue doesn't reproduce in qtwebkit-dev-wip.

Mideen commented 4 years ago

Does the Splash team have any plans to upgrade qtwebkit in the next Splash version?

kmike commented 4 years ago

@Mideen in the next release qtwebkit will be upgraded (it is actually already upgraded in master), but this is a minor upgrade with some bug fixes; it does not bring the rendering engine to a modern state. The upgrade currently in master won't fix the issues on your websites.

Updating qtwebkit to the latest webkit is not finished yet; @annulen knows all the details. The work happens in the https://github.com/qtwebkit/qtwebkit repository. @annulen runs a Patreon campaign to make this work possible (https://www.patreon.com/annulen).

Once qtwebkit is upgraded to the latest webkit version, we'll start investigating how to upgrade Splash to use it. I'm not sure what it would take, as I'm not familiar with the changes yet. We also need to take PyQt into consideration, as we're not using qtwebkit directly via the C++ API. Hopefully, that won't be too bad.

biggosh commented 4 years ago

I can also add this URL: http://www.informagiovani-italia.com/Avignone.htm

Thanks

Mideen commented 3 years ago

Hello team, is there any update on this issue? It is still reproducible in version 3.5.