scrapinghub / splash

Lightweight, scriptable browser as a service with an HTTP API
BSD 3-Clause "New" or "Revised" License

Fatal Python Error: Segmentation fault #985

Open OlegYurchik opened 4 years ago

OlegYurchik commented 4 years ago

I made many async requests to Splash from 15 threads, like this:

import aiohttp

async def fetch(url):
    async with aiohttp.ClientSession() as session:
        async with session.get(
            "http://localhost:8050/render.html",
            params={"url": url, "timeout": 10, "wait": 3},
        ) as response:
            await response.text()  # read the rendered HTML

After about twenty minutes I got this:

2019-12-31 00:47:51.841601 [events] {"path": "/render.html", "rendertime": 7.1810667514801025, "maxrss": 2866060, "load": [3.06, 3.09, 3.17], "fds": 150, "active": 8, "qsize": 0, "_id": 139710589285600, "method": "GET", "timestamp": 1577753271, "user-agent": "Python/3.8 aiohttp/3.6.2", "args": {"url": "http://www.mueller-schwalbach.de/", "timeout": "10", "wait": "3.0", "uid": 139710589285600}, "status_code": 200, "client_ip": "172.17.0.1"}
2019-12-31 00:47:51.841793 [-] "172.17.0.1" - - [31/Dec/2019:00:47:51 +0000] "GET /render.html?url=http://www.mueller-schwalbach.de/&timeout=10&wait=3.0 HTTP/1.1" 200 12795 "-" "Python/3.8 aiohttp/3.6.2"
Fatal Python error: Segmentation fault

Current thread 0x00007f114c87f740 (most recent call first):
  File "/usr/local/lib/python3.6/dist-packages/qt5reactor.py", line 304 in run
  File "/app/splash/server.py", line 441 in main
  File "/app/bin/splash", line 4 in <module>

How can I fix this?
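For reference, the 15-way concurrent load described in the report above can be sketched with an asyncio.Semaphore. This is a shape-only sketch: the placeholder coroutine stands in for the real aiohttp GET against /render.html, so it runs without a live Splash instance.

```python
import asyncio

# Number of concurrent "clients", matching the 15 threads in the report.
CONCURRENCY = 15

async def render(sem, url, results):
    async with sem:
        # In the real script this would be the aiohttp GET against
        # http://localhost:8050/render.html shown above.
        await asyncio.sleep(0)
        results.append(url)

async def main(urls):
    sem = asyncio.Semaphore(CONCURRENCY)
    results = []
    await asyncio.gather(*(render(sem, u, results) for u in urls))
    return results

urls = [f"http://example.com/{i}" for i in range(50)]
done = asyncio.run(main(urls))
print(len(done))  # 50: every placeholder request completes
```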

davidkong0987 commented 4 years ago

Have you been able to fix this? It's quite an issue. I don't remember seeing it until a few months ago.

rodcox89 commented 4 years ago

I'm having this issue too.

davidkong0987 commented 4 years ago

Not sure if this is an acceptable fix for you but I have been using https://github.com/TeamHG-Memex/aquarium.

rodcox89 commented 4 years ago

Unfortunately it's not. I've deployed the containers to AWS ECS and am using an Application Load Balancer instead of HAProxy.
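Since Aquarium's HAProxy setup is out in that deployment, one hedged option for ECS specifically (a sketch, not a verified fix) is a container health check in the task definition that probes Splash's /_ping endpoint, so ECS replaces the task once the process has segfaulted and stopped answering. The interval/retry values below are placeholders, and it assumes curl is available inside the image (use wget otherwise):

```json
"healthCheck": {
  "command": ["CMD-SHELL", "curl -fsS http://localhost:8050/_ping || exit 1"],
  "interval": 30,
  "timeout": 5,
  "retries": 3,
  "startPeriod": 15
}
```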

tomislater commented 4 years ago

I have got this error only on some pages...

Rafiot commented 4 years ago

I can consistently reproduce the crash on this page: https://www.steinfort.lu/news

Splash is started this way: docker run -p 8050:8050 -p 5023:5023 scrapinghub/splash:3.5.0 --disable-browser-caches

jeetendrad commented 4 years ago

I too can consistently reproduce this issue on some pages, and it causes Splash to crash. Any suggestions on how to fix it?

gingergenius commented 3 years ago

I have this issue too

bloodforcream commented 3 years ago

I turned JS off by adding splash.js_enabled = false before assert(splash:go(args.url)) in my Lua script. It worked just fine in my case. My script:

function main(splash, args)
    splash.js_enabled = false
    assert(splash:go(args.url))
    assert(splash:wait(1))
    return {
        html = splash:html(),
        png = splash:png(),
        har = splash:har(),
    }
end

Rafiot commented 3 years ago

Well, yeah, but disabling JavaScript means we could also replace Splash with curl and call it a day. It's not really an option.

In my case, it clearly happens on very, very bad websites loading insane amounts of content. But it would still be nice to catch the exception instead of segfaulting.

vishalmry commented 3 years ago

> Well, yeah, but disabling JavaScript means we could also replace Splash with curl and call it a day. It's not really an option.
>
> In my case, it clearly happens on very, very bad websites loading insane amounts of content. But it would still be nice to catch the exception instead of segfaulting.

I've been getting the same issue. I run Splash through Docker; is there a way to handle the exception in that setup?

davidljohnson commented 3 years ago

I came across this issue too, and I seem to have worked around it with the following (taken from https://splash.readthedocs.io/en/stable/faq.html#how-to-run-splash-in-production):

docker run -d -p 8050:8050 --restart=always scrapinghub/splash --max-timeout 3600

You will still get the error, but at least you will always have a container available to handle requests.
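Building on that: with --restart=always the container comes back after a segfault, but any in-flight requests still fail, so the client side should retry them. A minimal retry-helper sketch (my own suggestion, not part of Splash or aiohttp), demonstrated here with a stub coroutine that fails twice as if Splash were restarting:

```python
import asyncio

async def fetch_with_retry(fetch, retries=3, delay=0.01):
    """Run the coroutine factory `fetch`, retrying on connection loss."""
    for attempt in range(retries):
        try:
            return await fetch()
        except (ConnectionError, asyncio.TimeoutError):
            if attempt == retries - 1:
                raise
            # back off while the restarted container comes up
            await asyncio.sleep(delay)

# Demo: a stub that fails twice (as if Splash restarted), then succeeds.
# In real use, `flaky` would wrap the aiohttp GET to /render.html.
calls = {"n": 0}

async def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("splash restarting")
    return "<html>ok</html>"

result = asyncio.run(fetch_with_retry(flaky))
print(result)  # <html>ok</html>
```

In production you would use a larger delay than the demo value so the container has time to come back up.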