scrapinghub / splash

Lightweight, scriptable browser as a service with an HTTP API
BSD 3-Clause "New" or "Revised" License

Deploy on Azure Container Instances failing #1099

Open rplati opened 3 years ago

rplati commented 3 years ago

The image works locally, but crashes on Azure Container Instances – log below. The container apparently cannot be started because the folder /etc/splash/filters is not found.

Might be related to issue 1025, where the same error message was encountered trying to deploy on Heroku. See also issue 1 in Shokesu / splash.

Start streaming logs:
2021-01-20 13:42:02+0000 [-] Log opened.
2021-01-20 13:42:02.919023 [-] Xvfb is started: ['Xvfb', ':367440129', '-screen', '0', '1024x768x24', '-nolisten', 'tcp']
QStandardPaths: XDG_RUNTIME_DIR not set, defaulting to '/tmp/runtime-splash'
2021-01-20 13:42:09.613219 [-] Splash version: 3.5
2021-01-20 13:42:11.897165 [-] Qt 5.14.1, PyQt 5.14.2, WebKit 602.1, Chromium 77.0.3865.129, sip 4.19.22, Twisted 19.7.0, Lua 5.2
2021-01-20 13:42:11.899691 [-] Python 3.6.9 (default, Jul 17 2020, 12:50:27) [GCC 8.4.0]
2021-01-20 13:42:11.900728 [-] Open files limit: 8192
2021-01-20 13:42:11.901373 [-] Open files limit increased from 8192 to 1048576
The X11 connection broke: I/O error (code 1)
XIO:  fatal IO error 22 (Invalid argument) on X server ":367440129"
      after 489 requests (489 known processed) with 0 events remaining.
2021-01-20 13:42:12.132247 [-] Traceback (most recent call last):
2021-01-20 13:42:12.132989 [-]   File "/app/bin/splash", line 4, in <module>
2021-01-20 13:42:12.133870 [-]     main()
2021-01-20 13:42:12.134094 [-]   File "/app/splash/server.py", line 433, in main
2021-01-20 13:42:12.134811 [-]     dont_log_args=set(opts.dont_log_args),
2021-01-20 13:42:12.134972 [-]   File "/app/splash/server.py", line 306, in default_splash_server
2021-01-20 13:42:12.135431 [-]     disable_browser_caches=disable_browser_caches,
2021-01-20 13:42:12.135586 [-]   File "/app/splash/network_manager.py", line 58, in __init__
2021-01-20 13:42:12.136080 [-]     self.adblock_rules = AdblockRulesRegistry(filters_path, verbosity=verbosity)
2021-01-20 13:42:12.136409 [-]   File "/app/splash/request_middleware.py", line 162, in __init__
2021-01-20 13:42:12.136788 [-]     self._load(path)
2021-01-20 13:42:12.136929 [-]   File "/app/splash/request_middleware.py", line 186, in _load
2021-01-20 13:42:12.141827 [-]     for fname in os.listdir(path):
2021-01-20 13:42:12.142097 [-] FileNotFoundError: [Errno 2] No such file or directory: '/etc/splash/filters'
Fatal Python error: Segmentation fault

rplati commented 3 years ago

I’ve looked at this in more depth. I made a Docker image for debugging: I removed the ENTRYPOINT that seems to be causing the crashes and replaced it with ENTRYPOINT tail -f /dev/null in order to keep the container alive. This prevented the crashing, and I was able to connect to the container via docker exec and check that there were no missing folders. I then tried starting splash with the command corresponding to the ENTRYPOINT in the original splash Dockerfile.
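
The folder check described above can also be scripted. This is a minimal sketch, not part of Splash: the helper name missing_dirs is hypothetical, and the paths are taken from the splash command quoted in this comment. It uses the same os.listdir call that fails in the traceback, so it can be run inside the debug container before starting splash by hand.

```python
import os

# Directories taken from the splash command line quoted in this issue.
SPLASH_DIRS = [
    "/etc/splash/proxy-profiles",
    "/etc/splash/js-profiles",
    "/etc/splash/filters",
]


def missing_dirs(paths):
    """Return the paths that cannot be listed as directories."""
    missing = []
    for path in paths:
        try:
            # Same call that raises FileNotFoundError in the traceback.
            os.listdir(path)
        except OSError:
            # Covers missing paths, non-directories and permission errors.
            missing.append(path)
    return missing


if __name__ == "__main__":
    for path in missing_dirs(SPLASH_DIRS):
        print("missing or unreadable:", path)
```

If this prints nothing while splash still fails to start, the folders themselves are probably not the cause on that machine.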

Starting splash this way in Azure Container Instances sometimes works, and in those cases the APIs work fine. But more often than not I got the following error message, which apparently comes from Chromium.

# python3 /app/bin/splash --proxy-profiles-path /etc/splash/proxy-profiles --js-profiles-path /etc/splash/js-profiles --filters-path /etc/splash/filters --lua-package-path /etc/splash/lua_modules/?.lua
2021-02-05 07:26:29+0000 [-] Log opened.
2021-02-05 07:26:29.496673 [-] Xvfb is started: ['Xvfb', ':311916652', '-screen', '0', '1024x768x24', '-nolisten', 'tcp']
QStandardPaths: XDG_RUNTIME_DIR not set, defaulting to '/tmp/runtime-root'
2021-02-05 07:26:31.359701 [-] Splash version: 3.5
[7158:7158:0205/072631.728516:ERROR:zygote_host_impl_linux.cc(89)] Running as root without --no-sandbox is not supported. See https://crbug.com/638180.
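
The last log line comes from Chromium, which refuses to run as root unless its sandbox is disabled. The # prompt and the /tmp/runtime-root path suggest that this manual run was executed as root, while the first log shows /tmp/runtime-splash. Whether that explains the intermittent failures on ACI is not established here. A minimal sketch for checking the user before starting splash by hand, with a hypothetical helper name is_root:

```python
import os


def is_root(euid=None):
    """Return True if the given (or current) effective user id is root."""
    if euid is None:
        euid = os.geteuid()  # available on POSIX systems only
    return euid == 0


if __name__ == "__main__":
    if is_root():
        print("running as root: expect the Chromium sandbox error shown above")
    else:
        print("running as a non-root user")
```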

I then tried to figure out why splash sometimes starts on ACI and sometimes does not. In the West Europe region where I was testing, ACI apparently starts containers on (at least) two different types of machines: splash works on one but not on the other. I collected information about each machine with uname -v, uname -n and lscpu.

The machine type where splash works has kernel version #114~16.04.1-Ubuntu SMP Wed Dec 16 02:39:42 UTC 2020 and the CPU supports VT-x virtualization. The machine type where splash doesn’t start has kernel version #1 SMP Tue Oct 27 21:35:05 UTC 2020 and apparently no VT-x virtualization. There may be more differences that I was not able to detect with the commands I was using, but starting splash consistently worked on one type of machine and failed on the other.
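
The same comparison can be collected from Python inside the container. This is a rough sketch under two assumptions: the host is Linux on x86, and the vmx flag in /proc/cpuinfo is what lscpu reports as VT-x. The helper names cpu_flags and machine_fingerprint are made up for illustration.

```python
import platform


def cpu_flags(cpuinfo_text):
    """Collect CPU feature flags from /proc/cpuinfo-style text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return flags


def machine_fingerprint(cpuinfo_text):
    """Summarise the fields compared in this comment."""
    info = platform.uname()
    return {
        "node": info.node,                # like `uname -n`
        "kernel_version": info.version,   # like `uname -v`
        "vmx": "vmx" in cpu_flags(cpuinfo_text),  # Intel VT-x flag
    }


if __name__ == "__main__":
    with open("/proc/cpuinfo") as f:
        print(machine_fingerprint(f.read()))
```

Printing this at container start, before splash is launched, would make it easier to match each failed start to the machine type it ran on.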

I’ve read https://www.docker.com/blog/compiling-qt-with-docker-multi-stage-and-multi-platform/ as suggested by @Gallaecio in #1100, and from what I can gather, Qt seems to be the reason why splash is so sensitive to the machine it is running on. But I’m not familiar enough with Qt and Docker to know how to approach fixing this. Any ideas are appreciated.

66li commented 2 years ago

This may be related to https://github.com/containerd/cri/issues/1507

I use k8s. The solution is to start a container locally, copy the directory out with docker cp ...:/etc/splash splash, and then mount it into the container as a k8s volume.