scrapinghub / splash

Lightweight, scriptable browser as a service with an HTTP API
BSD 3-Clause "New" or "Revised" License

Kernel panic on out of memory, causes system crash #917

Open Mideen opened 5 years ago

Mideen commented 5 years ago

My system crashes when I increase the Splash request load, and I am getting the following trace in my kdump.

```
[59842.121609] Free swap = 2097148kB
[59842.121610] Total swap = 2097148kB
[59842.121611] 7864212 pages RAM
[59842.121613] 0 pages HighMem/MovableOnly
[59842.121614] 186261 pages reserved
[59842.121615] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
[59842.121775] [ 2470] 0 2470 7722 378 20 0 0 systemd-journal
[59842.121779] [ 2510] 0 2510 11156 498 22 0 -1000 systemd-udevd
[59842.121786] [ 4747] 0 4747 13880 226 27 0 -1000 auditd
[59842.121790] [ 4897] 0 4897 6594 441 18 0 0 systemd-logind
[59842.121797] [ 4905] 0 4905 85578 811 41 0 0 rsyslogd
[59842.121800] [ 4912] 0 4912 28215 1087 57 0 -1000 sshd
[59842.121804] [ 4917] 0 4917 6108 552 16 0 0 smartd
[59842.121808] [ 4950] 0 4950 70974 12454 92 0 0 puppet
[59842.121814] [ 4952] 0 4952 5408 328 15 0 0 irqbalance
[59842.121819] [ 4956] 0 4956 22624 799 47 0 0 rngd
[59842.121823] [ 4984] 81 4984 15057 636 31 0 -900 dbus-daemon
[59842.121826] [ 4996] 0 4996 6791 263 17 0 0 xinetd
[59842.121831] [ 5218] 0 5218 31571 434 19 0 0 crond
[59842.121835] [ 5330] 0 5330 27523 214 9 0 0 agetty
[59842.121839] [ 5387] 38 5387 6950 512 18 0 0 ntpd
[59842.121889] [ 7047] 0 7047 26865 577 48 0 0 dhclient
[59842.121905] [ 7474] 0 7474 39834 1502 78 0 0 sshd
[59842.121914] [ 7476] 1000 7476 39834 638 77 0 0 sshd
[59842.121917] [ 7477] 1000 7477 29245 926 14 0 0 bash
[59842.121921] [ 7801] 1001 7801 390880 7138 106 0 0 python2.7
[59842.121925] [ 7809] 1001 7809 55237 3270 62 0 0 python2.7
[59842.121939] [15260] 0 15260 7295 386 13 0 0 ossec-execd
[59842.121952] [15265] 995 15265 63711 707 23 0 0 ossec-agentd
[59842.121981] [15269] 0 15269 44729 1134 19 0 0 ossec-syscheckd
[59842.121997] [15277] 0 15277 99457 503 22 0 0 ossec-logcollec
[59842.122010] [15281] 0 15281 124557 1843 34 0 0 wazuh-modulesd
[59842.122031] [22380] 999 22380 153603 2064 63 0 0 polkitd
[59842.122051] [22896] 0 22896 824682 20397 205 0 -500 dockerd
[59842.122063] [22909] 0 22909 797304 7287 152 0 -500 docker-containe
[59842.122084] [23443] 0 23443 2288 894 9 0 -999 docker-containe
[59842.122099] [23627] 0 23627 1622478 727951 1767 0 0 python3
[59842.122111] [24292] 0 24292 81438 5687 106 0 0 Xvfb
[59842.122128] [24373] 1000 24373 2493 512 9 0 0 docker-compose
[59842.122138] [24378] 1000 24378 220369 6054 81 0 0 docker-compose
[59842.122158] [24427] 0 24427 67428 783 23 0 -500 docker-proxy
[59842.122171] [24441] 0 24441 151702 1166 38 0 -500 docker-proxy
[59842.122175] [24449] 0 24449 1872 907 8 0 -999 docker-containe
[59842.122179] [24470] 0 24470 7457 341 18 0 0 haproxy-systemd
[59842.122183] [24549] 0 24549 9713 1026 24 0 0 haproxy
[59842.122188] [24555] 0 24555 9944 869 21 0 0 haproxy
[59842.122192] [24905] 0 24905 39834 1502 81 0 0 sshd
[59842.122195] [24933] 1000 24933 39834 599 78 0 0 sshd
[59842.122199] [24936] 1000 24936 29048 739 12 0 0 bash
[59842.122205] [25466] 1000 25466 28319 437 12 0 0 bash
[59842.122221] [25823] 1000 25823 40571 643 35 0 0 top
[59842.122236] [26475] 1000 26475 26998 155 10 0 0 tail
[59842.122245] [27276] 0 27276 39834 1502 79 0 0 sshd
[59842.122251] [27304] 1000 27304 39834 599 77 0 0 sshd
[59842.122254] [27305] 1000 27305 29221 904 15 0 0 bash
[59842.122258] [27544] 1000 27544 529617 5114 118 0 0 docker
[59842.122274] [32465] 0 32465 1872 863 8 0 -999 docker-containe
[59842.122278] [32510] 0 32510 1181116 278253 865 0 0 python3
[59842.122282] [32698] 0 32698 81438 5687 109 0 0 Xvfb
[59842.122289] [32765] 0 32765 2224 776 9 0 -999 docker-containe
[59842.122292] [ 352] 0 352 1217681 303957 924 0 0 python3
[59842.122297] [ 512] 0 512 81438 5686 108 0 0 Xvfb
[59842.122303] [ 1166] 0 1166 2288 798 9 0 -999 docker-containe
[59842.122307] [ 1199] 0 1199 577449 24780 246 0 0 python3
[59842.122315] [ 1397] 0 1397 81438 5687 109 0 0 Xvfb
[59842.122324] [ 1944] 0 1944 2256 744 9 0 -999 docker-containe
[59842.122328] [ 2045] 0 2045 1015782 98157 520 0 0 python3
[59842.122332] [ 2162] 0 2162 81438 5686 110 0 0 Xvfb
[59842.122336] [ 2385] 1001 2385 28295 305 13 0 0 sh
[59842.122341] [ 2386] 1001 2386 40537 618 35 0 0 top
[59842.122345] [ 2410] 1000 2410 26988 90 10 0 0 sleep
[59842.122349] [ 2414] 1000 2414 151104 4174 58 0 0 docker
[59842.122353] [ 2447] 0 2447 11156 260 20 0 0 systemd-udevd
[59842.122356] Kernel panic - not syncing: Out of memory: compulsory panic_on_oom is enabled
```

```
[59842.122435] CPU: 12 PID: 23627 Comm: python3 Kdump: loaded Tainted: G ------------ T 3.10.0-957.5.1.el7.x86_64 #1
[59842.122509] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[59842.122543] Call Trace:
[59842.122571] [] dump_stack+0x19/0x1b
[59842.122603] [] panic+0xe8/0x21f
[59842.122642] [] check_panic_on_oom+0x55/0x60
[59842.122686] [] mem_cgroup_oom_synchronize+0x34d/0x570
[59842.122725] [] ? mem_cgroup_charge_common+0xc0/0xc0
[59842.122764] [] pagefault_out_of_memory+0x14/0x90
[59842.122802] [] mm_fault_error+0x6a/0x157
[59842.122836] [] __do_page_fault+0x3c8/0x500
[59842.122880] [] trace_do_page_fault+0x56/0x150
[59842.122927] [] do_async_page_fault+0x22/0xf0
[59842.122963] [] async_page_fault+0x28/0x30
```

OS: CentOS 7.6.1810
Docker version: 18.06.1-ce
Splash version: 3.2

My maxrss is 2200, my mem_limit is 2800m, and my memswap_limit is 2800m.
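For reference, here is a minimal docker-compose sketch of how these three limits fit together. The service name, image tag and file layout are assumptions, not the reporter's actual Aquarium config; the values mirror the ones quoted above. Splash's `--maxrss` is given in MB and is meant to sit well below the container's `mem_limit`, so Splash can restart itself before Docker or the kernel has to step in.

```yaml
# Hypothetical compose service illustrating the limits described in this issue.
version: "2.4"
services:
  splash:
    image: scrapinghub/splash:3.2
    command: --maxrss 2200     # Splash restarts its own process above ~2200 MB RSS
    mem_limit: 2800m           # hard cgroup memory limit enforced by Docker
    memswap_limit: 2800m       # memory + swap; equal to mem_limit means no extra swap
    restart: unless-stopped
```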

Please help me to solve this.

Gallaecio commented 5 years ago

Have you seen https://github.com/scrapinghub/splash/issues/304?

Mideen commented 5 years ago

Yes, I saw that issue, and I set my maxrss value 600 MB lower than the mem_limit. We are using the Aquarium setup to run Splash, but we still face the same issue after a few days of continuous running.
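One way to check whether the `--maxrss` cap is actually kicking in before the host runs out of memory is to poll Splash's status endpoint over time. This is only a sketch: it assumes Splash is reachable on localhost:8050 (adjust for the Aquarium/HAProxy port mapping) and that the `/_ping` endpoint reports the instance's peak RSS, as it does in recent Splash versions.

```sh
# Hypothetical monitoring loop: log Splash's reported RSS once a minute.
while true; do
    date
    curl -s http://localhost:8050/_ping   # JSON with "status" and, in recent versions, "maxrss"
    echo
    sleep 60
done
```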

This kernel panic does not occur on the Ubuntu machine. It only occurs on CentOS running in a VirtualBox VM.

We have also enabled the panic_on_oom property (set to 2) among the virtual memory kernel parameters, because we need kdump to capture a dump whenever an OOM error occurs in production.
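Worth noting: `vm.panic_on_oom = 2` makes the kernel panic even when the OOM happens inside a memory cgroup, which matches the `mem_cgroup_oom_synchronize` frame in the trace above, so a single container hitting its own `mem_limit` is enough to bring the whole host down. A small sketch of the relevant sysctl settings, under the assumption that they live in a drop-in file (the file name is hypothetical):

```
# /etc/sysctl.d/99-oom.conf (hypothetical file name)
# 2 = panic on any OOM, including per-cgroup (container) OOMs -> whole-host panic/kdump
# 1 = panic only on a genuine system-wide OOM; a cgroup OOM just kills a task in that cgroup
# 0 = never panic; let the OOM killer handle it
vm.panic_on_oom = 2

# reboot automatically 10 seconds after a panic if the crash dump cannot be captured
kernel.panic = 10
```

If the intent is to capture dumps only for genuine host-wide OOMs rather than for every container-level one, `vm.panic_on_oom = 1` may be closer to what's wanted; settings take effect after `sysctl --system` or a reboot.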