travis-ci / packer-templates

Templates for Packer!
MIT License

Cassandra doesn't start on Docker Trusty images #504

Closed by bogdanap 7 years ago

bogdanap commented 7 years ago

Support tickets: https://secure.helpscout.net/conversation/421713077/58824/?folderId=30788 https://secure.helpscout.net/conversation/425642638/59123/?folderId=30784

It seems that Cassandra doesn't start on Docker images anymore. Example build: https://travis-ci.org/bogdanap/travis_production_test/builds/271774793

From the logs it seems like it might stop here:

sudo service cassandra start
/etc/init.d/cassandra: 72: ulimit: error setting limit (Operation not permitted)
/etc/init.d/cassandra: 73: ulimit: error setting limit (Operation not permitted)

bogdanap commented 7 years ago

This excerpt from dmesg makes me believe that cassandra is actually killed by the oom-killer


dmesg:

```
[13629.669931] Task in /docker/3215ee406f572d12ac48be036a0e122af6e8521b8954cd59ec68b714b124003d killed as a result of limit of /docker/3215ee406f572d12ac48be036a0e122af6e8521b8954cd59ec68b714b124003d
[13629.669956] memory: usage 4194304kB, limit 4194304kB, failcnt 23483
[13629.669957] memory+swap: usage 4194304kB, limit 8388608kB, failcnt 0
[13629.669958] kmem: usage 39308kB, limit 9007199254740988kB, failcnt 0
[13629.669959] Memory cgroup stats for /docker/3215ee406f572d12ac48be036a0e122af6e8521b8954cd59ec68b714b124003d: cache:11388KB rss:4143608KB rss_huge:3825664KB mapped_file:9656KB dirty:260KB writeback:0KB swap:0KB inactive_anon:9880KB active_anon:4144284KB inactive_file:692KB active_file:120KB unevictable:0KB
[13629.670007] [ pid ] uid tgid total_vm rss nr_ptes nr_pmds swapents oom_score_adj name
[13629.670276] [91063] 296608 91063 8250 199 22 3 0 0 init
[13629.670279] [91149] 296608 91149 1097 38 8 3 0 0 acpid
[13629.670281] [91150] 296608 91150 5918 62 17 3 0 0 cron
[13629.670283] [91151] 296608 91151 4789 41 14 3 0 0 atd
[13629.670285] [91178] 296608 91178 15350 164 34 3 0 0 sshd
[13629.670288] [91179] 296608 91179 4929 173 16 3 0 0 irqbalance
[13629.670291] [91246] 298608 91246 9870 2344 20 3 0 0 bash
[13629.670293] [91290] 296716 91290 286206 115128 275 4 0 0 mysqld
[13629.670295] [91336] 296709 91336 45532 107 25 3 0 0 rsyslogd
[13629.670297] [91401] 296715 91401 38105 2746 60 3 0 0 postgres
[13629.670299] [91425] 296715 91425 38105 336 51 3 0 0 postgres
[13629.670301] [91426] 296715 91426 38105 362 53 3 0 0 postgres
[13629.670302] [91427] 296715 91427 38105 336 51 3 0 0 postgres
[13629.670304] [91428] 296715 91428 38314 501 54 3 0 0 postgres
[13629.670306] [91429] 296715 91429 28303 337 49 3 0 0 postgres
[13629.670312] [91897] 362142 91897 81353 106 24 3 0 0 memcached
[13629.670316] [91947] 296608 91947 6285 143 18 3 0 0 ntpd
[13629.670342] [94676] 296721 94676 1116 38 8 3 0 0 cassandra
[13629.670361] [95094] 296608 95094 14343 96 29 3 0 0 sudo
[13629.670365] [95098] 296608 95098 1116 21 8 3 0 0 sh
[13629.670366] [95105] 296608 95105 5097 77 11 3 0 0 find
[13629.670368] [95137] 296721 95137 2234897 916340 1843 7 0 0 java
[13629.670370] [95138] 296721 95138 5321 38 12 3 0 0 grep
[13629.670377] Memory cgroup out of memory: Kill process 95137 (java) score 875 or sacrifice child
[13629.674747] Killed process 95137 (java) total-vm:8939588kB, anon-rss:3665360kB, file-rss:0kB, shmem-rss:0kB
[13631.173511] java invoked oom-killer: gfp_mask=0x14000c0(GFP_KERNEL), nodemask=(null), order=0, oom_score_adj=0
[13631.173517] java cpuset=3215ee406f572d12ac48be036a0e122af6e8521b8954cd59ec68b714b124003d mems_allowed=0-1
[13631.173523] CPU: 16 PID: 95467 Comm: java Not tainted 4.11.6-041106-generic #201706170517
[13631.173524] Hardware name: Xen HVM domU, BIOS 4.2.amazon 08/24/2006
[13631.173526] Call Trace:
[13631.173535] dump_stack+0x63/0x81
[13631.173538] dump_header+0x97/0x21a
[13631.173541] oom_kill_process+0x208/0x3e0
[13631.173543] out_of_memory+0x11d/0x4c0
[13631.173547] mem_cgroup_out_of_memory+0x4b/0x80
[13631.173549] mem_cgroup_oom_synchronize+0x31e/0x340
[13631.173551] ? memory_high_write+0xe0/0xe0
[13631.173553] pagefault_out_of_memory+0x36/0x80
[13631.173556] mm_fault_error+0x8f/0x190
[13631.173558] __do_page_fault+0x4ad/0x4e0
[13631.173560] do_page_fault+0x22/0x30
[13631.173564] page_fault+0x28/0x30
[13631.173566] RIP: 0033:0x7f6bfbda5e88
[13631.173567] RSP: 002b:00007f6bfd088680 EFLAGS: 00010287
[13631.173568] RAX: 0000000000001000 RBX: 000000069f313000 RCX: 00007f6bfc5754fa
[13631.173570] RDX: 00007f6bfc44bd40 RSI: 00000007c0000000 RDI: 0000000640000000
[13631.173570] RBP: 00007f6bfd088690 R08: 00000000ffffffff R09: 0000000000000000
[13631.173571] R10: 0000000000000032 R11: 0000000000000217 R12: 00000007c0000000
[13631.173572] R13: 0000000000000000 R14: 0000000640000000 R15: 0000000180000000
[13631.173574] Task in /docker/3215ee406f572d12ac48be036a0e122af6e8521b8954cd59ec68b714b124003d killed as a result of limit of /docker/3215ee406f572d12ac48be036a0e122af6e8521b8954cd59ec68b714b124003d
[13631.173580] memory: usage 4194304kB, limit 4194304kB, failcnt 26896
[13631.173581] memory+swap: usage 4194304kB, limit 8388608kB, failcnt 0
[13631.173582] kmem: usage 39368kB, limit 9007199254740988kB, failcnt 0
[13631.173582] Memory cgroup stats for /docker/3215ee406f572d12ac48be036a0e122af6e8521b8954cd59ec68b714b124003d: cache:11496KB rss:4143440KB rss_huge:4024320KB mapped_file:9876KB dirty:260KB writeback:0KB swap:0KB inactive_anon:9884KB active_anon:4144116KB inactive_file:524KB active_file:396KB unevictable:0KB
[13631.173617] [ pid ] uid tgid total_vm rss nr_ptes nr_pmds swapents oom_score_adj name
[13631.173876] [91063] 296608 91063 8250 198 22 3 0 0 init
[13631.173879] [91149] 296608 91149 1097 38 8 3 0 0 acpid
[13631.173882] [91150] 296608 91150 5918 62 17 3 0 0 cron
[13631.173884] [91151] 296608 91151 4789 41 14 3 0 0 atd
[13631.173886] [91178] 296608 91178 15350 164 34 3 0 0 sshd
[13631.173888] [91179] 296608 91179 4929 173 16 3 0 0 irqbalance
[13631.173892] [91246] 298608 91246 9870 2344 20 3 0 0 bash
[13631.173894] [91290] 296716 91290 286206 115128 275 4 0 0 mysqld
[13631.173896] [91336] 296709 91336 45532 107 25 3 0 0 rsyslogd
[13631.173898] [91401] 296715 91401 38105 2746 60 3 0 0 postgres
[13631.173900] [91425] 296715 91425 38105 336 51 3 0 0 postgres
[13631.173902] [91426] 296715 91426 38105 362 53 3 0 0 postgres
[13631.173905] [91427] 296715 91427 38105 336 51 3 0 0 postgres
[13631.173906] [91428] 296715 91428 38314 501 54 3 0 0 postgres
[13631.173908] [91429] 296715 91429 28303 337 49 3 0 0 postgres
[13631.173914] [91897] 362142 91897 81353 106 24 3 0 0 memcached
[13631.173918] [91947] 296608 91947 6285 142 18 3 0 0 ntpd
[13631.173965] [95094] 296608 95094 14343 96 29 3 0 0 sudo
[13631.173968] [95098] 296608 95098 1116 21 8 3 0 0 sh
[13631.173970] [95105] 296608 95105 5207 201 11 3 0 0 find
[13631.173973] [95456] 296721 95456 2234898 916276 1843 7 0 0 java
[13631.173981] Memory cgroup out of memory: Kill process 95456 (java) score 875 or sacrifice child
[13631.178272] Killed process 95456 (java) total-vm:8939592kB, anon-rss:3665104kB, file-rss:0kB, shmem-rss:0kB
[13631.198826] oom_reaper: reaped process 95456 (java), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
```
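For anyone wanting to confirm the same thing on their own build, the kill events can be pulled out of a long kernel log with a small filter. This is only a sketch: `oom_kills` is a hypothetical helper name, and it assumes the `Killed process <pid> (<name>) ... anon-rss:<n>kB` line format seen in the excerpt above, which may differ between kernel versions.

```shell
#!/bin/sh
# Hypothetical helper: read kernel log text on stdin and print
# "<pid> <name> <anon_rss_kB>" for every OOM kill it finds.
# Assumes the "Killed process ..." line format shown in the dmesg excerpt.
oom_kills() {
    sed -n 's/.*Killed process \([0-9][0-9]*\) (\([^)]*\)).*anon-rss:\([0-9][0-9]*\)kB.*/\1 \2 \3/p'
}
```

It would typically be fed with `dmesg | oom_kills`, though reading the kernel log from inside a container may require extra privileges.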


This means we might want to tune the MAX_HEAP_SIZE and HEAP_NEWSIZE variables as explained here
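As a rough illustration of what such tuning could look like, the sketch below derives both values from a container memory limit. The helper name `cassandra_heap_settings`, the quarter-of-the-limit heap share, and the 100 MB-per-core cap on the new generation are assumptions chosen for the example, not settings taken from this issue; the 2-core count in the sample call is also hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: suggest Cassandra heap settings for a memory-limited container.
# Usage: cassandra_heap_settings <memory_limit_mb> <cpu_cores>
cassandra_heap_settings() {
    limit_mb=$1
    cores=$2

    # Assumption: give the heap a quarter of the container limit so that
    # off-heap memory and the other services (mysqld, postgres, ...) still fit.
    max_heap_mb=$((limit_mb / 4))

    # Assumption: new generation is the smaller of 100 MB per core
    # and a quarter of the heap.
    per_core_mb=$((cores * 100))
    quarter_heap_mb=$((max_heap_mb / 4))
    if [ "$per_core_mb" -lt "$quarter_heap_mb" ]; then
        new_mb=$per_core_mb
    else
        new_mb=$quarter_heap_mb
    fi

    echo "MAX_HEAP_SIZE=${max_heap_mb}M"
    echo "HEAP_NEWSIZE=${new_mb}M"
}

# 4096 MB matches the cgroup limit in the dmesg output; 2 cores is a made-up value.
cassandra_heap_settings 4096 2
```

The two variables are meant to be set together. Where they need to be placed (for example in `cassandra-env.sh`) depends on how Cassandra was installed, so treat the output as a starting point rather than a verified fix.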

bogdanap commented 7 years ago

Since Cassandra's memory footprint is quite high (the minimum recommended for a production system is 8GB) and our containers only have 4GB of memory, we decided it's better to only support this on GCE. The docs have been updated accordingly: https://docs.travis-ci.com/user/database-setup/#Cassandra