This doesn't appear to be an image update triggered by Watchtower, because the container has existed for more than 2 weeks:
admin@mail-02.do-ams3.eth.prod:~ % docker ps
CONTAINER ID NAMES IMAGE CREATED STATUS
40553f753762 statusd-mail statusteam/status-go:deploy-prod 2 weeks ago Up 8 hours
840421b6fd64 statusd-mail-db postgres:9.6-alpine 2 weeks ago Up 11 days
As we can see, the PostgreSQL container was not affected.
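One way to double-check that the container was restarted in place rather than recreated is to compare its creation time with the start time of the current process (a hypothetical check using standard docker inspect fields):

# Created stays fixed when a container is restarted in place; a Watchtower
# image update would recreate the container and reset both timestamps.
docker inspect -f 'Created:   {{.Created}}' statusd-mail
docker inspect -f 'StartedAt: {{.State.StartedAt}}' statusd-mail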
The logs on mail-02.do-ams3.eth.prod look like this:
INFO [10-06|03:01:46.977] received new envelope peer=0x40a8cfd21a3a2a45ff6b18c6db3ba5dfe4acd8d4be6c17e84554d648365e7547 hash=0x151c7715a3d94d40de111223a8e0009e50b7d2d9fb0eae60857dc05d5f10fd41
INFO [10-06|03:01:47.025] settings options peerID=0x1aad8b5be20513ad5766b3afb1a0e7c40a9b90a9a0f61f21118b05df21fc5e21 Options="{PoWRequirement:0xc012f42810 BloomFilter:[255 255 255 255 255 255 255 255 255
INFO [10-06|03:01:47.027] handling peer id=0x34f766b58ca7f8ea197bfd2ffb377c474acb4b254cff4ad693a5b403e82001df ip=178.128.141.249
WARN [10-06|03:01:47.098] failed to deliver envelopes peer=891438aea1fca794 error="write tcp 172.17.0.3:43186->35.239.5.162:30305: use of closed network connection"
INFO [10-06|03:01:47.128] settings options peerID=0x34f766b58ca7f8ea197bfd2ffb377c474acb4b254cff4ad693a5b403e82001df Options="{PoWRequirement:0xc014608ab8 BloomFilter:[255 255 255 255 255 255 255 255 255
INFO [10-06|03:01:47.129] message loop peer=891438aea1fca794 err="p2p: read or write on closed message pipe"
INFO [10-06|03:01:47.269] received new envelopes count=1 peer=0xfd5eed8a28b7ca5d77fb305b52bc1f0ffebbc34b6386334aad5d4c5fd03d839d ip=35.223.240.17
INFO [10-06|03:01:47.271] received new envelope peer=0xfd5eed8a28b7ca5d77fb305b52bc1f0ffebbc34b6386334aad5d4c5fd03d839d hash=0x151c7715a3d94d40de111223a8e0009e50b7d2d9fb0eae60857dc05d5f10fd41
WARN [10-06|03:01:47.391] failed to deliver envelopes peer=98ab721a4762645a error="shutting down"
WARN [10-06|03:01:48.701] oversize message received peer="[36 186 242 213 107 222 203 98 106 38 94 142 216 14 22 25 92 1 196 149 235 230 195 95 204 20 195 229 60 23 142 7]" size=7233488
INFO [10-06|03:01:48.874] message loop peer=24baf2d56bdecb62 err="p2p: read or write on closed message pipe"
WARN [10-06|03:01:48.964] failed to deliver envelopes peer=24baf2d56bdecb62 error="shutting down"
WARN [10-06|03:01:49.064] oversize message received peer="[26 173 139 91 226 5 19 173 87 102 179 175 177 160 231 196 10 155 144 169 160 246 31 33 17 139 5 223 33 252 94 33]" size=6677152
WARN [10-06|03:01:49.084] failed to deliver envelopes peer=1aad8b5be20513ad error="shutting down"
INFO [10-06|03:01:49.087] message loop peer=1aad8b5be20513ad err="p2p: read or write on closed message pipe"
INFO [10-06|03:01:49.594] message loop peer=34f766b58ca7f8ea err="p2p: read or write on closed message pipe"
INFO [10-06|03:01:49.595] failed to read a message peer="[52 247 102 181 140 167 248 234 25 123 253 47 251 55 124 71 74 203 75 37 76 255 74 214 147 165 180 3 232 32 1 223]" error="p2p: read or write on closed mes
WARN [10-06|03:01:49.810] failed to deliver envelopes peer=34f766b58ca7f8ea error="write tcp 172.17.0.3:45150->178.128.141.249:30305: write: broken pipe"
INFO [10-06|03:01:49.846] handling peer id=0xcbff66ca3e3e6f9e21185aa2a7b8a7a46e98cad7f0a6222465bd96be05b82c28 ip=47.75.85.212
INFO [10-06|03:01:51.806] received new envelopes count=1 peer=0x0e969556ca4bf2a0266aed93582ea55f01c09f2431fe84d22d2b7acdf2acb592 ip=178.128.141.87
INFO [10-06|03:01:51.816] received new envelope peer=0x0e969556ca4bf2a0266aed93582ea55f01c09f2431fe84d22d2b7acdf2acb592 hash=0xe919c0d0af26a95999136e3e47df8295ca7b93cd624e16b37532cc358deddd64
INFO [10-06|03:01:51.842] received new envelopes count=1 peer=0x40a8cfd21a3a2a45ff6b18c6db3ba5dfe4acd8d4be6c17e84554d648365e7547 ip=134.209.136.79
INFO [10-06|03:01:51.844] received new envelope peer=0x40a8cfd21a3a2a45ff6b18c6db3ba5dfe4acd8d4be6c17e84554d648365e7547 hash=0xe919c0d0af26a95999136e3e47df8295ca7b93cd624e16b37532cc358deddd64
INFO [10-06|03:01:55.330] Status backend initialized backend=geth version=0.61.2 commit=845980535
INFO [10-06|03:01:55.365] LES protocol is disabled package=status-go/node
INFO [10-06|03:01:55.365] Status service api is disabled package=status-go/node
INFO [10-06|03:01:55.366] Starting peer-to-peer node instance=Statusd/v0.61.2/linux-amd64/go1.13.8
INFO [10-06|03:01:55.370] Connecting to postgres database
INFO [10-06|03:01:58.199] Connected to postgres database
INFO [10-06|03:01:58.199] Starting cleaning envelopes period=1h0m0s retention=720h0m0s
INFO [10-06|03:01:58.276] Connecting to postgres database
INFO [10-06|03:02:04.919] Connected to postgres database
INFO [10-06|03:02:04.920] Starting cleaning envelopes period=1h0m0s retention=720h0m0s
INFO [10-06|03:02:05.023] started whisper v.6.0
INFO [10-06|03:02:05.031] New local node record seq=76 id=7d646a1b74af5f29 ip=127.0.0.1 udp=0 tcp=30504
INFO [10-06|03:02:05.033] Started P2P networking self="enode://7aa648d6e855950b2e3d3bf220c496e0cae4adfddef3e1e6062e6b177aec93bc6cdcf1282cb40d1656932ebfdd565729da440368d7c4da7dbd4d004b1ac02bf8@127.0.0.1:30504?discport=0"
INFO [10-06|03:02:05.028] IPC endpoint opened url=/data/geth.ipc
WARN [10-06|03:02:05.039] Sanitizing invalid HTTP read timeout provided=0s updated=30s
WARN [10-06|03:02:05.040] Sanitizing invalid HTTP write timeout provided=0s updated=30s
WARN [10-06|03:02:05.040] Sanitizing invalid HTTP idle timeout provided=0s updated=2m0s
INFO [10-06|03:02:05.043] HTTP endpoint opened url=http://0.0.0.0:8545 cors= vhosts=localhost
INFO [10-06|03:02:05.043] setup a Whisper-Waku bridge successfully
INFO [10-06|03:02:05.044] Handlers registered package=status-go/api.GethStatusBackend
INFO [10-06|03:02:05.044] Using AdvertiseAddr for rendezvous package=status-go/node.StatusNode addr=178.128.142.26
INFO [10-06|03:02:05.047] UDP listener up net=enode://7aa648d6e855950b2e3d3bf220c496e0cae4adfddef3e1e6062e6b177aec93bc6cdcf1282cb40d1656932ebfdd565729da440368d7c4da7dbd4d004b1ac02bf8@[::]:30504
INFO [10-06|03:02:05.092] Difference with ntp servers offset=-9.702438ms
It appears to have died at 03:01:51.844 and then started again at 03:01:55.330, four seconds later.
This has not happened on all hosts:
> ansible mail -m shell -o -a 'grep "Status backend initialized" /var/log/docker/statusd-mail/docker.log > /dev/null && echo PRESENT || echo ABSENT'
mail-02.do-ams3.eth.prod | CHANGED | rc=0 | (stdout) PRESENT
mail-01.do-ams3.eth.prod | CHANGED | rc=0 | (stdout) PRESENT
mail-03.do-ams3.eth.prod | CHANGED | rc=0 | (stdout) ABSENT
mail-01.gc-us-central1-a.eth.prod | CHANGED | rc=0 | (stdout) ABSENT
mail-03.gc-us-central1-a.eth.prod | CHANGED | rc=0 | (stdout) PRESENT
mail-02.gc-us-central1-a.eth.prod | CHANGED | rc=0 | (stdout) PRESENT
mail-01.ac-cn-hongkong-c.eth.prod | CHANGED | rc=0 | (stdout) PRESENT
mail-03.ac-cn-hongkong-c.eth.prod | CHANGED | rc=0 | (stdout) ABSENT
mail-02.ac-cn-hongkong-c.eth.prod | CHANGED | rc=0 | (stdout) ABSENT
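To narrow down when each affected host restarted, the same pattern could be extended (a hypothetical follow-up, assuming the log path is identical on every host):

> ansible mail -m shell -o -a 'grep "Status backend initialized" /var/log/docker/statusd-mail/docker.log | tail -1'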
Here are some graphs from mail-01.do-ams3.eth.prod:
Container CPU and Memory usage clearly show a spike for the statusd-mail container:
And it appears it was killed due to memory usage on mail-02.do-ams3.eth.prod:
Oct 6 03:01:53 mail-02.do-ams3.eth.prod kernel: [1000887.685433] statusd invoked oom-killer: gfp_mask=0xcc0(GFP_KERNEL), order=0, oom_score_adj=0
Oct 6 03:01:53 mail-02.do-ams3.eth.prod kernel: [1000887.685437] CPU: 0 PID: 169572 Comm: statusd Not tainted 5.4.0-47-generic #51-Ubuntu
Oct 6 03:01:53 mail-02.do-ams3.eth.prod kernel: [1000887.685438] Hardware name: DigitalOcean Droplet, BIOS 20171212 12/12/2017
Oct 6 03:01:53 mail-02.do-ams3.eth.prod kernel: [1000887.685439] Call Trace:
Oct 6 03:01:53 mail-02.do-ams3.eth.prod kernel: [1000887.685448] dump_stack+0x6d/0x9a
Oct 6 03:01:53 mail-02.do-ams3.eth.prod kernel: [1000887.685465] dump_header+0x4f/0x1eb
Oct 6 03:01:53 mail-02.do-ams3.eth.prod kernel: [1000887.685467] oom_kill_process.cold+0xb/0x10
Oct 6 03:01:53 mail-02.do-ams3.eth.prod kernel: [1000887.685468] out_of_memory.part.0+0x1df/0x3d0
Oct 6 03:01:53 mail-02.do-ams3.eth.prod kernel: [1000887.685470] out_of_memory+0x6d/0xd0
Oct 6 03:01:53 mail-02.do-ams3.eth.prod kernel: [1000887.685473] mem_cgroup_out_of_memory+0xbd/0xe0
Oct 6 03:01:53 mail-02.do-ams3.eth.prod kernel: [1000887.685475] try_charge+0x77c/0x810
Oct 6 03:01:53 mail-02.do-ams3.eth.prod kernel: [1000887.685480] ? __alloc_pages_nodemask+0x173/0x320
Oct 6 03:01:53 mail-02.do-ams3.eth.prod kernel: [1000887.685482] mem_cgroup_try_charge+0x71/0x190
Oct 6 03:01:53 mail-02.do-ams3.eth.prod kernel: [1000887.685484] mem_cgroup_try_charge_delay+0x22/0x50
Oct 6 03:01:53 mail-02.do-ams3.eth.prod kernel: [1000887.685487] do_anonymous_page+0x13c/0x650
Oct 6 03:01:53 mail-02.do-ams3.eth.prod kernel: [1000887.685489] __handle_mm_fault+0x760/0x7a0
Oct 6 03:01:53 mail-02.do-ams3.eth.prod kernel: [1000887.685491] handle_mm_fault+0xca/0x200
Oct 6 03:01:53 mail-02.do-ams3.eth.prod kernel: [1000887.685499] do_user_addr_fault+0x1f9/0x450
Oct 6 03:01:53 mail-02.do-ams3.eth.prod kernel: [1000887.685501] __do_page_fault+0x58/0x90
Oct 6 03:01:53 mail-02.do-ams3.eth.prod kernel: [1000887.685505] ? exit_to_usermode_loop+0x8f/0x160
Oct 6 03:01:53 mail-02.do-ams3.eth.prod kernel: [1000887.685507] do_page_fault+0x2c/0xe0
Oct 6 03:01:53 mail-02.do-ams3.eth.prod kernel: [1000887.685511] do_async_page_fault+0x39/0x70
Oct 6 03:01:53 mail-02.do-ams3.eth.prod kernel: [1000887.685516] async_page_fault+0x34/0x40
Oct 6 03:01:53 mail-02.do-ams3.eth.prod kernel: [1000887.685518] RIP: 0033:0x481f03
Oct 6 03:01:53 mail-02.do-ams3.eth.prod kernel: [1000887.685520] Code: 00 48 81 eb 00 01 00 00 48 81 c7 00 01 00 00 48 81 fb 00 01 00 00 73 82 e9 07 ff ff ff c5 fd ef c0 48 81 fb 00 00 00 02 73 46 <c5> fe 7f 07 c5 fe 7f 47 20 c5 fe 7f 47 40 c5 fe 7f 47 60 48 81 eb
Oct 6 03:01:53 mail-02.do-ams3.eth.prod kernel: [1000887.685521] RSP: 002b:00007ff13eb32970 EFLAGS: 00010202
Oct 6 03:01:53 mail-02.do-ams3.eth.prod kernel: [1000887.685523] RAX: 0000000000000000 RBX: 0000000000003000 RCX: 000000c019ecc000
Oct 6 03:01:53 mail-02.do-ams3.eth.prod kernel: [1000887.685523] RDX: 00000000026bc0c0 RSI: 0000000000000040 RDI: 000000c019f51000
Oct 6 03:01:53 mail-02.do-ams3.eth.prod kernel: [1000887.685524] RBP: 00007ff13eb329b8 R08: 000000c019ecc000 R09: 0000000000000044
Oct 6 03:01:53 mail-02.do-ams3.eth.prod kernel: [1000887.685525] R10: 00007ff13db61ea8 R11: 0000000000000fa9 R12: 0000000000000043
Oct 6 03:01:53 mail-02.do-ams3.eth.prod kernel: [1000887.685525] R13: 0000000000000000 R14: 0000000001b0c920 R15: 0000000000000000
Oct 6 03:01:53 mail-02.do-ams3.eth.prod kernel: [1000887.685540] memory: usage 1423360kB, limit 1423360kB, failcnt 30678
Oct 6 03:01:53 mail-02.do-ams3.eth.prod kernel: [1000887.685541] memory+swap: usage 0kB, limit 9007199254740988kB, failcnt 0
Oct 6 03:01:53 mail-02.do-ams3.eth.prod kernel: [1000887.685541] kmem: usage 4960kB, limit 9007199254740988kB, failcnt 0
Oct 6 03:01:53 mail-02.do-ams3.eth.prod kernel: [1000887.685542] Memory cgroup stats for /docker/40553f753762b847898d45868683a2b70939f3c1b2c8bd60b677195489bbca45:
Oct 6 03:01:53 mail-02.do-ams3.eth.prod kernel: [1000887.686109] anon 1452310528
Oct 6 03:01:53 mail-02.do-ams3.eth.prod kernel: [1000887.686109] file 163840
Oct 6 03:01:53 mail-02.do-ams3.eth.prod kernel: [1000887.686109] kernel_stack 184320
Oct 6 03:01:53 mail-02.do-ams3.eth.prod kernel: [1000887.686109] slab 1536000
Oct 6 03:01:53 mail-02.do-ams3.eth.prod kernel: [1000887.686109] sock 0
Oct 6 03:01:53 mail-02.do-ams3.eth.prod kernel: [1000887.686109] shmem 0
Oct 6 03:01:53 mail-02.do-ams3.eth.prod kernel: [1000887.686109] file_mapped 135168
Oct 6 03:01:53 mail-02.do-ams3.eth.prod kernel: [1000887.686109] file_dirty 0
Oct 6 03:01:53 mail-02.do-ams3.eth.prod kernel: [1000887.686109] file_writeback 0
Oct 6 03:01:53 mail-02.do-ams3.eth.prod kernel: [1000887.686109] anon_thp 41943040
Oct 6 03:01:53 mail-02.do-ams3.eth.prod kernel: [1000887.686109] inactive_anon 0
Oct 6 03:01:53 mail-02.do-ams3.eth.prod kernel: [1000887.686109] active_anon 1452216320
Oct 6 03:01:53 mail-02.do-ams3.eth.prod kernel: [1000887.686109] inactive_file 98304
Oct 6 03:01:53 mail-02.do-ams3.eth.prod kernel: [1000887.686109] active_file 45056
Oct 6 03:01:53 mail-02.do-ams3.eth.prod kernel: [1000887.686109] unevictable 0
Oct 6 03:01:53 mail-02.do-ams3.eth.prod kernel: [1000887.686109] slab_reclaimable 974848
Oct 6 03:01:53 mail-02.do-ams3.eth.prod kernel: [1000887.686109] slab_unreclaimable 561152
Oct 6 03:01:53 mail-02.do-ams3.eth.prod kernel: [1000887.686109] pgfault 1205523
Oct 6 03:01:53 mail-02.do-ams3.eth.prod kernel: [1000887.686109] pgmajfault 10098
Oct 6 03:01:53 mail-02.do-ams3.eth.prod kernel: [1000887.686109] workingset_refault 199386
Oct 6 03:01:53 mail-02.do-ams3.eth.prod kernel: [1000887.686109] workingset_activate 107943
Oct 6 03:01:53 mail-02.do-ams3.eth.prod kernel: [1000887.686109] workingset_nodereclaim 0
Oct 6 03:01:53 mail-02.do-ams3.eth.prod kernel: [1000887.686109] pgrefill 1355261
Oct 6 03:01:53 mail-02.do-ams3.eth.prod kernel: [1000887.686109] pgscan 3065911
Oct 6 03:01:53 mail-02.do-ams3.eth.prod kernel: [1000887.686109] pgsteal 1043413
Oct 6 03:01:53 mail-02.do-ams3.eth.prod kernel: [1000887.686109] pgactivate 1388079
Oct 6 03:01:53 mail-02.do-ams3.eth.prod kernel: [1000887.686109] pgdeactivate 1256287
Oct 6 03:01:53 mail-02.do-ams3.eth.prod kernel: [1000887.686110] Tasks state (memory values in pages):
Oct 6 03:01:53 mail-02.do-ams3.eth.prod kernel: [1000887.686111] [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
Oct 6 03:01:53 mail-02.do-ams3.eth.prod kernel: [1000887.686115] [ 169540] 100000 169540 427003 354338 3305472 0 0 statusd
Oct 6 03:01:53 mail-02.do-ams3.eth.prod kernel: [1000887.686117] oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=40553f753762b847898d45868683a2b70939f3c1b2c8bd60b677195489bbca45,mems_allowed=0,oom_memcg=/docker/40553f753762b847898d45868683a2b70939f3c1b2c8bd60b677195489bbca45,task_memcg=/docker/40553f753762b847898d45868683a2b70939f3c1b2c8bd60b677195489bbca45,task=statusd,pid=169540,uid=100000
Oct 6 03:01:53 mail-02.do-ams3.eth.prod kernel: [1000887.686148] Memory cgroup out of memory: Killed process 169540 (statusd) total-vm:1708012kB, anon-rss:1417352kB, file-rss:0kB, shmem-rss:0kB, UID:100000 pgtables:3228kB oom_score_adj:0
Oct 6 03:01:53 mail-02.do-ams3.eth.prod kernel: [1000887.807490] oom_reaper: reaped process 169540 (statusd), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
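The dump also shows the container hit its memory cgroup limit exactly (usage 1423360 kB == limit 1423360 kB, i.e. roughly 1390 MiB). A quick way to confirm OOM kills across the fleet and to check the configured limit could look like this (hypothetical commands; they assume kernel messages land in /var/log/syslog and that the limit comes from the container's HostConfig):

# Count oom-killer invocations per host; '|| true' keeps ansible from
# reporting hosts with zero matches as failed.
> ansible mail -m shell -o -a 'grep -c "invoked oom-killer" /var/log/syslog || true'

# Configured memory limit for the container, in bytes (0 means unlimited).
> docker inspect -f '{{.HostConfig.Memory}}' statusd-mail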
Looks like we had a major spike in envelopes at that time:
This is essentially a doubling of our normal unique envelope rates.
It's likely that this was an attempted spam attack. @cammellos suggested it might have been due to notifications, but that seems unlikely.
Closing since this is quite old and we didn't see a recurrence.
We had a spike in activity today around 3 AM UTC, with multiple containers restarting.