m-lab / scraper

Scrape experiment data off of MLab nodes and upload it to Google Cloud Storage
Apache License 2.0

CoreServices_SidestreamIsNotRunning #305

Closed measurementlab closed 6 years ago

measurementlab commented 6 years ago

Alertmanager URL: https://mlab:YOztKFSKnRMz2GN1qFPueAku9WhmDYV2@alertmanager.mlab-oti.measurementlab.net

TODO: add graph url from annotations.

critzo commented 6 years ago

This is happening on mlab1-3.hnd02. The last 100 lines of /var/log/messages from the iupui_npad slice on mlab1 are logged below; they suggest the issue is related to our rsync process. Restarting the sliver.

[site_admin@mlab1 ~]$ sudo vserver iupui_npad enter
bash-4.1# tail -100 /var/log/messages | more
May 10 13:56:16 mlab1 rsyncd[31499]: module-list request from 65.135.224.35.bc.googleusercontent.com (35.224.135.65)
May 10 13:57:16 mlab1 rsyncd[32116]: connect from 65.135.224.35.bc.googleusercontent.com (35.224.135.65)
May 10 13:57:16 mlab1 rsyncd[32116]: module-list request from 65.135.224.35.bc.googleusercontent.com (35.224.135.65)
May 10 13:58:16 mlab1 rsyncd[587]: connect from 65.135.224.35.bc.googleusercontent.com (35.224.135.65)
May 10 13:58:17 mlab1 rsyncd[587]: module-list request from 65.135.224.35.bc.googleusercontent.com (35.224.135.65)
May 10 13:59:16 mlab1 rsyncd[1334]: connect from 65.135.224.35.bc.googleusercontent.com (35.224.135.65)
May 10 13:59:17 mlab1 rsyncd[1334]: module-list request from 65.135.224.35.bc.googleusercontent.com (35.224.135.65)
May 10 14:00:16 mlab1 rsyncd[2046]: connect from 65.135.224.35.bc.googleusercontent.com (35.224.135.65)
May 10 14:00:16 mlab1 rsyncd[2046]: module-list request from 65.135.224.35.bc.googleusercontent.com (35.224.135.65)
May 10 14:01:16 mlab1 rsyncd[2785]: connect from 65.135.224.35.bc.googleusercontent.com (35.224.135.65)
May 10 14:01:16 mlab1 rsyncd[2785]: module-list request from 65.135.224.35.bc.googleusercontent.com (35.224.135.65)
May 10 14:02:16 mlab1 rsyncd[3573]: connect from 65.135.224.35.bc.googleusercontent.com (35.224.135.65)
May 10 14:02:16 mlab1 rsyncd[3573]: module-list request from 65.135.224.35.bc.googleusercontent.com (35.224.135.65)
May 10 14:03:16 mlab1 rsyncd[4105]: connect from 65.135.224.35.bc.googleusercontent.com (35.224.135.65)
May 10 14:03:16 mlab1 rsyncd[4105]: module-list request from 65.135.224.35.bc.googleusercontent.com (35.224.135.65)
May 10 14:04:16 mlab1 rsyncd[5028]: connect from 65.135.224.35.bc.googleusercontent.com (35.224.135.65)
May 10 14:04:16 mlab1 rsyncd[5028]: module-list request from 65.135.224.35.bc.googleusercontent.com (35.224.135.65)
May 10 14:04:17 mlab1 rsyncd[5029]: forward name lookup for 19.166.193.35.bc.googleusercontent.com failed: Name or service not known
May 10 14:04:17 mlab1 rsyncd[5029]: connect from UNKNOWN (35.193.166.19)
May 10 14:04:17 mlab1 rsyncd[5029]: rsync on npad/ from unknown (35.193.166.19)
May 10 14:04:17 mlab1 rsyncd[5029]: building file list
May 10 14:04:18 mlab1 rsyncd[5029]: sent 73 bytes  received 30 bytes  total size 0
May 10 14:05:17 mlab1 rsyncd[6281]: connect from 65.135.224.35.bc.googleusercontent.com (35.224.135.65)
May 10 14:05:17 mlab1 rsyncd[6281]: module-list request from 65.135.224.35.bc.googleusercontent.com (35.224.135.65)
May 10 14:06:16 mlab1 rsyncd[6866]: connect from 65.135.224.35.bc.googleusercontent.com (35.224.135.65)
May 10 14:06:16 mlab1 rsyncd[6866]: module-list request from 65.135.224.35.bc.googleusercontent.com (35.224.135.65)
May 10 14:07:16 mlab1 rsyncd[7468]: connect from 65.135.224.35.bc.googleusercontent.com (35.224.135.65)
May 10 14:07:16 mlab1 rsyncd[7468]: module-list request from 65.135.224.35.bc.googleusercontent.com (35.224.135.65)
May 10 14:08:16 mlab1 rsyncd[8149]: connect from 65.135.224.35.bc.googleusercontent.com (35.224.135.65)
May 10 14:08:16 mlab1 rsyncd[8149]: module-list request from 65.135.224.35.bc.googleusercontent.com (35.224.135.65)
May 10 14:08:41 mlab1 rsyncd[8358]: forward name lookup for 19.166.193.35.bc.googleusercontent.com failed: Name or service not known
May 10 14:08:41 mlab1 rsyncd[8358]: connect from UNKNOWN (35.193.166.19)
May 10 14:08:41 mlab1 rsyncd[8358]: rsync on paris-traceroute/ from unknown (35.193.166.19)
May 10 14:08:42 mlab1 rsyncd[8358]: building file list
May 10 14:08:51 mlab1 rsyncd[8358]: sent 2703701 bytes  received 118055 bytes  total size 40836599
May 10 14:08:52 mlab1 rsyncd[8429]: forward name lookup for 19.166.193.35.bc.googleusercontent.com failed: Name or service not known
May 10 14:08:52 mlab1 rsyncd[8429]: connect from UNKNOWN (35.193.166.19)
May 10 14:08:52 mlab1 rsyncd[8429]: rsync on paris-traceroute/ from unknown (35.193.166.19)
May 10 14:08:52 mlab1 rsyncd[8429]: building file list
May 10 14:08:54 mlab1 rsyncd[8429]: sent 165358 bytes  received 6065 bytes  total size 281373
May 10 14:09:16 mlab1 rsyncd[8650]: connect from 65.135.224.35.bc.googleusercontent.com (35.224.135.65)
May 10 14:09:16 mlab1 rsyncd[8650]: module-list request from 65.135.224.35.bc.googleusercontent.com (35.224.135.65)
May 10 14:10:16 mlab1 rsyncd[9713]: connect from 65.135.224.35.bc.googleusercontent.com (35.224.135.65)
May 10 14:10:17 mlab1 rsyncd[9713]: module-list request from 65.135.224.35.bc.googleusercontent.com (35.224.135.65)
May 10 14:11:16 mlab1 rsyncd[10335]: connect from 65.135.224.35.bc.googleusercontent.com (35.224.135.65)
May 10 14:11:16 mlab1 rsyncd[10335]: module-list request from 65.135.224.35.bc.googleusercontent.com (35.224.135.65)
May 10 14:12:16 mlab1 rsyncd[10942]: connect from 65.135.224.35.bc.googleusercontent.com (35.224.135.65)
May 10 14:12:16 mlab1 rsyncd[10942]: module-list request from 65.135.224.35.bc.googleusercontent.com (35.224.135.65)
May 10 14:12:57 mlab1 rsyncd[11263]: forward name lookup for 19.166.193.35.bc.googleusercontent.com failed: Name or service not known
May 10 14:12:57 mlab1 rsyncd[11263]: connect from UNKNOWN (35.193.166.19)
May 10 14:12:58 mlab1 rsyncd[11263]: rsync on npad/ from unknown (35.193.166.19)
May 10 14:12:58 mlab1 rsyncd[11263]: building file list
May 10 14:12:59 mlab1 rsyncd[11263]: sent 73 bytes  received 30 bytes  total size 0
May 10 14:13:03 mlab1 rsyncd[11342]: connect from 47.54.188.35.bc.googleusercontent.com (35.188.54.47)
May 10 14:13:03 mlab1 rsyncd[11342]: rsync on sidestream/ from 47.54.188.35.bc.googleusercontent.com (35.188.54.47)
May 10 14:13:03 mlab1 rsyncd[11342]: building file list
May 10 14:13:05 mlab1 rsyncd[11342]: sent 21347 bytes  received 1475 bytes  total size 123878883
May 10 14:13:05 mlab1 rsyncd[11345]: connect from 47.54.188.35.bc.googleusercontent.com (35.188.54.47)
May 10 14:13:05 mlab1 rsyncd[11345]: rsync on sidestream/ from 47.54.188.35.bc.googleusercontent.com (35.188.54.47)
May 10 14:13:06 mlab1 rsyncd[11345]: building file list
May 10 14:13:09 mlab1 rsyncd[11345]: sent 1344844 bytes  received 431 bytes  total size 5339498
May 10 14:13:16 mlab1 rsyncd[11424]: connect from 65.135.224.35.bc.googleusercontent.com (35.224.135.65)
May 10 14:13:16 mlab1 rsyncd[11424]: module-list request from 65.135.224.35.bc.googleusercontent.com (35.224.135.65)
May 10 14:14:16 mlab1 rsyncd[12051]: connect from 65.135.224.35.bc.googleusercontent.com (35.224.135.65)
May 10 14:14:16 mlab1 rsyncd[12051]: module-list request from 65.135.224.35.bc.googleusercontent.com (35.224.135.65)
May 10 14:15:16 mlab1 rsyncd[12900]: connect from 65.135.224.35.bc.googleusercontent.com (35.224.135.65)
May 10 14:15:16 mlab1 rsyncd[12900]: module-list request from 65.135.224.35.bc.googleusercontent.com (35.224.135.65)
May 10 14:15:17 mlab1 rsyncd[12904]: forward name lookup for 19.166.193.35.bc.googleusercontent.com failed: Name or service not known
May 10 14:15:17 mlab1 rsyncd[12904]: connect from UNKNOWN (35.193.166.19)
May 10 14:15:18 mlab1 rsyncd[12904]: rsync on npad/ from unknown (35.193.166.19)
May 10 14:15:18 mlab1 rsyncd[12904]: building file list
May 10 14:15:19 mlab1 rsyncd[12904]: sent 73 bytes  received 30 bytes  total size 0
May 10 14:16:16 mlab1 rsyncd[13512]: connect from 65.135.224.35.bc.googleusercontent.com (35.224.135.65)
May 10 14:16:16 mlab1 rsyncd[13512]: module-list request from 65.135.224.35.bc.googleusercontent.com (35.224.135.65)
May 10 14:17:16 mlab1 rsyncd[14349]: connect from 65.135.224.35.bc.googleusercontent.com (35.224.135.65)
May 10 14:17:16 mlab1 rsyncd[14349]: module-list request from 65.135.224.35.bc.googleusercontent.com (35.224.135.65)
May 10 14:18:16 mlab1 rsyncd[14955]: connect from 65.135.224.35.bc.googleusercontent.com (35.224.135.65)
May 10 14:18:16 mlab1 rsyncd[14955]: module-list request from 65.135.224.35.bc.googleusercontent.com (35.224.135.65)
May 10 14:19:16 mlab1 rsyncd[15473]: connect from 65.135.224.35.bc.googleusercontent.com (35.224.135.65)
May 10 14:19:16 mlab1 rsyncd[15473]: module-list request from 65.135.224.35.bc.googleusercontent.com (35.224.135.65)
May 10 14:20:16 mlab1 rsyncd[16244]: connect from 65.135.224.35.bc.googleusercontent.com (35.224.135.65)
May 10 14:20:16 mlab1 rsyncd[16244]: module-list request from 65.135.224.35.bc.googleusercontent.com (35.224.135.65)
May 10 14:21:16 mlab1 rsyncd[16779]: connect from 65.135.224.35.bc.googleusercontent.com (35.224.135.65)
May 10 14:21:16 mlab1 rsyncd[16779]: module-list request from 65.135.224.35.bc.googleusercontent.com (35.224.135.65)
May 10 14:22:16 mlab1 rsyncd[17565]: connect from 65.135.224.35.bc.googleusercontent.com (35.224.135.65)
May 10 14:22:16 mlab1 rsyncd[17565]: module-list request from 65.135.224.35.bc.googleusercontent.com (35.224.135.65)
May 10 14:23:16 mlab1 rsyncd[18107]: connect from 65.135.224.35.bc.googleusercontent.com (35.224.135.65)
May 10 14:23:16 mlab1 rsyncd[18107]: module-list request from 65.135.224.35.bc.googleusercontent.com (35.224.135.65)
May 10 14:24:16 mlab1 rsyncd[18607]: connect from 65.135.224.35.bc.googleusercontent.com (35.224.135.65)
May 10 14:24:16 mlab1 rsyncd[18607]: module-list request from 65.135.224.35.bc.googleusercontent.com (35.224.135.65)
May 10 14:25:16 mlab1 rsyncd[19241]: connect from 65.135.224.35.bc.googleusercontent.com (35.224.135.65)
May 10 14:25:16 mlab1 rsyncd[19241]: module-list request from 65.135.224.35.bc.googleusercontent.com (35.224.135.65)
May 10 14:26:16 mlab1 rsyncd[19748]: connect from 65.135.224.35.bc.googleusercontent.com (35.224.135.65)
May 10 14:26:16 mlab1 rsyncd[19748]: module-list request from 65.135.224.35.bc.googleusercontent.com (35.224.135.65)
May 10 14:27:16 mlab1 rsyncd[20212]: connect from 65.135.224.35.bc.googleusercontent.com (35.224.135.65)
May 10 14:27:16 mlab1 rsyncd[20212]: module-list request from 65.135.224.35.bc.googleusercontent.com (35.224.135.65)
May 10 14:28:16 mlab1 rsyncd[20741]: connect from 65.135.224.35.bc.googleusercontent.com (35.224.135.65)
May 10 14:28:16 mlab1 rsyncd[20741]: module-list request from 65.135.224.35.bc.googleusercontent.com (35.224.135.65)
May 10 14:29:16 mlab1 rsyncd[21505]: connect from 65.135.224.35.bc.googleusercontent.com (35.224.135.65)
May 10 14:29:17 mlab1 rsyncd[21505]: module-list request from 65.135.224.35.bc.googleusercontent.com (35.224.135.65)
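
To get a quick picture of who is hitting rsyncd and which modules are being pulled, a log like the one above can be tallied with a short script. This is only a sketch: the function names `unwrap` and `summarize_rsyncd` are made up for illustration, and it assumes the syslog line shapes seen in this excerpt. It also rejoins lines that a terminal wrapped at a fixed width, in case the log was pasted that way.

```python
import re
from collections import Counter

# A syslog line starts with a timestamp such as "May 10 13:56:16 ".
TIMESTAMP = re.compile(r"^[A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2} ")


def unwrap(lines):
    """Rejoin continuation lines (no leading timestamp) onto the previous line."""
    joined = []
    for line in lines:
        line = line.rstrip("\n")
        if TIMESTAMP.match(line) or not joined:
            joined.append(line)
        else:
            joined[-1] += line
    return joined


def summarize_rsyncd(lines):
    """Count module-list requests per client IP and transfers per module."""
    module_lists = Counter()  # client IP -> number of module-list requests
    transfers = Counter()     # module name -> number of "rsync on <module>/" lines
    for line in unwrap(lines):
        m = re.search(r"module-list request from \S+ \(([\d.]+)\)", line)
        if m:
            module_lists[m.group(1)] += 1
            continue
        m = re.search(r"rsync on (\S+?)/ from", line)
        if m:
            transfers[m.group(1)] += 1
    return module_lists, transfers
```

Feeding it the output of `tail -100 /var/log/messages` (for example via `sys.stdin.readlines()`) would show at a glance whether the once-a-minute module-list requests or the actual transfers dominate.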

critzo commented 6 years ago

mlab2.hnd02 reports many OOM errors:

May 30 11:44:44 mlab2 kernel: Out of memory: Kill process ndtd(4955:511) score 1 or sacrifice child
May 30 11:44:44 mlab2 kernel: Out of memory: Kill process ndtd(4955:511) score 1 or sacrifice child
May 30 11:44:44 mlab2 kernel: Out of memory: Kill process ndtd(4955:511) score 1 or sacrifice child
May 30 11:44:44 mlab2 kernel: Out of memory: Kill process ndtd(4955:511) score 1 or sacrifice child
May 30 11:44:44 mlab2 kernel: Out of memory: Kill process ndtd(4955:511) score 1 or sacrifice child
May 30 11:44:44 mlab2 kernel: Out of memory: Kill process ndtd(4955:511) score 1 or sacrifice child
May 30 11:44:44 mlab2 kernel: Out of memory: Kill process ndtd(4955:511) score 1 or sacrifice child
May 30 11:44:44 mlab2 kernel: Out of memory: Kill process ndtd(4955:511) score 1 or sacrifice child
May 30 11:44:44 mlab2 kernel: Out of memory: Kill process ndtd(4955:511) score 1 or sacrifice child
May 30 11:44:44 mlab2 kernel: Out of memory: Kill process ndtd(4955:511) score 1 or sacrifice child
May 30 11:44:44 mlab2 kernel: Out of memory: Kill process ndtd(4955:511) score 1 or sacrifice child
May 30 11:44:44 mlab2 kernel: Out of memory: Kill process ndtd(4955:511) score 1 or sacrifice child
May 30 11:44:44 mlab2 kernel: Out of memory: Kill process ndtd(4955:511) score 1 or sacrifice child
May 30 11:44:44 mlab2 kernel: Out of memory: Kill process ndtd(4955:511) score 1 or sacrifice child
May 30 11:44:44 mlab2 kernel: Out of memory: Kill process ndtd(4955:511) score 1 or sacrifice child
May 30 11:44:44 mlab2 kernel: Out of memory: Kill process ndtd(4955:511) score 1 or sacrifice child
May 30 11:44:44 mlab2 kernel: Out of memory: Kill process ndtd(4955:511) score 1 or sacrifice child
May 30 11:44:44 mlab2 kernel: Out of memory: Kill process ndtd(4955:511) score 1 or sacrifice child
May 30 11:44:44 mlab2 kernel: Out of memory: Kill process ndtd(4955:511) score 1 or sacrifice child
May 30 11:44:44 mlab2 kernel: Out of memory: Kill process ndtd(4955:511) score 1 or sacrifice child
May 30 11:44:44 mlab2 kernel: Out of memory: Kill process ndtd(4955:511) score 1 or sacrifice child
May 30 11:44:44 mlab2 kernel: Out of memory: Kill process logger.py(4956:511) score 1 or sacrifice child
May 30 11:44:44 mlab2 kernel: Out of memory: Kill process fakewww(4971:511) score 1 or sacrifice child
May 30 11:44:44 mlab2 kernel: Out of memory: Kill process python(4981:511) score 1 or sacrifice child
May 30 11:44:44 mlab2 kernel: Out of memory: Kill process inotify_exporte(4986:511) score 1 or sacrifice child
May 30 11:44:44 mlab2 kernel: Out of memory: Kill process rsync(21387:511) score 1 or sacrifice child
May 30 11:44:44 mlab2 kernel: Out of memory: Kill process ndtd(10557:511) score 1 or sacrifice child
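
For anyone triaging similar reports, the OOM lines above can be reduced to a per-process count. The sketch below is illustrative only: `count_oom_kills` is a hypothetical helper, and it assumes the kernel message format shown in this excerpt. It concatenates the input first, so it also works on output that a terminal wrapped mid-line.

```python
import re
from collections import Counter

# Matches e.g. "Out of memory: Kill process ndtd(4955:511)"; captures name and first number.
OOM_KILL = re.compile(r"Out of memory: Kill process ([^\s(]+)\((\d+):\d+\)")


def count_oom_kills(lines):
    """Return a Counter of OOM kill messages keyed by process name."""
    # Dropping newlines undoes fixed-width wrapping without needing to know the width.
    text = "".join(line.rstrip("\n") for line in lines)
    return Counter(name for name, _pid in OOM_KILL.findall(text))
```

Calling `count_oom_kills(open("/var/log/messages"))` and printing `most_common()` would list the processes the kernel targeted most often.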

stephen-soltesz commented 6 years ago

Processes killed due to overload at the HND site.