z25 / pyZOCP

Python ZOCP implementation (Z25 Orchestration Control Protocol)
GNU Lesser General Public License v3.0

Very long run issues #79

Open sphaero opened 9 years ago

sphaero commented 9 years ago

I have no idea how this relates to any issues in Pyre or ZOCP, but at least it shows where we might add some conditionals. This is from running 5 threads (actors/nodes) on an Ubuntu Linux machine:

Peer None isn't ready
Thread4: fps: 0.05272019373681729
Peer None isn't ready
Thread3: fps: 0.6297142004372849
Thread2: fps: 0.04867682388059979
Peer None isn't ready
Peer None isn't ready
Peer None isn't ready
Peer None isn't ready
Peer <pyre.pyre_peer.PyrePeer object at 0x7f660a940940> isn't ready
Exception in thread Thread-2:
Traceback (most recent call last):
  File "/usr/lib/python3.4/threading.py", line 920, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.4/threading.py", line 868, in run
    self._target(*self._args, **self._kwargs)
  File "/home/people/arnaud/src/pyre/pyre/zactor.py", line 57, in run
    self.shim_handler(*self.shim_args, **self.shim_kwargs)
  File "/home/people/arnaud/src/pyre/pyre/pyre_node.py", line 52, in __init__
    self.run()
  File "/home/people/arnaud/src/pyre/pyre/pyre_node.py", line 503, in run
    self.recv_peer()
  File "/home/people/arnaud/src/pyre/pyre/pyre_node.py", line 359, in recv_peer
    zmsg.recv(self.inbox)
  File "/home/people/arnaud/src/pyre/pyre/zre_msg.py", line 74, in recv
    self.address = uuid.UUID(bytes=self.address[1:])
  File "/usr/lib/python3.4/uuid.py", line 148, in __init__
    raise ValueError('bytes is not a 16-char string')
ValueError: bytes is not a 16-char string

Exception in thread Thread-8:
Traceback (most recent call last):
  File "/usr/lib/python3.4/threading.py", line 920, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.4/threading.py", line 868, in run
    self._target(*self._args, **self._kwargs)
  File "/home/people/arnaud/src/pyre/pyre/zactor.py", line 57, in run
    self.shim_handler(*self.shim_args, **self.shim_kwargs)
  File "/home/people/arnaud/src/pyre/pyre/pyre_node.py", line 52, in __init__
    self.run()
  File "/home/people/arnaud/src/pyre/pyre/pyre_node.py", line 503, in run
    self.recv_peer()
  File "/home/people/arnaud/src/pyre/pyre/pyre_node.py", line 359, in recv_peer
    zmsg.recv(self.inbox)
  File "/home/people/arnaud/src/pyre/pyre/zre_msg.py", line 74, in recv
    self.address = uuid.UUID(bytes=self.address[1:])
  File "/usr/lib/python3.4/uuid.py", line 148, in __init__
    raise ValueError('bytes is not a 16-char string')
ValueError: bytes is not a 16-char string
Exception in thread Thread-5:
Traceback (most recent call last):
  File "/usr/lib/python3.4/threading.py", line 920, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.4/threading.py", line 868, in run
    self._target(*self._args, **self._kwargs)
  File "/home/people/arnaud/src/pyre/pyre/zactor.py", line 57, in run
    self.shim_handler(*self.shim_args, **self.shim_kwargs)
  File "/home/people/arnaud/src/pyre/pyre/pyre_node.py", line 52, in __init__
    self.run()
  File "/home/people/arnaud/src/pyre/pyre/pyre_node.py", line 503, in run
    self.recv_peer()
  File "/home/people/arnaud/src/pyre/pyre/pyre_node.py", line 359, in recv_peer
    zmsg.recv(self.inbox)
  File "/home/people/arnaud/src/pyre/pyre/zre_msg.py", line 74, in recv
    self.address = uuid.UUID(bytes=self.address[1:])
  File "/usr/lib/python3.4/uuid.py", line 148, in __init__
    raise ValueError('bytes is not a 16-char string')
ValueError: bytes is not a 16-char string
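
One place a conditional might go is the UUID parse that all three tracebacks end in (`zre_msg.py`, line 74). The following is only a minimal sketch, not Pyre's code: `parse_peer_address` is a hypothetical helper, and it assumes the identity frame is one leading byte followed by the 16-byte UUID, which is what `self.address[1:]` suggests. It returns `None` for a malformed frame instead of raising, so the caller can drop the message rather than let the actor thread die.

```python
import uuid


def parse_peer_address(frame):
    """Return the peer UUID from an identity frame, or None if malformed.

    Assumption: the frame is 1 prefix byte + 16 UUID bytes (17 in total).
    """
    if not isinstance(frame, (bytes, bytearray)) or len(frame) != 17:
        # Wrong type or length: uuid.UUID(bytes=...) would raise ValueError.
        return None
    return uuid.UUID(bytes=bytes(frame[1:]))


# Usage: a well-formed frame parses, a short one is rejected.
peer_id = uuid.uuid4()
good = parse_peer_address(b"\x01" + peer_id.bytes)
bad = parse_peer_address(b"\x01short")
```

Whether silently dropping such a message is the right behaviour is a separate question; logging the offending frame first would probably help find out where it comes from.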

I don't know what happened, but at some point (either before or after the error messages) it starts leaking memory:

[112950.723732] python3 invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
[112950.723740] python3 cpuset=session-c2.scope mems_allowed=0
[112950.723750] CPU: 4 PID: 29085 Comm: python3 Tainted: P           OE  3.19.0-18-generic #18-Ubuntu
[112950.723754] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./890GM Pro3 R2.0, BIOS P1.50 10/04/2011
[112950.723757]  0000000000000000 ffff8802df957898 ffffffff817c27cd 0000000000000007
[112950.723764]  ffff880408a26bf0 ffff8802df957918 ffffffff817c0543 0000000000000000
[112950.723770]  0000000000000000 0000000000000000 0000000000000000 ffff880409d6bae0
[112950.723776] Call Trace:
[112950.723787]  [<ffffffff817c27cd>] dump_stack+0x45/0x57
[112950.723794]  [<ffffffff817c0543>] dump_header+0x7f/0x1e7
[112950.723803]  [<ffffffff8117d07b>] oom_kill_process+0x22b/0x390
[112950.723811]  [<ffffffff8107ea9e>] ? has_capability_noaudit+0x1e/0x30
[112950.723817]  [<ffffffff8117d5ed>] out_of_memory+0x24d/0x500
[112950.723822]  [<ffffffff8118353a>] __alloc_pages_nodemask+0xaba/0xba0
[112950.723830]  [<ffffffff811c9c61>] alloc_pages_current+0x91/0x110
[112950.723835]  [<ffffffff81179597>] __page_cache_alloc+0xa7/0xd0
[112950.723840]  [<ffffffff8117bcaf>] filemap_fault+0x1af/0x400
[112950.723845]  [<ffffffff811a683d>] __do_fault+0x3d/0xc0
[112950.723850]  [<ffffffff811a90ef>] do_read_fault.isra.55+0x1df/0x2f0
[112950.723856]  [<ffffffff811aafbe>] handle_mm_fault+0x86e/0xff0
[112950.723861]  [<ffffffff810ef518>] ? get_futex_key+0x238/0x2b0
[112950.723867]  [<ffffffff810ef7d1>] ? futex_wake+0x71/0x140
[112950.723872]  [<ffffffff81062bdd>] __do_page_fault+0x1dd/0x5b0
[112950.723877]  [<ffffffff810f2537>] ? do_futex+0x107/0x5d0
[112950.723883]  [<ffffffff81062fe1>] do_page_fault+0x31/0x70
[112950.723888]  [<ffffffff817cba68>] page_fault+0x28/0x30
[112950.723892] Mem-Info:
[112950.723894] Node 0 DMA per-cpu:
[112950.723898] CPU    0: hi:    0, btch:   1 usd:   0
[112950.723901] CPU    1: hi:    0, btch:   1 usd:   0
[112950.723903] CPU    2: hi:    0, btch:   1 usd:   0
[112950.723905] CPU    3: hi:    0, btch:   1 usd:   0
[112950.723907] CPU    4: hi:    0, btch:   1 usd:   0
[112950.723910] CPU    5: hi:    0, btch:   1 usd:   0
[112950.723912] Node 0 DMA32 per-cpu:
[112950.723915] CPU    0: hi:  186, btch:  31 usd:  46
[112950.723918] CPU    1: hi:  186, btch:  31 usd: 171
[112950.723920] CPU    2: hi:  186, btch:  31 usd:  29
[112950.723923] CPU    3: hi:  186, btch:  31 usd:  24
[112950.723925] CPU    4: hi:  186, btch:  31 usd:  30
[112950.723927] CPU    5: hi:  186, btch:  31 usd:  40
[112950.723929] Node 0 Normal per-cpu:
[112950.723932] CPU    0: hi:  186, btch:  31 usd:   0
[112950.723935] CPU    1: hi:  186, btch:  31 usd: 175
[112950.723937] CPU    2: hi:  186, btch:  31 usd:  43
[112950.723939] CPU    3: hi:  186, btch:  31 usd:  48
[112950.723941] CPU    4: hi:  186, btch:  31 usd:  31
[112950.723944] CPU    5: hi:  186, btch:  31 usd:  44
[112950.723950] active_anon:3574586 inactive_anon:432027 isolated_anon:0
 active_file:203 inactive_file:3054 isolated_file:32
 unevictable:8 dirty:0 writeback:73 unstable:0
 free:33269 slab_reclaimable:7542 slab_unreclaimable:8546
 mapped:4716 shmem:164 pagetables:21254 bounce:0
 free_cma:0
[112950.723957] Node 0 DMA free:15900kB min:64kB low:80kB high:96kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15984kB managed:15900kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[112950.723964] lowmem_reserve[]: 0 3482 16008 16008
[112950.723969] Node 0 DMA32 free:64588kB min:14684kB low:18352kB high:22024kB active_anon:2870400kB inactive_anon:583044kB active_file:44kB inactive_file:2708kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3647556kB managed:3567688kB mlocked:0kB dirty:0kB writeback:0kB mapped:2472kB shmem:0kB slab_reclaimable:5464kB slab_unreclaimable:5612kB kernel_stack:1328kB pagetables:21944kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:217172 all_unreclaimable? yes
[112950.723977] lowmem_reserve[]: 0 0 12526 12526
[112950.723982] Node 0 Normal free:52588kB min:52832kB low:66040kB high:79248kB active_anon:11428140kB inactive_anon:1144808kB active_file:768kB inactive_file:9508kB unevictable:32kB isolated(anon):0kB isolated(file):128kB present:13090812kB managed:12827288kB mlocked:32kB dirty:0kB writeback:292kB mapped:16392kB shmem:656kB slab_reclaimable:24704kB slab_unreclaimable:28572kB kernel_stack:5648kB pagetables:63072kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:288820 all_unreclaimable? yes
[112950.723990] lowmem_reserve[]: 0 0 0 0
[112950.723995] Node 0 DMA: 1*4kB (U) 1*8kB (U) 1*16kB (U) 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (R) 3*4096kB (M) = 15900kB
[112950.724064] Node 0 DMA32: 160*4kB (UEMR) 124*8kB (UEMR) 145*16kB (UEMR) 123*32kB (UEMR) 64*64kB (EMR) 44*128kB (UEMR) 27*256kB (ER) 14*512kB (UER) 4*1024kB (EM) 6*2048kB (UMR) 4*4096kB (M) = 64464kB
[112950.724091] Node 0 Normal: 381*4kB (E) 296*8kB (UE) 410*16kB (UE) 355*32kB (UEM) 160*64kB (UEM) 71*128kB (UE) 28*256kB (EM) 6*512kB (UEM) 1*1024kB (M) 0*2048kB 0*4096kB = 52404kB
[112950.724118] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[112950.724121] 3752 total pagecache pages
[112950.724124] 93 pages in swap cache
[112950.724127] Swap cache stats: add 4319487, delete 4319394, find 67007/113625
[112950.724130] Free swap  = 0kB
[112950.724132] Total swap = 16752636kB
[112950.724135] 4188588 pages RAM
[112950.724137] 0 pages HighMem/MovableOnly
[112950.724139] 85869 pages reserved
[112950.724141] 0 pages cma reserved
[112950.724143] 0 pages hwpoisoned
[112950.724146] [ pid ]   uid  tgid total_vm      rss nr_ptes swapents oom_score_adj name
[112950.724155] [  279]     0   279     8312       15      20       82             0 systemd-journal
[112950.724161] [  291]     0   291     8913        1      17      213         -1000 systemd-udevd
[112950.724167] [  487]     0   487     8259       16      22       76             0 rpcbind
[112950.724173] [  621]     0   621   107886      175      95      474             0 NetworkManager
[112950.724178] [  624]   104   624    80551        0      56      414             0 rsyslogd
[112950.724181] [  632]     0   632     3828        0      11       39             0 cgmanager
[112950.724185] [  649]     0   649    25254       18      52      246             0 systemd-logind
[112950.724189] [  659]     0   659     5650       19      17       43             0 cron
[112950.724192] [  664]   107   664    24663        0      51      268             0 avahi-daemon
[112950.724196] [  666]     0   666    85830       19      68      367             0 accounts-daemon
[112950.724200] [  671]     0   671    84104        0      66      830             0 ModemManager
[112950.724203] [  674]     0   674     4860       24      14       32             0 irqbalance
[112950.724207] [  675]   110   675   109697       40      77      362             0 whoopsie
[112950.724211] [  684]   105   684    27628        1      58      466          -900 dbus-daemon
[112950.724215] [  700]   107   700    24621        0      48      235             0 avahi-daemon
[112950.724218] [  727]     0   727    85256        1      70      702             0 polkitd
[112950.724222] [  728]   117   728    74518        1      48     1100             0 colord
[112950.724226] [  729]     0   729    37446        0      41      756             0 cups-browsed
[112950.724229] [  787]     0   787    13865        1      31      166         -1000 sshd
[112950.724233] [  800]     0   800   102857        5      68      362             0 lightdm
[112950.724237] [  810]     0   810    89560     4073     174    30504             0 Xorg
[112950.724241] [  813]     0   813     5866        0      16     1723             0 dhclient
[112950.724244] [  816] 65534   816     7440       18      18       45             0 dnsmasq
[112950.724248] [ 1031]     0  1031     1099        1       8       43             0 acpid
[112950.724252] [ 1045]     0  1045    36701        1      74      321             0 login
[112950.724256] [ 1132]   114  1132    40656        7      17       45             0 rtkit-daemon
[112950.724259] [ 1162]     0  1162    57006        1      79      380             0 lightdm
[112950.724263] [ 1183]   111  1183     8848       25      21       55             0 kerneloops
[112950.724267] [ 1194] 10002  1194    28788        6      59      330             0 systemd
[112950.724270] [ 1195] 10002  1195    35906        0      68      599             0 (sd-pam)
[112950.724274] [ 1201] 10002  1201    26304       23      55      239             0 i3
[112950.724278] [ 1255] 10002  1255     2687        8       8       74             0 ssh-agent
[112950.724281] [ 1258] 10002  1258    24805        0      48      266             0 dbus-launch
[112950.724285] [ 1259] 10002  1259    27542       93      53      279             0 dbus-daemon
[112950.724288] [ 1276] 10002  1276   106664     3645      72      938             0 ibus-daemon
[112950.724292] [ 1279] 10002  1279    48310        1      30      165             0 gvfsd
[112950.724295] [ 1283] 10002  1283    67422        0      31      184             0 gvfsd-fuse
[112950.724299] [ 1287] 10002  1287    66557        0      32      183             0 ibus-dconf
[112950.724303] [ 1288] 10002  1288   110584        1     103     3075             0 ibus-ui-gtk3
[112950.724306] [ 1291] 10002  1291    68335        1      68      526             0 ibus-x11
[112950.724310] [ 1296] 10002  1296     1117        0       7       22             0 sh
[112950.724314] [ 1299] 10002  1299    98369        1      58      577             0 dunst
[112950.724317] [ 1300] 10002  1300     1117        0       6       23             0 sh
[112950.724321] [ 1301] 10002  1301    66008        1      30      155             0 at-spi-bus-laun
[112950.724325] [ 1303] 10002  1303   119875      167     126     1085             0 nm-applet
[112950.724328] [ 1310] 10002  1310     1118        0       7       24             0 sh
[112950.724332] [ 1312] 10002  1312    20996       25      44      327             0 i3bar
[112950.724335] [ 1316] 10002  1316    27436        1      58      265             0 dbus-daemon
[112950.724339] [ 1317] 10002  1317     1118        0       7       25             0 sh
[112950.724343] [ 1318] 10002  1318     7023       35      20       39             0 i3status
[112950.724346] [ 1327] 10002  1327    24867       24      49      274             0 screen
[112950.724350] [ 1328] 10002  1328     3131       43      11       29             0 syncdir.sh
[112950.724354] [ 1332] 10002  1332    30824        1      29      166             0 at-spi2-registr
[112950.724358] [ 1339] 10002  1339    47624        0      30      230             0 ibus-engine-sim
[112950.724361] [ 1344] 10002  1344    24867       24      47      274             0 screen
[112950.724365] [ 1345] 10002  1345     3132       45      11       29             0 syncdir.sh
[112950.724368] [ 1352] 10002  1352    29334       29      60      284             0 gconfd-2
[112950.724372] [ 1386]   109  1386    25476       28      51      303             0 ntpd
[112950.724376] [ 1395] 10002  1395     3111        0      12       53             0 bash
[112950.724379] [ 1397] 10002  1397   255037     1076     285     4639             0 pidgin
[112950.724383] [ 1501] 10002  1501     3111        0      12       52             0 bash
[112950.724387] [ 1504] 10002  1504   198279     1614     144     2255             0 geany
[112950.724391] [ 1509] 10002  1509    22878        1      48      226             0 gnome-pty-helpe
[112950.724394] [ 1510] 10002  1510    24605        3      51      961             0 bash
[112950.724398] [ 1552] 10002  1552     1118        0       7       24             0 sh
[112950.724402] [ 1553] 10002  1553   140301      645     126     1796             0 x-terminal-emul
[112950.724405] [ 1555] 10002  1555    22878        1      48      226             0 gnome-pty-helpe
[112950.724409] [ 1574] 10002  1574    24864        1      51     1217             0 bash
[112950.724413] [ 1588] 10002  1588    24869        1      52     1221             0 bash
[112950.724416] [ 1611] 10002  1611    24654        1      52     1005             0 bash
[112950.724420] [ 1626] 10002  1626    24872        1      51     1230             0 bash
[112950.724423] [ 1686] 10002  1686     3111        0      12       52             0 bash
[112950.724427] [ 1688] 10002  1688   263974     1213     151     2874             0 geany
[112950.724431] [ 1694] 10002  1694    22878        1      47      227             0 gnome-pty-helpe
[112950.724434] [ 1695] 10002  1695    24605        1      51      962             0 bash
[112950.724438] [ 2703]     0  2703    67483        0      49      320             0 upowerd
[112950.724442] [ 2878] 10002  2878   120212        1      75      796             0 pulseaudio
[112950.724445] [ 6154]     0  6154    20337        0      43      278             0 cupsd
[112950.724449] [ 7130] 10002  7130    28017        0      26      135             0 gvfsd-metadata
[112950.724452] [ 7319] 10002  7319     9393        0      22       89             0 xfconfd
[112950.724456] [ 7327] 10002  7327    87499      204      70      307             0 gvfs-udisks2-vo
[112950.724460] [ 7329]     0  7329    91010        0      44      507             0 udisksd
[112950.724463] [ 7338] 10002  7338    80545        0      46      269             0 gvfs-afc-volume
[112950.724467] [ 7343] 10002  7343    46402        0      28      654             0 gvfs-mtp-volume
[112950.724471] [ 7347] 10002  7347    49445        0      32      192             0 gvfs-gphoto2-vo
[112950.724474] [10959]     0 10959   788127        1      80      451             0 console-kit-dae
[112950.724478] [11032] 10002 11032    24574        4      51      928             0 bash
[112950.724482] [12531] 10002 12531    44623        1      24      127             0 dconf-service
[112950.724486] [20988] 10002 20988     3111        0      11       52             0 bash
[112950.724489] [20990] 10002 20990   374912    52161     582    81262             0 firefox
[112950.724494] [22613] 10002 22613   255430     1115     305    11258             0 plugin-containe
[112950.724498] [28281] 10002 28281    24729        0      51      318             0 ssh
[112950.724502] [29075] 10002 29075  8427463  3939146   15686  4003496             0 python3
[112950.724506] [31021] 10002 31021   132646     1958     154    12669             0 python2.7
[112950.724509] [31488] 10002 31488     1757      149       9        0             0 inotifywait
[112950.724513] [31489] 10002 31489     3131       47      11       25             0 syncdir.sh
[112950.724517] [31493] 10002 31493     1668       43       8        0             0 inotifywait
[112950.724521] [31494] 10002 31494     3132       47      11       27             0 syncdir.sh
[112950.724524] Out of memory: Kill process 29075 (python3) score 959 or sacrifice child
[112950.724529] Killed process 29075 (python3) total-vm:33709852kB, anon-rss:15756584kB, file-rss:0kB

It was using 32 GB of memory, which is the maximum on this machine.
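
The figures in the OOM report can be cross-checked with a little arithmetic. This is a rough sketch that assumes 4 KiB pages (the usual x86-64 default, not stated in the log); the page counts are copied from the `python3` row (pid 29075) of the process table above.

```python
PAGE_KIB = 4  # assumption: 4 KiB pages

# Values from the python3 row (pid 29075) of the process table.
total_vm_pages = 8427463
rss_pages = 3939146
swap_pages = 4003496


def pages_to_kib(pages):
    """Convert a page count to KiB."""
    return pages * PAGE_KIB


def kib_to_gib(kib):
    """Convert KiB to GiB."""
    return kib / (1024 ** 2)


total_vm_kib = pages_to_kib(total_vm_pages)
rss_kib = pages_to_kib(rss_pages)
swap_kib = pages_to_kib(swap_pages)

print("total-vm: %d kB (%.1f GiB)" % (total_vm_kib, kib_to_gib(total_vm_kib)))
print("rss:      %d kB (%.1f GiB)" % (rss_kib, kib_to_gib(rss_kib)))
print("swap:     %d kB (%.1f GiB)" % (swap_kib, kib_to_gib(swap_kib)))
```

The first two products come out equal to the `total-vm` and `anon-rss` values in the "Killed process" line, which suggests the 4 KiB assumption holds.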