trueos / trueos-core

59 stars 18 forks source link

sysadm-client sometimes makes excessive use of memory, risk of bringing the OS to a halt #126

Closed grahamperrin closed 7 years ago

grahamperrin commented 7 years ago

This bugged me on various occasions and I'd like to make it reproducible for you but Murphy's Law, the hardware that I normally use for FreeNAS died a few days ago. On at least one occasion, all attempts to kill the client failed and so it was necessary to force off the computer (press and hold the power button).

I have a hunch that the client becomes hungry in some Life Preserver replication failure situations.

2016-10-18 21:41, hungry sysadm-client

2016-10-18 21-41 hungry sysadm-client

The client as it probably was before the hunger was observed

2016-10-18 21-44 the client as it probably was

2016-10-19 05:36, around ten minutes after boot

2016-10-19 05-36 around ten minutes after boot

2016-10-19 05:41, around fifteen minutes after boot

2016-10-19 05-41 around fifteen minutes after boot

2016-10-19 around 07:51

2016-10-19 07-51 sysadm-client

2016-10-22 14:48, nearly all swap space used

2016-10-22 14-49 nearly all swap space used

2016-10-22 around 14:50, processes

2016-10-22 14-50 processes

2016-10-22 around 14:50

2016-10-22 14-50 sysadm-client

2016-10-22 around 14:52, after using System Monitor to kill the client

AFAIR other approaches to killing, probably with htop, did not succeed.

2016-10-22 14-52 client would not close killed by system monitor

2016-10-22 around 14:57, a view of Life Preserver schedules

2016-10-22 14-57 life preserver schedules

grahamperrin commented 7 years ago

Above:

… the hardware that I normally use for FreeNAS died … a hunch that the client becomes hungry in some Life Preserver replication failure situations. …

After that failure of the replication target, I removed the one and only replication schedule.

The hogging persists, so it's probably safe to ignore my first hunch and instead, revisit the notion from #128:

… message presentation issue might somehow contribute to the memory-hogging aspect of #126, …

Yesterday (2016-11-05) at 01:01:

sysadm-client 3.7 GB memory after 01:00 whilst presenting outdated messages

no replication tasks hourly snapshots

15:16, 1.5 GB virtual memory used by the client:

sysadm-client 1.5 GB virtual memory

Today:

$ last | less
gjp4       pts/1    :0                     Sun  6 Nov 09:49   still logged in
gjp4       pts/0    :0                     Sat  5 Nov 15:33   still logged in
boot time                                  Sat  5 Nov 15:23
shutdown time                              Sat  5 Nov 15:21
gjp4       pts/0    :0                     Sat  5 Nov 15:21 - 15:21  (00:00)
gjp4       pts/1    :0                     Sat  5 Nov 15:02 - 15:17  (00:15)
gjp4       pts/1    :0                     Sat  5 Nov 14:58 - 15:01  (00:02)
gjp4       pts/0    :0                     Sat  5 Nov 14:44 - 15:17  (00:33)
gjp4       pts/0    :0                     Sat  5 Nov 14:06 - 14:43  (00:36)
boot time                                  Sat  5 Nov 14:03

– after leaving the machine overnight (I left the house around 19:00), this morning I arrived home and found the client using 11.7 G virtual memory, 9,278 M resident on a system with 16 GB RAM:

  1  [||||||||||||||||||||||||||||||||||||||||||||||||||||                                                                   41.9%]   Tasks: 136, 0 thr; 3 running
  2  [|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||                  81.2%]   Load average: 1.98 2.07 2.03 
  3  [|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||                                                          48.7%]   Uptime: 18:28:59
  4  [||||||||||||||||||||||||||||||||||||||||||||||||                                                                       38.2%]
  Mem[|||||||||||||||||||||||||||||||||||||||||||||||||                                                                5.03G/15.8G]
  Swp[|||||||||||||||||||||||||||||||||||||||||||||||                                                                  2.98G/8.00G]

  PID USER      PRI  NI  VIRT   RES S CPU% MEM%   TIME+  Command
 1764 gjp4       20   0 11.7G 9278M S 100. 57.5 18h10:16 /usr/local/bin/sysadm-client -session 1055961660e557000147834951100000036840025_1478359250_160353
 2706 gjp4       93   0 4084M 3065M R 79.2 19.0 17h01:16 /usr/local/bin/firefox
 1420 root      -21 -21  646M  173M S  5.0  1.1 15:39.94 X :0 -auth /var/tmp/.PCDMAuth-42a4f6ce2ac57a2858c1 -nolisten tcp
 1690 gjp4       21   0 1171M  125M S  2.1  0.8 41:04.88 kdeinit4: kdeinit4: plasma-desktop
 2685 gjp4       21   0  549M 36192 S  1.4  0.2 17:19.12 /usr/local/bin/gnome-system-monitor
 1775 gjp4       20   0  568M 32940 S  0.0  0.2  0:36.55 /usr/local/bin/pc-nettray
 3429 gjp4       20   0  505M 29136 S  0.5  0.2  1:14.59 kdeinit4: kdeinit4: konsole
 1686 gjp4       20   0  763M 25872 S  0.3  0.2  6:10.02 kwin -session 104a946447949a000146019633100000613330000_1478359265_220765
 1703 gjp4       20   0  282M 24396 S  0.0  0.1  0:06.86 /usr/local/libexec/mysqld --defaults-file=/usr/home/gjp4/.local/share/akonadi/mysql.conf --datadir=/usr/home/gjp4/.local/share/akonadi/db_data/ --socket=/tmp/akonadi-gjp4.UQBVGz/mysql.socket
 1740 gjp4       20   0  756M 14804 S  0.0  0.1  0:06.85 kdeinit4: kdeinit4: krunner
17618 gjp4       20   0  350M 13960 S  0.0  0.1  0:14.06 /usr/local/bin/nautilus --new-window
 1693 gjp4       20   0  542M 12560 S  0.0  0.1  0:01.02 /usr/local/bin/knotify4
 1759 gjp4       20   0  534M 11696 S  0.0  0.1  0:00.48 kget -session 1055961660e557000147262067000000024430033_1478359250_160320
 1701 gjp4       20   0  297M 11616 S  0.0  0.1  0:01.28 akonadiserver
 1809 gjp4       20   0  480M 11268 S  0.0  0.1  0:00.49 kdeinit4: kdeinit4: klipper
 1683 gjp4       20   0  482M 11024 S  0.0  0.1  0:02.00 kdeinit4: kdeinit4: ksmserver
 1757 gjp4       20   0  419M 10828 S  0.0  0.1  0:00.48 /usr/local/bin/kgpg -session 1055961660e557000147280896800000017310026_1478359250_160280
17269 gjp4       20   0  322M 10820 S  0.0  0.1  0:01.74 /usr/local/bin/ksnapshot -caption KSnapshot
 1674 gjp4       20   0  473M 10652 S  0.0  0.1  0:00.83 kdeinit4: kdeinit4: kglobalaccel
 1730 gjp4       20   0  581M 10464 S  0.0  0.1  0:01.30 /usr/local/bin/akonadi_sendlater_agent --identifier akonadi_sendlater_agent
 1726 gjp4       20   0  603M 10396 S  0.0  0.1  0:01.37 /usr/local/bin/akonadi_mailfilter_agent --identifier akonadi_mailfilter_agent
 1755 gjp4       52   0  712M 10372 S  0.0  0.1  0:00.63 kdeinit4: kdeinit4: kmix -session 104a946447949a000146150907000000027240026_1478354725_972308
 1744 gjp4       29   0  268M  9960 S  0.0  0.1  2:41.84 pc-mixer -session 102ca23629333a000143909649400000063880010_1478359250_159067
 1679 gjp4       20   0  351M  9916 S  0.0  0.1  0:01.15 /usr/local/bin/kactivitymanagerd
 1798 gjp4       20   0  470M  9860 S  0.0  0.1  0:00.68 /usr/local/bin/korgac --icon korgac
 1728 gjp4       20   0  422M  9776 S  0.0  0.1  0:01.17 /usr/local/bin/akonadi_newmailnotifier_agent --identifier akonadi_newmailnotifier_agent
 1729 gjp4       20   0  527M  9520 S  0.0  0.1  0:01.29 /usr/local/bin/akonadi_notes_agent --identifier akonadi_notes_agent
 1719 gjp4       20   0  603M  9492 S  0.0  0.1  0:01.44 /usr/local/bin/akonadi_archivemail_agent --identifier akonadi_archivemail_agent
 1794 gjp4       20   0  330M  9260 S  0.0  0.1  0:00.84 /usr/local/lib/kde4/libexec/polkit-kde-authentication-agent-1
 1720 gjp4       39  19  353M  9240 S  0.0  0.1  0:01.06 /usr/local/bin/akonadi_baloo_indexer --identifier akonadi_baloo_indexer
 1725 gjp4       20   0  379M  8724 S  0.0  0.1  0:01.06 /usr/local/bin/akonadi_maildispatcher_agent --identifier akonadi_maildispatcher_agent
 1727 gjp4       20   0  327M  8688 S  0.0  0.1  0:01.06 /usr/local/bin/akonadi_migration_agent --identifier akonadi_migration_agent
 1723 gjp4       20   0  267M  8636 S  0.0  0.1  0:00.40 /usr/local/bin/akonadi_agent_launcher akonadi_ical_resource akonadi_ical_resource_0
 1689 gjp4       39  19  334M  8620 S  0.0  0.1  0:00.65 /usr/local/bin/baloo_file
 1722 gjp4       20   0  334M  8568 S  0.0  0.1  0:01.05 /usr/local/bin/akonadi_followupreminder_agent --identifier akonadi_followupreminder_agent
 1442 root       52   0  565M  8568 S  0.0  0.1  0:01.27 /usr/local/bin/PCDM-session -once
 1718 gjp4       20   0  273M  8528 S  0.0  0.1  0:00.44 /usr/local/bin/akonadi_agent_launcher akonadi_akonotes_resource akonadi_akonotes_resource_0
 1198 root       20   0  141M  8500 S  0.0  0.1  5:09.31 /usr/local/bin/sysadm-binary
 1721 gjp4       20   0  339M  8480 S  0.0  0.1  0:01.03 /usr/local/bin/akonadi_birthdays_resource --identifier akonadi_birthdays_resource
 1667 gjp4       28   0  687M  8468 S  0.0  0.1  0:01.55 kdeinit4: kdeinit4: kded4
 1731 gjp4       20   0  262M  8460 S  0.0  0.1  0:00.40 /usr/local/bin/akonadi_agent_launcher akonadi_vcard_resource akonadi_vcard_resource_0
 1663 gjp4       20   0  375M  8440 S  0.0  0.1  0:00.24 kdeinit4: kdeinit4 Running...
 1724 gjp4       20   0  271M  8384 S  0.0  0.1  0:00.41 /usr/local/bin/akonadi_agent_launcher akonadi_maildir_resource akonadi_maildir_resource_0
 1754 gjp4       29   0  384M  8332 S  0.0  0.1  0:00.03 kdeinit4: kdeinit4: kmix -session 104a946447949a000146150907000000027240026_1478354725_972308
 1696 gjp4       20   0  304M  8296 S  0.0  0.1  0:00.56 /usr/local/bin/kuiserver
 1681 gjp4       20   0  323M  8200 S  0.0  0.0  0:00.27 /usr/local/lib/kde4/libexec/kdeconnectd
 1806 gjp4       36   0  384M  8120 S  0.0  0.0  0:00.02 kdeinit4: kdeinit4: kmix
    0 root      -16   0     0  8112 S  0.0  0.0  6:33.76 kernel
 1734 gjp4       31   0  448M  7780 S  0.0  0.0  0:00.08 kdeinit4: kdeinit4: kmixctrl --restore
 6265 gjp4       20   0 25620  6752 R  0.0  0.0  0:00.02 htop
 1665 gjp4       20   0  378M  6584 S  0.0  0.0  0:00.18 kdeinit4: kdeinit4: klauncher --fd=8
 1799 gjp4       39  19  171M  6376 S  0.0  0.0  0:03.38 /usr/local/libexec/tracker-miner-fs
17658 gjp4       20   0  199M  5256 S  0.0  0.0  0:00.08 /usr/local/bin/mate-screensaver --no-daemon
 1748 gjp4       20   0  268M  4212 S  0.0  0.0  0:29.05 /usr/local/bin/pulseaudio --start --log-target=syslog
 2705 gjp4       20   0 40476  3892 S  0.0  0.0  7:14.95 libgtop-server
 2722 gjp4       20   0 38008  3612 S  0.0  0.0  0:00.22 /usr/local/libexec/dconf-service
 1699 gjp4       20   0  102M  3612 S  0.0  0.0  0:01.41 /usr/local/bin/akonadi_control
 1800 gjp4       39  19  152M  3548 S  0.0  0.0  0:00.69 /usr/local/libexec/tracker-extract
 5864 gjp4       20   0 13956  3408 S  0.0  0.0  0:00.02 /usr/local/bin/bash
 2698 gjp4       20   0 64796  3368 S  0.0  0.0  0:00.43 /usr/local/libexec/at-spi2-registryd --use-gnome-session
 1803 gjp4       39  19  127M  3364 S  0.0  0.0  0:00.49 /usr/local/libexec/tracker-miner-apps
$ date ; uptime ; freebsd-version ; uname -a
Sun  6 Nov 2016 09:51:55 GMT
 9:51a.m.  up 18:29, 2 users, load averages: 1.98, 2.07, 2.02
12.0-CURRENT
FreeBSD momh167-gjp4-hpelitebook850g2-trueos.university.brighton.ac.uk 12.0-CURRENT FreeBSD 12.0-CURRENT #14 1717ae3(drm-next-4.7): Fri Oct 28 13:49:49 UTC 2016     root@gauntlet:/usr/obj/usr/src/sys/GENERIC  amd64
$ 

More than half an hour after a SIGTERM kill of the client, the OS still feels clunky (Firefox slow to respond to keystrokes, and so on) –

swap

– and I'll probably not get rid of the clunkiness without a restart of the OS, so this bug is quite disruptive.

grahamperrin commented 7 years ago

(This issue is probably ready for reassignment to a developer; read alongside https://github.com/trueos/trueos-core/issues/128#issuecomment-258767269 where the problem may be reproducible with multiple desktop environments.)

grahamperrin commented 7 years ago

Reproducibility

I started the OS and used:

Half-expecting leaps in usage of memory at hourly intervals, shortly after 21:00 I observed a leap.

Around twenty minutes later, no remarkable increase (below) but I half-expect leaps to continue. I'll review usage after 22:00 and/or maybe tomorrow morning.

$ date ; freebsd-version ; uname -a ; htop
Sat 12 Nov 2016 21:22:43 GMT
12.0-CURRENT
FreeBSD momh167-gjp4-hpelitebook850g2-trueos.university.brighton.ac.uk 12.0-CURRENT FreeBSD 12.0-CURRENT #14 1717ae3(drm-next-4.7): Fri Oct 28 13:49:49 UTC 2016     root@gauntlet:/usr/obj/usr/src/sys/GENERIC  amd64

  1  [                                                    0.0%]   Tasks: 92, 0 thr; 3 running
  2  [                                                    0.0%]   Load average: 0.44 0.55 0.60 
  3  [                                                    0.0%]   Uptime: 00:33:35
  4  [                                                    0.0%]
  Mem[||||||||||||||                               2.95G/15.8G]
  Swp[                                                0K/8.00G]

  PID USER      PRI  NI  VIRT   RES S CPU% MEM%   TIME+  Command
 2509 gjp4       83   0 2722M 2230M R 45.7 13.8 17:49.42 /usr/local/bin/firefox
 1716 gjp4       30  10 1494M  979M S  0.0  6.1  1:20.25 /usr/local/bin/sysadm-client
 1654 gjp4       20   0  789M  190M S  0.1  1.2  0:40.81 lumina-desktop
 1531 root      -21 -21  469M  118M S  0.7  0.7  0:29.23 X :0 -auth /var/tmp/.PCDMAuth-0ea04e26e8f074e8ee3e -nolisten tcp
 1695 gjp4       30  10  433M 53192 S  0.0  0.3  0:00.72 /usr/local/bin/python2.7 /usr/local/bin/hp-systray -x
 1840 gjp4       20   0  400M 49680 S  0.9  0.3  0:12.15 /usr/local/bin/konsole
$ 
grahamperrin commented 7 years ago

… half-expect leaps to continue. I'll review usage after 22:00 …

Lumina alone. One screenshot three seconds before 22:00, three shots soon afterwards.

21-59-57

22:00:41, a remarkable increase in use of memory:

22-00-41

22:02:32, usage still increasing and the September messages in November are exemplary of #128:

22-02-32

22:03:51, the surge of activity has ceased but the memory footprint of sysadm-client remains much larger than is desirable:

22-03-51

If I had left the computer on with sysadm-client running, then almost certainly I would have found the system effectively useless (swap space exhausted) some time on Sunday.

To avoid that exhaustion I logged out (ending my Lumina test session), logged in to KDE, closed the icon of the secondary SysAdm client (discussion), then used the one remaining SysAdm icon to close the client.


Side note: whilst the closure was effective – there's no longer a sysadm-client process – the icon persists in the System Tray of KDE. This seems to be a rare, negligible, cosmetic bug.

grahamperrin commented 7 years ago

On one hand: no recent incidents of this bug.

On the other hand: I have not recently used PersonaCrypt from a USB flash drive. So I'll not rush to close this issue.

pkgdemon commented 7 years ago

I have a personacrypt usb install. I have not been able to reproduce this issue for some time.