naemon / naemon-core

Networks, Applications and Event Monitor
http://www.naemon.io/
GNU General Public License v2.0

Memory leaks on naemon 1.0.7 #244

Closed topinet closed 6 years ago

topinet commented 6 years ago

Following up on https://github.com/naemon/naemon-core/issues/200: after upgrading to Naemon 1.0.7, there are still memory leaks.

naemon_ram_used

The graph shows RAM usage before and after the upgrade. About 3 GB of RAM is still lost every 2 weeks, which is at least better than before.

sni commented 6 years ago

So which NEB modules and versions are you using?

topinet commented 6 years ago

Debian 8 (amd64), naemon-livestatus 1.0.7, mod-gearman-module 3.0.5

sni commented 6 years ago

I see. I just noticed that the mentioned fix for mod-gearman hasn't made it into a release yet. Could you try the daily mod-gearman package by any chance?

topinet commented 6 years ago

Is mod-gearman-module_3.0.6.20180505_debian8_amd64.deb from the testing repository OK?

Do I also need to update mod-gearman-workers, or is it compatible with workers on 3.0.5?

sni commented 6 years ago

Yes, and it should be sufficient to replace only the NEB module with the new one.

topinet commented 6 years ago

Done. I'll report back on RAM usage in a week.

topinet commented 6 years ago

After a week, RAM usage is far better than before, but it keeps increasing slowly:

naemon_ram_swap_used

Could there still be some memory leaks?

sni commented 6 years ago

Thanks for coming back to this. Could you by any chance have a look at valgrind's massif tool to see where most of the memory is allocated from?
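For readers who have not used massif before, the commands involved could be assembled roughly as sketched below. The binary path, config path and output file name are assumptions and do not come from this thread; `--tool=massif` and `--massif-out-file` are standard valgrind options, and `ms_print` is the report tool shipped with valgrind. The helper names are made up for this sketch.

```python
import shlex


def massif_command(binary, config, out_file="massif.out.naemon"):
    """Build a valgrind command line that profiles heap usage with massif."""
    return [
        "valgrind",
        "--tool=massif",                  # select the heap profiler
        f"--massif-out-file={out_file}",  # file the snapshots are written to
        binary,
        config,
    ]


def ms_print_command(out_file="massif.out.naemon"):
    """Build the command that renders the massif snapshots as text."""
    return ["ms_print", out_file]


if __name__ == "__main__":
    # Hypothetical paths; adjust them to the local installation.
    print(shlex.join(massif_command("/usr/bin/naemon", "/etc/naemon/naemon.cfg")))
    print(shlex.join(ms_print_command()))
```

The sketch only prints the two command lines; nothing is executed. Running a production daemon under valgrind slows it down considerably, so a test instance is probably the safer place to try this.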

topinet commented 6 years ago

At the moment, memory usage seems to be stable at 1.2 GB with 700 hosts and 4300+ services.

naemon_ram_swap_used2
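The thread does not say how these graphs were produced. As one possible way to track a single process on Linux, the sketch below reads the resident set size from `/proc/<pid>/status`. The function names and the sample text are made up for illustration, and the value covers resident memory only, not swap.

```python
import re


def parse_vmrss_kb(status_text):
    """Extract the VmRSS value (in kB) from the text of /proc/<pid>/status."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    if match is None:
        raise ValueError("no VmRSS line found")
    return int(match.group(1))


def read_rss_kb(pid):
    """Read the current resident set size of a running process (Linux only)."""
    with open(f"/proc/{pid}/status") as handle:
        return parse_vmrss_kb(handle.read())


if __name__ == "__main__":
    # Made-up sample text, not real output from any system in this thread.
    sample = "Name:\tnaemon\nVmRSS:\t 1258291 kB\n"
    print(parse_vmrss_kb(sample))
```

Calling `read_rss_kb` with the naemon PID at a fixed interval and logging the result would give a per-process curve to compare against the system-wide graphs.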

jframeau commented 6 years ago

Hmm, same trend here with naemon and mod_gearman from the master branch.

Memory grows slowly but endlessly (about 4 KB every 10 seconds).
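To put that rate in perspective, here is a rough conversion to MiB per day, set against the 3 GB every 2 weeks from the original report. It assumes steady growth and 1 KB = 1024 bytes; `growth_per_day_mib` is a name made up for this sketch.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def growth_per_day_mib(amount_kib, interval_seconds):
    """Convert 'amount_kib every interval_seconds' into MiB per day."""
    return amount_kib * (SECONDS_PER_DAY / interval_seconds) / 1024


# Rate reported in this comment: about 4 KB every 10 seconds.
slow = growth_per_day_mib(4, 10)

# Rate from the original report: 3 GB every 2 weeks.
original = growth_per_day_mib(3 * 1024 * 1024, 14 * SECONDS_PER_DAY)

print(f"{slow:.2f} MiB/day")      # 33.75 MiB/day
print(f"{original:.2f} MiB/day")  # 219.43 MiB/day
```

Under those assumptions the growth seen here is roughly 6.5 times slower than the original leak, but it still adds up to about 1 GB per month.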

Using valgrind's massif tool, I get a weird result. The culprit seems to be around this part of the code:

->15.01% (477,080B) 0x5D6E808: strdup (in /usr/lib64/libc-2.17.so)
  ->14.67% (466,345B) 0x4E61A67: nm_strdup (nm_alloc.c:42)
    ->07.41% (235,475B) 0x4E8F08A: xrddefault_read_state_information (xrddefault.c:1288)
      ->07.41% (235,475B) 0x4E72112: read_initial_state_information (sretention.c:106)
        ->07.41% (235,475B) 0x403240: main (naemon.c:635)
    ->02.85% (90,660B) 0x4E8F034: xrddefault_read_state_information (xrddefault.c:1285)
      ->02.85% (90,660B) 0x4E72112: read_initial_state_information (sretention.c:106)
        ->02.85% (90,660B) 0x403240: main (naemon.c:635)
    ->01.26% (40,192B) 0x4E8E6FC: xrddefault_read_state_information (xrddefault.c:1282)
      ->01.26% (40,192B) 0x4E72112: read_initial_state_information (sretention.c:106)
        ->01.26% (40,192B) 0x403240: main (naemon.c:635)
    ->03.15% (100,018B) in 9+ places, all below ms_print's threshold (01.00%)
  ->00.34% (10,735B) in 3+ places, all below ms_print's threshold (01.00%)

Any idea?
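One way to sanity-check a tree like this is to confirm that a parent's byte count equals the sum of its children. The sketch below does that for the `nm_strdup` node, using lines copied from the ms_print output above; `parse_bytes` is a helper invented here, not part of valgrind.

```python
import re

# Lines copied from the ms_print output above (indentation is not needed here).
NM_STRDUP_TOTAL = "->14.67% (466,345B) 0x4E61A67: nm_strdup (nm_alloc.c:42)"
CHILDREN = [
    "->07.41% (235,475B) 0x4E8F08A: xrddefault_read_state_information (xrddefault.c:1288)",
    "->02.85% (90,660B) 0x4E8F034: xrddefault_read_state_information (xrddefault.c:1285)",
    "->01.26% (40,192B) 0x4E8E6FC: xrddefault_read_state_information (xrddefault.c:1282)",
    "->03.15% (100,018B) in 9+ places, all below ms_print's threshold (01.00%)",
]

# Matches the "->NN.NN% (N,NNNB)" prefix of a tree line.
ENTRY = re.compile(r"->(\d+\.\d+)% \(([\d,]+)B\)")


def parse_bytes(line):
    """Return the byte count from one ms_print tree line."""
    match = ENTRY.match(line)
    if match is None:
        raise ValueError(f"not an ms_print tree line: {line!r}")
    return int(match.group(2).replace(",", ""))


total = parse_bytes(NM_STRDUP_TOTAL)
children = sum(parse_bytes(line) for line in CHILDREN)
print(total, children, total == children)  # 466345 466345 True
```

The numbers are consistent. Every named call site in this snapshot sits under `read_initial_state_information` called from `main`, so on its own it shows where these strings were allocated rather than what keeps growing afterwards. Comparing the same nodes across a later snapshot may be more telling.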

topinet commented 6 years ago

seleccio_019

Definitely, memory consumption is stable with gearman-module 3.0.6.

sni commented 6 years ago

Great news, thanks for the heads-up.

pvdputte commented 6 years ago

Perfect, I wanted to open a long-overdue issue on my remaining memory consumption problem, and someone else has done exactly that already :-)

Running about 7000 hosts/94000 services on Debian 9 (stretch).

# dpkg -l | grep -e gearm -e naemon
ii  gearman-job-server                   0.33-6                         amd64        Job server for the Gearman distributed job queue
ii  gearman-tools                        0.33-6                         amd64        Tools for the Gearman distributed job queue
ii  libgearman7                          0.33-6                         amd64        Library providing Gearman client and worker functions
ii  libnaemon                            1.0.7                          amd64        Library for Naemon - common data files
ii  mod-gearman-module                   3.0.5                          amd64        Event broker module to distribute service checks.
ii  mod-gearman-tools                    3.0.5                          amd64        Tools for mod-gearman
ii  naemon-core                          1.0.7                          amd64        host/service/network monitoring and management system
ii  naemon-livestatus                    1.0.7                          amd64        contains the Naemon livestatus eventbroker module

Been restarting naemon every day since June. Looks fine now after deploying the 3.0.6 mod_gearman_naemon.o from https://labs.consol.de/repo/testing/debian/dists/stretch/main/binary-amd64/mod-gearman-module_3.0.6.20180505_debian9_amd64.deb as suggested above.

naemon-gearman-mem-fix
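Since both reports in this thread point at mod-gearman-module 3.0.5 versus 3.0.6, a quick check of a `dpkg -l` row could look like the sketch below. The function names are made up, and the comparison only handles purely dotted numeric versions; it is not a general Debian version comparison (a string such as `0.33-6` would not parse).

```python
def parse_dpkg_line(line):
    """Split one 'dpkg -l' row into (status, package, version)."""
    status, package, version = line.split()[:3]
    return status, package, version


def version_tuple(version):
    """Turn a purely dotted numeric version such as '3.0.5' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


# Row copied from the dpkg output above.
row = (
    "ii  mod-gearman-module                   3.0.5                          "
    "amd64        Event broker module to distribute service checks."
)

_, package, version = parse_dpkg_line(row)
needs_upgrade = version_tuple(version) < version_tuple("3.0.6")
print(package, version, needs_upgrade)  # mod-gearman-module 3.0.5 True
```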