naemon / naemon-core

Networks, Applications and Event Monitor
http://www.naemon.io/
GNU General Public License v2.0

Memory leak or heavy memory usage #200

Closed jframeau closed 6 years ago

jframeau commented 7 years ago

Centos 7 (7.3.1611) OMD 2.40 - naemon 1.0.6.

The facts:

naemon starts at 25 MB of RAM and reaches 1 GB after 12 hours. It keeps growing until a cold restart; a reload doesn't help.

I found a first leak using valgrind:

In mod_gearman (neb_module/result_thread.c, line 229), there's a strdup which naemon doesn't release.

#ifdef USENAEMON
chk_result->source = gm_strdup( value );
#endif

So my proposed fix in naemon (HEAD):

--- a/src/naemon/checks.c
+++ b/src/naemon/checks.c
@@ -524,6 +524,7 @@ int free_check_result(check_result *info)
 	nm_free(info->host_name);
 	nm_free(info->service_description);
 	nm_free(info->output);
+	nm_free(info->source);
 
 	return OK;
 }
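To illustrate the ownership problem behind this proposal, here is a minimal sketch. It is not naemon or mod_gearman code: `demo_check_result`, `demo_strdup`, `demo_free` and the other `demo_*` names are hypothetical stand-ins. It assumes, as in the report above, that the producer stores a strdup'd copy in `source` and that the consumer's free function is the only place where that copy can be released.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, cut-down stand-in for naemon's check_result. */
typedef struct {
    char *host_name;
    char *output;
    char *source; /* set by the producer from a strdup'd copy */
} demo_check_result;

/* Number of heap strings allocated by demo_strdup and not yet freed. */
int demo_live_strings = 0;

/* Counting replacement for strdup (plays the role of gm_strdup). */
char *demo_strdup(const char *s)
{
    char *copy = malloc(strlen(s) + 1);
    assert(copy != NULL);
    strcpy(copy, s);
    demo_live_strings++;
    return copy;
}

/* Free a string field and reset it (plays the role of nm_free). */
void demo_free(char **p)
{
    if (*p != NULL) {
        free(*p);
        *p = NULL;
        demo_live_strings--;
    }
}

/* Producer side: every field, including source, gets its own copy. */
void demo_fill(demo_check_result *r, const char *host,
               const char *output, const char *source)
{
    r->host_name = demo_strdup(host);
    r->output = demo_strdup(output);
    r->source = demo_strdup(source);
}

/* Consumer side: release every field the producer allocated. */
int demo_free_check_result(demo_check_result *r)
{
    demo_free(&r->host_name);
    demo_free(&r->output);
    demo_free(&r->source); /* counterpart of the line the patch adds */
    return 0;
}

/* Run fill/free cycles and report how many strings are still live. */
int demo_live_after_cycles(int cycles)
{
    demo_check_result r = { NULL, NULL, NULL };
    for (int i = 0; i < cycles; i++) {
        demo_fill(&r, "host1", "OK - all fine", "some result source");
        demo_free_check_result(&r);
    }
    return demo_live_strings;
}
```

Without the `demo_free(&r->source)` line, one string would stay allocated per processed result, which is the kind of slow, steady growth described above.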

Second, and more obscure to me, using valgrind's massif tool: after 30 minutes, one code path seems to use a lot of memory:

->62.33% (64,728,793B) 0x4E9347E: nm_bufferqueue_push (bufferqueue.c:283)
| ->59.13% (61,401,535B) 0x4E6CDA6: update_service_performance_data (perfdata.c:499)
| | ->59.13% (61,401,535B) 0x4E45209: handle_async_service_check_result (checks_service.c:994)
| | ->59.13% (61,401,535B) 0x82BE547: ???
| | ->59.13% (61,401,535B) 0x4E56FF3: execute_and_destroy_event.constprop.6 (events.c:249)
| | ->59.13% (61,401,535B) 0x4E5757F: event_poll (events.c:367)
| | ->59.13% (61,401,535B) 0x4032B3: main (in /opt/omd/versions/2.40-labs-edition/bin/naemon)

This is not a memory leak.

Could someone explain why that code uses so much memory?

jfr

jframeau commented 7 years ago

Full valgrind report using massif tool for information:

memory.txt

dirtyren commented 7 years ago

You could download the latest build to test whether the memory leak is still so massive. A pull request that fixed a memory leak was accepted: https://github.com/naemon/naemon-core/pull/191

I still have a memory leak in the latest build that I have not been able to locate yet. Maybe with the patch your valgrind log will show fewer leaks, increasing our chances of finding it.

[]s.

sni commented 7 years ago

I fixed the memory leak in mod-gearmans result handler: https://github.com/sni/mod_gearman/commit/6efd40ba0737e7c0a7feaf3d766fec836f30ef63

sni commented 7 years ago

Thanks @jframeau for the pointer to update_service_performance_data. This function leaks memory if no performance data collector is set. #210 should fix that one.
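As a rough illustration of this failure mode (not the actual change in #210, and not naemon's bufferqueue API), the sketch below shows a queue that is only ever drained by a consumer. If data is pushed while no consumer exists, the queue can only grow. All `demo_*` names and the `has_consumer` flag are hypothetical; the guard in `demo_queue_push` is just one possible way to avoid the growth.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* One queued buffer; hypothetical stand-in for a bufferqueue entry. */
typedef struct demo_chunk {
    struct demo_chunk *next;
    size_t len;
    char data[]; /* C99 flexible array member holding the payload */
} demo_chunk;

typedef struct {
    demo_chunk *head;
    demo_chunk *tail;
    size_t bytes;     /* total payload bytes currently queued */
    int has_consumer; /* non-zero if something will drain the queue */
} demo_queue;

/* Append a copy of buf. Returns 0 on success, -1 on allocation failure. */
int demo_queue_push(demo_queue *q, const char *buf, size_t len)
{
    demo_chunk *c;

    if (!q->has_consumer)
        return 0; /* guard: nobody will drain the queue, so drop the data */

    c = malloc(sizeof(*c) + len);
    if (c == NULL)
        return -1;
    c->next = NULL;
    c->len = len;
    memcpy(c->data, buf, len);

    if (q->tail != NULL)
        q->tail->next = c;
    else
        q->head = c;
    q->tail = c;
    q->bytes += len;
    return 0;
}

/* Consumer side: free all queued chunks, return the bytes released. */
size_t demo_queue_drain(demo_queue *q)
{
    size_t released = 0;

    while (q->head != NULL) {
        demo_chunk *c = q->head;
        q->head = c->next;
        released += c->len;
        free(c);
    }
    q->tail = NULL;
    q->bytes = 0;
    return released;
}
```

Without the `has_consumer` guard, every check result would add a chunk that is never freed, which would match the massif profile above where nearly all heap usage sits under nm_bufferqueue_push.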

jframeau commented 7 years ago

Nice catch. Patches #210 and #191 applied. I'll run valgrind tonight with the same config. Let's see how that latest build behaves.

jframeau commented 7 years ago

naemon patched with both #210 and #191 is far better. After running for many days, memory is fairly stable.

Just one last leak that valgrind has raised (9 MB / day):

==3293== 8,989,218 bytes in 220,382 blocks are definitely lost in loss record 61 of 61
==3293== at 0x4C29BE3: malloc (vg_replace_malloc.c:299)
==3293== by 0x5979949: strdup (in /usr/lib64/libc-2.17.so)
==3293== by 0xC90796B: gm_strdup (gm_alloc.c:38)
==3293== by 0xC908F03: get_results (result_thread.c:229)
==3293== by 0xCB6BAD7: FunctionV1::callback(gearman_job_st, void) (function_v1.hpp:66)
==3293== by 0xCB73CDA: gearman_worker_work (worker.cc:998)
==3293== by 0xC909551: result_worker (result_thread.c:76)
==3293== by 0x5F1FE24: start_thread (in /usr/lib64/libpthread-2.17.so)
==3293== by 0x59EB34C: clone (in /usr/lib64/libc-2.17.so)

sni commented 7 years ago

That one is fixed with: https://github.com/sni/mod_gearman/commit/6efd40ba0737e7c0a7feaf3d766fec836f30ef63

jframeau commented 7 years ago

Hmm, you're right, I didn't see it.

After recompiling mod-gearman with the former patch, no more leak (in my context).

All these memory patches lead me to think that the issue can be closed.

Thx for the help.

topinet commented 6 years ago

Please, what is the scheduled date for a release including these patches?

sni commented 6 years ago

Right now we are sorting out how to improve packaging and releases. So there will be a new release soon.

riskersen commented 6 years ago

I would also be interested in the mentioned patches.

What does soon mean for you? ;-)

sni commented 6 years ago

Soon(tm) is tomorrow. It is always tomorrow :-) Just kidding, this time it's really tomorrow.

pvdputte commented 6 years ago

Moved from Icinga 1.6 to Naemon 1.0.6 last week and experienced severe memory leakage. (the sudden drops are scheduled cold restarts)

I upgraded to 1.0.7 last night. It's obviously better, but I'm afraid there are more leaks to be found. Do I open a new issue?

[Image: naemon-1 0 7]

sni commented 6 years ago

Please raise a new issue and include some information about loaded plugins and the versions used. Some details about your setup would also help.