Closed — jframeau closed this issue 6 years ago
Full valgrind report using the massif tool, for information:
You could download the latest build to test whether the memory leak is still that severe. A pull request that fixed one memory leak was merged: https://github.com/naemon/naemon-core/pull/191
I still have a memory leak in the latest build that I have not been able to locate yet. With that patch applied, your valgrind log may show fewer leaks, improving our chances of finding the remaining one.
Regards.
I fixed the memory leak in mod_gearman's result handler: https://github.com/sni/mod_gearman/commit/6efd40ba0737e7c0a7feaf3d766fec836f30ef63
Thanks @jframeau for the pointer to update_service_performance_data. That function leaks memory if no performance data collector is set. #210 should fix that one.
Nice catch. Patches #210 and #191 applied. I'll run valgrind tonight with the same config. Let's see how the latest build does.
naemon patched with both #210 and #191 is far better. After many days of operation, memory usage is fairly stable.
Just one last leak valgrind has raised (about 9 MB/day):
```
==3293== 8,989,218 bytes in 220,382 blocks are definitely lost in loss record 61 of 61
==3293==    at 0x4C29BE3: malloc (vg_replace_malloc.c:299)
==3293==    by 0x5979949: strdup (in /usr/lib64/libc-2.17.so)
==3293==    by 0xC90796B: gm_strdup (gm_alloc.c:38)
==3293==    by 0xC908F03: get_results (result_thread.c:229)
==3293==    by 0xCB6BAD7: FunctionV1::callback(gearman_job_st*, void*) (function_v1.hpp:66)
==3293==    by 0xCB73CDA: gearman_worker_work (worker.cc:998)
==3293==    by 0xC909551: result_worker (result_thread.c:76)
==3293==    by 0x5F1FE24: start_thread (in /usr/lib64/libpthread-2.17.so)
==3293==    by 0x59EB34C: clone (in /usr/lib64/libc-2.17.so)
```
That one is fixed with: https://github.com/sni/mod_gearman/commit/6efd40ba0737e7c0a7feaf3d766fec836f30ef63
Hmm, you're right, I didn't miss it after all.
After recompiling mod_gearman with that patch, no more leaks (in my setup).
All these memory patches lead me to think this issue can be closed.
Thanks for the help.
What is the scheduled date for a release that includes these patches?
Right now we are sorting out how to improve packaging and releases, so there will be a new release soon.
I would also be interested in the mentioned patches.
What does soon mean for you? ;-)
Soon(tm) is tomorrow. It is always tomorrow :-) Just kidding, this time it's really tomorrow.
Moved from Icinga 1.6 to Naemon 1.0.6 last week and experienced severe memory leakage (the sudden drops are scheduled cold restarts).
I upgraded to 1.0.7 last night. It's obviously better, but I'm afraid there are more leaks to be found. Do I open a new issue?
Please raise a new issue and include some information about loaded plugins and the versions used. Some details about your setup would also help.
CentOS 7 (7.3.1611), OMD 2.40, naemon 1.0.6.
The facts:
naemon starts at 25 MB and is at 1 GB of RAM after 12 h (and it continues to grow unless cold-restarted; a reload doesn't help).
I found a first leak using valgrind:
In mod_gearman (neb_module/result_thread.c, line 229) there is a strdup whose result naemon never releases:
```c
#ifdef USENAEMON
chk_result->source = gm_strdup( value );
#endif
```
So my proposed fix for naemon (head):
```diff
--- a/src/naemon/checks.c
+++ b/src/naemon/checks.c
@@ -524,6 +524,7 @@ int free_check_result(check_result *info)
 	nm_free(info->host_name);
 	nm_free(info->service_description);
 	nm_free(info->output);
+	nm_free(info->source);
 	return OK;
 }
```
Second point, more obscure to me: using the valgrind massif tool, after 30 minutes one code path seems to use memory heavily:
```
->62.33% (64,728,793B) 0x4E9347E: nm_bufferqueue_push (bufferqueue.c:283)
| ->59.13% (61,401,535B) 0x4E6CDA6: update_service_performance_data (perfdata.c:499)
| | ->59.13% (61,401,535B) 0x4E45209: handle_async_service_check_result (checks_service.c:994)
| | ->59.13% (61,401,535B) 0x82BE547: ???
| | ->59.13% (61,401,535B) 0x4E56FF3: execute_and_destroy_event.constprop.6 (events.c:249)
| | ->59.13% (61,401,535B) 0x4E5757F: event_poll (events.c:367)
| | ->59.13% (61,401,535B) 0x4032B3: main (in /opt/omd/versions/2.40-labs-edition/bin/naemon)
```
This is not a memory leak.
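A toy queue can illustrate why a buffer queue shows up prominently in massif without being a leak: pushed bytes stay reachable until the consumer drains them, so the heap profile peaks at the backlog size rather than growing without bound. This sketch is illustrative only and is not naemon's actual `nm_bufferqueue` implementation:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy append-only buffer queue. */
typedef struct {
    char *data;
    size_t len, cap;
} bufq;

static void bufq_push(bufq *q, const char *s, size_t n) {
    if (q->len + n > q->cap) {
        q->cap = (q->len + n) * 2;      /* grow geometrically */
        q->data = realloc(q->data, q->cap);
    }
    memcpy(q->data + q->len, s, n);
    q->len += n;                        /* backlog grows until drained */
}

/* Consumer empties the queue; memory is reused, not leaked. */
static size_t bufq_drain(bufq *q) {
    size_t n = q->len;
    q->len = 0;
    return n;
}
```

Massif reports the high-water mark of such a queue as large heap usage, but valgrind's leak checker stays quiet because everything is still reachable. If the perfdata consumer falls behind (or, as fixed in #210, is missing entirely), the backlog can still balloon.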
Could someone explain why that code path has such huge memory usage?
jfr