Open kseistrup opened 8 months ago
I run valgrind like this: G_DEBUG=gc-friendly G_SLICE=always-malloc valgrind --tool=memcheck --track-origins=yes --leak-check=full --leak-resolution=high --num-callers=30 --log-file=profval --show-reachable=yes --suppressions=prof.supp ./profanity
The output: profanity-valgrind-output.txt
Probably the most frequent one:
==23236== 1 bytes in 1 blocks are still reachable in loss record 1 of 3,077
==23236== at 0x4845794: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==23236== by 0x530C75D: strdup (in /usr/lib64/libc.so.6)
==23236== by 0x48A2B2: _create_entry (buffer.c:201)
==23236== by 0x489E95: _buffer_add (buffer.c:107)
==23236== by 0x489DDF: buffer_append (buffer.c:95)
==23236== by 0x468067: win_println_va_internal (window.c:1539)
==23236== by 0x46826F: win_println (window.c:1565)
==23236== by 0x47C421: _cons_splash_logo (console.c:2706)
==23236== by 0x475565: cons_about (console.c:388)
==23236== by 0x46A622: ui_init (core.c:108)
==23236== by 0x431BDD: _init (profanity.c:193)
==23236== by 0x431966: prof_run (profanity.c:98)
==23236== by 0x4EDAFC: main (main.c:174)
Should later get freed by buffer_free()
in which we call g_slist_free_full(buffer->entries, (GDestroyNotify)_free_entry);
.
The buffer/window/ui code is a bit hard to read/follow so I'm not yet 100% sure if we call the free function in all scenarios. And if not in one case we are assigning again. But lets not that is using glib data container function.
Maybe we have a mistake here so the more we "write" the more the needed memory will grow because somehow we don't free.
==23236== 3 bytes in 1 blocks are still reachable in loss record 64 of 3,077
==23236== at 0x4845794: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==23236== by 0x530C75D: strdup (in /usr/lib64/libc.so.6)
==23236== by 0x4C5C33: autocomplete_add (autocomplete.c:175)
==23236== by 0x4B3FD4: cmd_ac_init (cmd_ac.c:529)
==23236== by 0x4966E4: cmd_init (cmd_defs.c:2851)
==23236== by 0x431C3D: _init (profanity.c:200)
==23236== by 0x431966: prof_run (profanity.c:98)
==23236== by 0x4EDAFC: main (main.c:174)
In line 175 we again use strdup()
and add it to a glib container.
It is freed via:
When reading the surrounding code it looks to me like the free function will be called in appropriate places. Again its a glib container.
==23248== 9 bytes in 1 blocks are still reachable in loss record 729 of 3,099
==23248== at 0x4845794: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==23248== by 0x530C75D: strdup (in /usr/lib64/libc.so.6)
==23248== by 0x5127295: ??? (in /usr/lib64/libtinfo.so.6.5)
==23248== by 0x512BC4F: _nc_tiparm (in /usr/lib64/libtinfo.so.6.5)
==23248== by 0x50F55A4: newterm_sp (in /usr/lib64/libncursesw.so.6.5)
==23248== by 0x50F60A5: newterm (in /usr/lib64/libncursesw.so.6.5)
==23248== by 0x50ED356: initscr (in /usr/lib64/libncursesw.so.6.5)
==23248== by 0x46A5BD: ui_init (core.c:95)
==23248== by 0x431BDD: _init (profanity.c:193)
==23248== by 0x431966: prof_run (profanity.c:98)
==23248== by 0x4EDAFC: main (main.c:174)
Which is confusing. We use the same setup/teardown for ncurses like https://gist.github.com/DesmondHayes/0c80af2654fc3ec602f9 for example.
I went on to get the ncurses tarball and see if they ship examples. They have a couple of tests and I ran valgrind on test/firstlast
which gave me leaks as well?
Still need to follow up on this post.
==23248== 9 bytes in 1 blocks are still reachable in loss record 735 of 3,099
==23248== at 0x4845794: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==23248== by 0x5929EA5: CRYPTO_strdup (in /usr/lib64/glibc-hwcaps/x86-64-v3/libcrypto.so.3.1.4)
==23248== by 0x589602C: ??? (in /usr/lib64/glibc-hwcaps/x86-64-v3/libcrypto.so.3.1.4)
==23248== by 0x5896143: CONF_module_add (in /usr/lib64/glibc-hwcaps/x86-64-v3/libcrypto.so.3.1.4)
==23248== by 0x58961A1: OPENSSL_load_builtin_modules (in /usr/lib64/glibc-hwcaps/x86-64-v3/libcrypto.so.3.1.4)
==23248== by 0x58961F8: ??? (in /usr/lib64/glibc-hwcaps/x86-64-v3/libcrypto.so.3.1.4)
==23248== by 0x52F9E7E: __pthread_once_slow (in /usr/lib64/libc.so.6)
==23248== by 0x59341B8: CRYPTO_THREAD_run_once (in /usr/lib64/glibc-hwcaps/x86-64-v3/libcrypto.so.3.1.4)
==23248== by 0x5896466: CONF_modules_load (in /usr/lib64/glibc-hwcaps/x86-64-v3/libcrypto.so.3.1.4)
==23248== by 0x5896BE9: CONF_modules_load_file_ex (in /usr/lib64/glibc-hwcaps/x86-64-v3/libcrypto.so.3.1.4)
==23248== by 0x592CAB3: ??? (in /usr/lib64/glibc-hwcaps/x86-64-v3/libcrypto.so.3.1.4)
==23248== by 0x52F9E7E: __pthread_once_slow (in /usr/lib64/libc.so.6)
==23248== by 0x59341B8: CRYPTO_THREAD_run_once (in /usr/lib64/glibc-hwcaps/x86-64-v3/libcrypto.so.3.1.4)
==23248== by 0x59301F5: OPENSSL_init_crypto (in /usr/lib64/glibc-hwcaps/x86-64-v3/libcrypto.so.3.1.4)
==23248== by 0x56C7340: OPENSSL_init_ssl (in /usr/lib64/glibc-hwcaps/x86-64-v3/libssl.so.3.1.4)
==23248== by 0x5159D4F: xmpp_initialize (in /usr/lib64/libstrophe.so.0.13.1)
==23248== by 0x439631: connection_init (connection.c:131)
==23248== by 0x438514: session_init (session.c:106)
==23248== by 0x431C38: _init (profanity.c:199)
==23248== by 0x431966: prof_run (profanity.c:98)
==23248== by 0x4EDAFC: main (main.c:174)
==23248== 16 bytes in 1 blocks are still reachable in loss record 1,075 of 3,099
==23248== at 0x4845794: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==23248== by 0x43958A: _xmalloc (connection.c:102)
==23248== by 0x515DBD9: xmpp_conn_new (in /usr/lib64/libstrophe.so.0.13.1)
==23248== by 0x43974E: connection_init (connection.c:153)
==23248== by 0x438514: session_init (session.c:106)
==23248== by 0x431C38: _init (profanity.c:199)
==23248== by 0x431966: prof_run (profanity.c:98)
==23248== by 0x4EDAFC: main (main.c:174)
Something for @sjaeckel ?
==23248== 18 bytes in 1 blocks are still reachable in loss record 1,092 of 3,099
==23248== at 0x4845794: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==23248== by 0x530C75D: strdup (in /usr/lib64/libc.so.6)
==23248== by 0x4D0C0E: theme_init (theme.c:84)
==23248== by 0x431BCC: _init (profanity.c:190)
==23248== by 0x431966: prof_run (profanity.c:98)
==23248== by 0x4EDAFC: main (main.c:174)
Again a glib container and we pass the relevant free functions to g_hash_table_new_full()
==23248== 228 bytes in 140 blocks are still reachable in loss record 2,918 of 3,099
==23248== at 0x4845794: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==23248== by 0x50BA488: xmalloc (in /usr/lib64/libreadline.so.8.2)
==23248== by 0x5098534: rl_macro_bind (in /usr/lib64/libreadline.so.8.2)
==23248== by 0x50A8C3B: rl_parse_and_bind (in /usr/lib64/libreadline.so.8.2)
==23248== by 0x50A8DDC: ??? (in /usr/lib64/libreadline.so.8.2)
==23248== by 0x50A9099: ??? (in /usr/lib64/libreadline.so.8.2)
==23248== by 0x50A8A22: rl_parse_and_bind (in /usr/lib64/libreadline.so.8.2)
==23248== by 0x50A8DDC: ??? (in /usr/lib64/libreadline.so.8.2)
==23248== by 0x50A8F88: rl_read_init_file (in /usr/lib64/libreadline.so.8.2)
==23248== by 0x50A2CA9: rl_initialize (in /usr/lib64/libreadline.so.8.2)
==23248== by 0x50B5D78: ??? (in /usr/lib64/libreadline.so.8.2)
==23248== by 0x472625: create_input_window (inputwin.c:163)
==23248== by 0x46A613: ui_init (core.c:105)
==23248== by 0x431BDD: _init (profanity.c:193)
==23248== by 0x431966: prof_run (profanity.c:98)
==23248== by 0x4EDAFC: main (main.c:174)
Again I can't see where we use readline the wrong way.
==26071== 10,136 bytes in 1 blocks are still reachable in loss record 765 of 779
==26071== at 0x4845794: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==26071== by 0x43958A: _xmalloc (connection.c:102)
==26071== by 0x515AD01: ??? (in /usr/lib64/libstrophe.so.0.13.1)
==26071== by 0x515AE93: xmpp_send_raw_string (in /usr/lib64/libstrophe.so.0.13.1)
==26071== by 0x441F0E: iq_send_stanza (iq.c:2572)
==26071== by 0x4EBD20: omemo_bundle_publish (omemo.c:155)
==26071== by 0x4E5C07: omemo_publish_crypto_materials (omemo.c:339)
==26071== by 0x460407: sv_ev_connection_features_received (server_events.c:202)
==26071== by 0x43AA87: connection_features_received (connection.c:779)
==26071== by 0x4418C9: _disco_info_response_id_handler_onconnect (iq.c:2423)
==26071== by 0x43B9F1: _iq_handler (iq.c:241)
==26071== by 0x5163848: ??? (in /usr/lib64/libstrophe.so.0.13.1)
==26071== by 0x516A224: ??? (in /usr/lib64/libstrophe.so.0.13.1)
==26071== by 0x5E4FE43: ??? (in /usr/lib64/libxml2.so.2.12.8)
==26071== by 0x5E54927: xmlParseChunk (in /usr/lib64/libxml2.so.2.12.8)
==26071== by 0x5163450: xmpp_run_once (in /usr/lib64/libstrophe.so.0.13.1)
==26071== by 0x439797: connection_check_events (connection.c:162)
==26071== by 0x43894E: session_process_events (session.c:256)
==26071== by 0x4319FF: prof_run (profanity.c:128)
==26071== by 0x4EDAFC: main (main.c:174)
==26119== 32 bytes in 1 blocks are still reachable in loss record 2,541 of 3,077
==26119== at 0x484CA2F: calloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==26119== by 0x4B2D1D5: g_malloc0 (in /usr/lib64/libglib-2.0.so.0.8000.3)
==26119== by 0x4B10DCE: ??? (in /usr/lib64/libglib-2.0.so.0.8000.3)
==26119== by 0x4B11C35: g_hash_table_new_full (in /usr/lib64/libglib-2.0.so.0.8000.3)
==26119== by 0x4E1A1C: p_gpg_init (gpg.c:121)
==26119== by 0x431C60: _init (profanity.c:209)
==26119== by 0x431966: prof_run (profanity.c:98)
==26119== by 0x4EDAFC: main (main.c:174)
Again glib container
==26119== 32 bytes in 1 blocks are still reachable in loss record 2,539 of 3,077
==26119== at 0x484CA2F: calloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==26119== by 0x4B2D1D5: g_malloc0 (in /usr/lib64/libglib-2.0.so.0.8000.3)
==26119== by 0x4B10DCE: ??? (in /usr/lib64/libglib-2.0.so.0.8000.3)
==26119== by 0x4B11C35: g_hash_table_new_full (in /usr/lib64/libglib-2.0.so.0.8000.3)
==26119== by 0x4323BE: muc_init (muc.c:100)
==26119== by 0x431C51: _init (profanity.c:202)
==26119== by 0x431966: prof_run (profanity.c:98)
==26119== by 0x4EDAFC: main (main.c:174)
Again glib container with g_hash_table_new_full
.
That's enough for now. Lets try to fix those and then continue. What I'm wondering: do we really have so many mistakes in the code where we don't call the free functions? Or are we using the glib containers the wrong way and thus those free functions are never called? Does glib have memleaks (lets create a test program)?
TL;DR: Profanity ate 1 GB swap in less than 15 days
I have profanity running on a RPi4 (aarch64 architecture) that is running ArchLinuxARM. Profanity receives notifications from e.g. cron jobs and casual interpersonal chat, altogether less than a couple of hundred messages daily — some days substantially less.
The RPi4 has 4 GB RAM + 1 GB swap. Usually no swap is used at all, so at first I didn't notice when
free -tk
reported that used and free swap had swapped places, so to speak: 100% used swap and 0% free. This was when the RPi4 andprofanity
had been up for slightly less than 15 days.A
ps ux
revealed that theprofanity
process had gobbled up 2.5+ GB RAM, including all of the available 1 GB swap. Takingprofanity
down released all of the used swap, except ~40 MB.Expected Behavior
Ideally,
profanity
should not keep gobbling up RAM.Current Behavior
Launching
profanity
again, this is what I noticed (numbers taken fromps ux
):HH:MM
is wallclock hours and minutes.I tried the same thing on a x86_64 desktop pc, running ArchLinux:
Possible Solution
I haven't got a clue.
Steps to Reproduce (for bugs)
profanity
in a terminal emulatorps ux | rg pro[f]anity
to see its increasing memory consumptionContext
N/A
Environment
profanity -v
Options used when building the package can be seen here:
Cheers.