paulfloyd / freebsd_valgrind

Git repo used to Upstream the FreeBSD Port of Valgrind
GNU General Public License v2.0

Valgrind seems to crash when analyzing glusterfs client #155

Closed claude-eric-steiner closed 3 years ago

claude-eric-steiner commented 3 years ago

The GlusterFS client on FreeBSD suffers from a memory leak and gets OOM-killed, even with the most recent glusterfs 8.3 on the most recent FreeBSD 12.2/13.

A statedump of glusterfs doesn't show anything helpful; no over-sized objects are listed. To analyze this issue further, I wanted to start the glusterfs client under valgrind.

Starting the glusterfs client under valgrind results in an error message from valgrind, which then shuts down (see logs below).

STEPS TO REPRODUCE

  1. Install valgrind-devel-3.17.0.g20200723,1 from the FreeBSD pkg manager/ports, or compile the source of freebsd_valgrind from this GitHub repo (the error exists in both, which makes sense, as I understand valgrind-devel is just a copy of this repo).
  2. Install glusterfs from the pkg manager/ports (the port still ships the older 8.0; 8.3 can be compiled from ports by swapping 8.0 for the newer 8.3).
  3. Start glusterfs under valgrind (valgrind /usr/local/sbin/glusterfs --process-name fuse --no-daemon --volfile-server=gluster2 --volfile-id=/volume1 /mnt/glusterfs)

OBSERVED RESULT
The following message is shown:

valgrind: m_syswrap/syswrap-main.c:2172 (void vgPlain_client_syscall(ThreadId, UInt)): Assertion 'VG_(iseqsigset)(&tst->sig_mask, &tst->tmp_sig_mask)' failed.

valgrind/glusterfs then shuts down => the filesystem is not mounted and the memory leak cannot be tested.

EXPECTED RESULT
glusterfs starts correctly under valgrind without crashing and mounts the glusterfs filesystem; filesystem load tests can then be run, and after unmounting the filesystem following the load test, valgrind's report on glusterfs's memory usage is shown.

SOFTWARE/OS VERSIONS
FreeBSD: FreeBSD Webserver5 12.2-RELEASE-p3 FreeBSD 12.2-RELEASE-p3 GENERIC amd64
GlusterFS: glusterfs-8.3
Valgrind: valgrind-devel-3.17.0.g20200723,1 or freebsd_valgrind (the same happens with the forked valgrind here: https://github.com/paulfloyd/freebsd_valgrind)

ADDITIONAL INFORMATION

Full log of test run:

root@Webserver5:~ # valgrind /usr/local/sbin/glusterfs --process-name fuse --no-daemon --volfile-server=gluster2 --volfile-id=/volume1 /mnt/glusterfs
==51275== Memcheck, a memory error detector
==51275== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==51275== Using Valgrind-3.17.0.GIT and LibVEX; rerun with -h for copyright info
==51275== Command: /usr/local/sbin/glusterfs --process-name fuse --no-daemon --volfile-server=gluster2 --volfile-id=/volume1 /mnt/glusterfs
==51275==
==51602==
==51602== HEAP SUMMARY:
==51602==     in use at exit: 668,125 bytes in 90 blocks
==51602==   total heap usage: 130 allocs, 40 frees, 880,421 bytes allocated
==51602==
==52066==
==52066== HEAP SUMMARY:
==52066==     in use at exit: 668,399 bytes in 98 blocks
==52066==   total heap usage: 146 allocs, 48 frees, 881,543 bytes allocated
==52066==
==51602== LEAK SUMMARY:
==51602==    definitely lost: 0 bytes in 0 blocks
==51602==    indirectly lost: 0 bytes in 0 blocks
==51602==      possibly lost: 620,589 bytes in 81 blocks
==51602==    still reachable: 47,536 bytes in 9 blocks
==51602==         suppressed: 0 bytes in 0 blocks
==51602== Rerun with --leak-check=full to see details of leaked memory
==51602==
==51602== For lists of detected and suppressed errors, rerun with: -s
==51602== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
==52066== LEAK SUMMARY:
==52066==    definitely lost: 224 bytes in 1 blocks
==52066==    indirectly lost: 50 bytes in 7 blocks
==52066==      possibly lost: 620,589 bytes in 81 blocks
==52066==    still reachable: 47,536 bytes in 9 blocks
==52066==         suppressed: 0 bytes in 0 blocks
==52066== Rerun with --leak-check=full to see details of leaked memory
==52066==
==52066== For lists of detected and suppressed errors, rerun with: -s
==52066== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

valgrind: m_syswrap/syswrap-main.c:2172 (void vgPlain_client_syscall(ThreadId, UInt)): Assertion 'VG_(iseqsigset)(&tst->sig_mask, &tst->tmp_sig_mask)' failed.

host stacktrace:
==51275==    at 0x38106072: ??? (in /usr/local/lib/valgrind/memcheck-amd64-freebsd)
==51275==    by 0x40524CFDF: ???
==51275==    by 0x380FEBA8: ??? (in /usr/local/lib/valgrind/memcheck-amd64-freebsd)
==51275==    by 0x38106071: ??? (in /usr/local/lib/valgrind/memcheck-amd64-freebsd)
==51275==    by 0x40524C83F: ???
==51275==    by 0x40200F34F: ???

sched status:
  running_tid=5

Thread 1: status = VgTs_WaitSys syscall 120 (lwpid 100290)
==51275==    at 0x4EA5DAA: _readv (in /lib/libc.so.7)
==51275==    by 0x4A4DB65: ??? (in /lib/libthr.so.3)
==51275==    by 0x48D7068: sys_readv (in /usr/local/lib/libglusterfs.so.0.0.1)
==51275==    by 0x6909972: ??? (in /usr/local/lib/glusterfs/8.3/rpc-transport/socket.so)
==51275==    by 0x690931A: ??? (in /usr/local/lib/glusterfs/8.3/rpc-transport/socket.so)
==51275==    by 0x690A6F5: ??? (in /usr/local/lib/glusterfs/8.3/rpc-transport/socket.so)
==51275==    by 0x4909257: ??? (in /usr/local/lib/libglusterfs.so.0.0.1)
==51275==    by 0x20EDDD: main (in /usr/local/sbin/glusterfsd)
client stack range: [0x7FBFFC000 0x7FC000FFF] client SP: 0x7FC000288
valgrind stack range: [0x4029AE000 0x402AADFFF] top usage: 7424 of 1048576

Thread 2: status = VgTs_WaitSys syscall 454 (lwpid 100421)
==51275==    at 0x4A5968C: ??? (in /lib/libthr.so.3)
==51275==    by 0x4A4CEAF: ??? (in /lib/libthr.so.3)
==51275==    by 0x4A56CAA: ??? (in /lib/libthr.so.3)
==51275==    by 0x48BFC2D: ??? (in /usr/local/lib/libglusterfs.so.0.0.1)
==51275==    by 0x4A4AFAB: ??? (in /lib/libthr.so.3)
client stack range: [0x7FFFDFDFE000 0x7FFFDFFFDFFF] client SP: 0x7FFFDFFFDEC8
valgrind stack range: [0x404E35000 0x404F34FFF] top usage: 3224 of 1048576

Thread 3: status = VgTs_WaitSys syscall 429 (lwpid 100422)
==51275==    at 0x4EA5C4A: _sigwait (in /lib/libc.so.7)
==51275==    by 0x4A5097A: ??? (in /lib/libthr.so.3)
==51275==    by 0x20D9EA: glusterfs_sigwaiter (in /usr/local/sbin/glusterfsd)
==51275==    by 0x4A4AFAB: ??? (in /lib/libthr.so.3)
client stack range: [0x7FFFDFBFD000 0x7FFFDFDFCFFF] client SP: 0x7FFFDFDFCF38
valgrind stack range: [0x404F39000 0x405038FFF] top usage: 2928 of 1048576

Thread 4: status = VgTs_WaitSys syscall 240 (lwpid 100423)
==51275==    at 0x4EA5E2A: _nanosleep (in /lib/libc.so.7)
==51275==    by 0x4A4D8FB: ??? (in /lib/libthr.so.3)
==51275==    by 0x4E0E58A: sleep (in /lib/libc.so.7)
==51275==    by 0x48D5277: ??? (in /usr/local/lib/libglusterfs.so.0.0.1)
==51275==    by 0x4A4AFAB: ??? (in /lib/libthr.so.3)
client stack range: [0x7FFFDF9FC000 0x7FFFDFBFBFFF] client SP: 0x7FFFDFBF9F08
valgrind stack range: [0x405041000 0x405140FFF] top usage: 3040 of 1048576

Thread 5: status = VgTs_Runnable syscall 232 (lwpid 100424)
==51275==    at 0x4EC6C4A: _clock_gettime (in /lib/libc.so.7)
==51275==    by 0x48DE8D8: timespec_now (in /usr/local/lib/libglusterfs.so.0.0.1)
==51275==    by 0x6A9AB85: ??? (in /usr/local/lib/glusterfs/8.3/xlator/cluster/afr.so)
==51275==    by 0x6ABA33A: ??? (in /usr/local/lib/glusterfs/8.3/xlator/cluster/afr.so)
==51275==    by 0x48E7695: ??? (in /usr/local/lib/libglusterfs.so.0.0.1)
==51275==    by 0x4E05448: ??? (in /lib/libc.so.7)
client stack range: ??????? client SP: 0x773DED8
valgrind stack range: [0x40514D000 0x40524CFFF] top usage: 4312 of 1048576

Thread 6: status = VgTs_WaitSys syscall 454 (lwpid 100425)
==51275==    at 0x4A5968C: ??? (in /lib/libthr.so.3)
==51275==    by 0x4A4CEAF: ??? (in /lib/libthr.so.3)
==51275==    by 0x4A56CAA: ??? (in /lib/libthr.so.3)
==51275==    by 0x48E8337: ??? (in /usr/local/lib/libglusterfs.so.0.0.1)
==51275==    by 0x48E8944: ??? (in /usr/local/lib/libglusterfs.so.0.0.1)
==51275==    by 0x4A4AFAB: ??? (in /lib/libthr.so.3)
client stack range: [0x7FFFDF5FA000 0x7FFFDF7F9FFF] client SP: 0x7FFFDF7F9EA8
valgrind stack range: [0x405259000 0x405358FFF] top usage: 3224 of 1048576

Thread 7: status = VgTs_WaitSys syscall 93 (lwpid 100426)
==51275==    at 0x4F148DA: _select (in /lib/libc.so.7)
==51275==    by 0x4A4DCB1: ??? (in /lib/libthr.so.3)
==51275==    by 0x49247C2: runner (in /usr/local/lib/libglusterfs.so.0.0.1)
==51275==    by 0x4A4AFAB: ??? (in /lib/libthr.so.3)
client stack range: [0x7FFFDF3F9000 0x7FFFDF5F8FFF] client SP: 0x7FFFDF5F8F18
valgrind stack range: [0x405365000 0x405464FFF] top usage: 3224 of 1048576

Note: see also the FAQ in the source distribution. It contains workarounds to several common problems. In particular, if Valgrind aborted or crashed after identifying problems in your program, there's a good chance that fixing those problems will prevent Valgrind aborting or crashing, especially if it happened in m_mallocfree.c.

If that doesn't help, please report this bug to: www.valgrind.org

In the bug report, send all the above text, the valgrind version, and what OS and version you are using. Thanks.

claude-eric-steiner commented 3 years ago

I also posted this problem on the original valgrind bug tracker, as I'm not sure which is the better place: https://bugs.kde.org/show_bug.cgi?id=433778

paulfloyd commented 3 years ago

Quick question: do you know what language glusterfs is written in?

claude-eric-steiner commented 3 years ago

@paulfloyd It's written primarily in C. The source can be found here: https://github.com/gluster/glusterfs

paulfloyd commented 3 years ago

That's a good start. Looking at the output above, the error seems to happen quite late, after VG has already printed its leak summary. If you run valgrind with --leak-check=full, does it produce any useful output?
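
For reference, that just means adding the option to the same command as before, e.g.:

  valgrind --leak-check=full /usr/local/sbin/glusterfs --process-name fuse --no-daemon --volfile-server=gluster2 --volfile-id=/volume1 /mnt/glusterfs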

Also, can you try commenting out the assert on line 2172 of syswrap-main.c? That should tell us whether it is just a minor issue with the signal mask (I doubt it, though).
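
Judging from the assertion message, the line to disable should look roughly like this (the exact formatting in the source may differ):

         /* m_syswrap/syswrap-main.c:2172 -- temporarily disabled for this test */
         /* vg_assert(VG_(iseqsigset)(&tst->sig_mask, &tst->tmp_sig_mask)); */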

On my side, I'll do some debugging to try to see which signals in the mask are different, and which signal is involved.

claude-eric-steiner commented 3 years ago

I'm not sure that seeing the leak summary printed is actually a good sign, as the glusterfs client should continue to run (with the filesystem mounted) until the unmount command is executed...

Let me comment out the assert on line 2172 and try again...

claude-eric-steiner commented 3 years ago

Indeed, commenting out line 2172 did the trick! (It should always be as simple as that!) Thank you @paulfloyd for the hint!

But now I'm ending up with quite a big log file that I don't understand... It seems valgrind does find some fairly big leaks; the question now is whom to bother with this. The glusterfs devs? But these leaks don't seem to exist on Linux... The person who ported glusterfs to FreeBSD?

I will try them all! And of course I will try to better understand what's written in that log!

valgrind_freebsd_glusterfs.log

paulfloyd commented 3 years ago

Good news, but there is still a problem in Valgrind that I'll need to investigate.

For your leaks, it would help to rebuild glusterfs with debug symbols so that the stack traces show source files and line numbers rather than just '???' entries.

claude-eric-steiner commented 3 years ago

Thank you @paulfloyd, I will give it a try with the debug symbols.

paulfloyd commented 3 years ago

On my machine I couldn't reproduce the problem. Can you run some tests to help diagnose the Valgrind problem?

claude-eric-steiner commented 3 years ago

Sure, what tests shall I do?

Also, your know-how of the low-level FreeBSD memory stuff could maybe help here: https://github.com/gluster/glusterfs/issues/2173

I was able to get the massif profiler to show that the memory is indeed increasing. But profiling the normal heap did not show the problem; only --pages-as-heap=yes showed the increase...

paulfloyd commented 3 years ago

In reverse order: your massif results are to be expected. If the glusterfs memory pool is based on mmap, then massif won't see it, as the default is to only track functions like C malloc and C++ new. --pages-as-heap=yes makes it track mmap as well, so it can then see all the memory that is really used.
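
For instance, such a run would look something like this (same command line as above; the massif output file name is printed by valgrind, shown here as a placeholder):

  valgrind --tool=massif --pages-as-heap=yes /usr/local/sbin/glusterfs --process-name fuse --no-daemon --volfile-server=gluster2 --volfile-id=/volume1 /mnt/glusterfs
  ms_print massif.out.<pid>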

If you can, the first test that I'd like is to see the end of the log output with --trace-symtab=yes. This will be very verbose and it should only be the very last syscall that is of interest.

Secondly, in m_libcsignal.c on line 128 there is

         if (set1->sig[i] != set2->sig[i]) return False;

If you could change that to

         if (set1->sig[i] != set2->sig[i]) {
            VG_(printf)("DEBUG: %s i %d set1 %x set2 %x\n", __func__, (int)i, (unsigned)(set1->sig[i]), (unsigned)(set2->sig[i]));
            return False;
         }

and send the output so that I can see which bits of the signal mask are different.

claude-eric-steiner commented 3 years ago

here is the log output with --trace-symtab=yes:

trace-symtab.log.tar.gz

The second part, recompiling valgrind with that change in m_libcsignal.c, I will do tomorrow (need some sleep now :) )

claude-eric-steiner commented 3 years ago

and secondly with the changes in m_libcsignal.c on line 128:

trace-symtab-mod.log.tar.gz

If there is anything else I can do, let me know!

claude-eric-steiner commented 3 years ago

hmm, it seems this last log I provided did not contain any of the desired "DEBUG: xy" lines. What did I do wrong? My steps were: make clean; modified m_libcsignal.c; make; make install

paulfloyd commented 3 years ago

I forgot to say you also need to uncomment the assert.

Also if you prefer you can skip the ‘make install’ stage and run Valgrind in the source directory with the vg-in-place script.
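
For example, from the top of the build tree that would be something like:

  ./vg-in-place /usr/local/sbin/glusterfs --process-name fuse --no-daemon --volfile-server=gluster2 --volfile-id=/volume1 /mnt/glusterfs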

claude-eric-steiner commented 3 years ago

here we go:

DEBUG: vgPlain_iseqsigset i 0 set1 62006001 set2 7ffef057n

full output:

==81497== Memcheck, a memory error detector
==81497== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==81497== Using Valgrind-3.17.0.GIT and LibVEX; rerun with -h for copyright info
==81497== Command: /usr/local/sbin/glusterfs --process-name fuse --no-daemon --volfile-server=gluster2 --volfile-id=/volume1 /mnt/glusterfs
==81497==
==81845==
==81845== HEAP SUMMARY:
==81845==     in use at exit: 667,605 bytes in 88 blocks
==81845==   total heap usage: 128 allocs, 40 frees, 879,901 bytes allocated
==81845==
==82106==
==82106== HEAP SUMMARY:
==82106==     in use at exit: 667,879 bytes in 96 blocks
==82106==   total heap usage: 144 allocs, 48 frees, 881,023 bytes allocated
==82106==
==81845== LEAK SUMMARY:
==81845==    definitely lost: 0 bytes in 0 blocks
==81845==    indirectly lost: 0 bytes in 0 blocks
==81845==      possibly lost: 620,469 bytes in 81 blocks
==81845==    still reachable: 47,136 bytes in 7 blocks
==81845==         suppressed: 0 bytes in 0 blocks
==81845== Rerun with --leak-check=full to see details of leaked memory
==81845==
==81845== For lists of detected and suppressed errors, rerun with: -s
==81845== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
==82106== LEAK SUMMARY:
==82106==    definitely lost: 224 bytes in 1 blocks
==82106==    indirectly lost: 50 bytes in 7 blocks
==82106==      possibly lost: 620,469 bytes in 81 blocks
==82106==    still reachable: 47,136 bytes in 7 blocks
==82106==         suppressed: 0 bytes in 0 blocks
==82106== Rerun with --leak-check=full to see details of leaked memory
==82106==
==82106== For lists of detected and suppressed errors, rerun with: -s
==82106== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
DEBUG: vgPlain_iseqsigset i 0 set1 62006001 set2 7ffef057n
valgrind: m_syswrap/syswrap-main.c:2172 (void vgPlain_client_syscall(ThreadId, UInt)): Assertion 'VG_(iseqsigset)(&tst->sig_mask, &tst->tmp_sig_mask)' failed.

host stacktrace:
==81497==    at 0x381060A2: ??? (in /usr/local/lib/valgrind/memcheck-amd64-freebsd)
==81497==    by 0x40529EFDF: ???
==81497==    by 0x380FEBD8: ??? (in /usr/local/lib/valgrind/memcheck-amd64-freebsd)
==81497==    by 0x381060A1: ??? (in /usr/local/lib/valgrind/memcheck-amd64-freebsd)
==81497==    by 0x40529E83F: ???
==81497==    by 0x40200D72F: ???

sched status:
  running_tid=4

Thread 1: status = VgTs_WaitSys syscall 120 (lwpid 100125)
==81497==    at 0x4F2CDAA: _readv (in /lib/libc.so.7)
==81497==    by 0x4AD4B65: ??? (in /lib/libthr.so.3)
==81497==    by 0x48FEC90: sys_readv (syscall.c:353)
==81497==    by 0x699FC55: __socket_ssl_readv (socket.c:568)
==81497==    by 0x699F9A6: __socket_ssl_read (socket.c:585)
==81497==    by 0x699F6BA: __socket_cached_read (socket.c:623)
==81497==    by 0x699EEF2: __socket_rwv (socket.c:734)
==81497==    by 0x69A4166: __socket_readv (socket.c:822)
==81497==    by 0x69A4409: __socket_read_frag (socket.c:2232)
==81497==    by 0x69A3DD5: __socket_proto_state_machine (socket.c:2398)
==81497==    by 0x69A36FC: socket_proto_state_machine (socket.c:2488)
==81497==    by 0x69A0F3D: socket_event_poll_in (socket.c:2528)
==81497==    by 0x69A09A8: socket_event_handler (socket.c:2934)
==81497==    by 0x494CEF2: event_dispatch_poll_handler (event-poll.c:367)
==81497==    by 0x494C821: event_dispatch_poll (event-poll.c:464)
==81497==    by 0x48FABDE: gf_event_dispatch (event.c:115)
==81497==    by 0x210ACA: main (glusterfsd.c:2738)
client stack range: [0x7FBFFB000 0x7FC000FFF] client SP: 0x7FBFFFF48
valgrind stack range: [0x4029AE000 0x402AADFFF] top usage: 7936 of 1048576

Thread 2: status = VgTs_WaitSys syscall 454 (lwpid 101039)
==81497==    at 0x4AE068C: ??? (in /lib/libthr.so.3)
==81497==    by 0x4AD3EAF: ??? (in /lib/libthr.so.3)
==81497==    by 0x4ADDCAA: ??? (in /lib/libthr.so.3)
==81497==    by 0x48D1FC3: gf_timer_proc (timer.c:141)
==81497==    by 0x4AD1FAB: ??? (in /lib/libthr.so.3)
client stack range: [0x7FFFDFDFE000 0x7FFFDFFFDFFF] client SP: 0x7FFFDFFFDEC8
valgrind stack range: [0x404F97000 0x405096FFF] top usage: 3224 of 1048576

Thread 3: status = VgTs_WaitSys syscall 429 (lwpid 101040)
==81497==    at 0x4F2CC4A: _sigwait (in /lib/libc.so.7)
==81497==    by 0x4AD797A: ??? (in /lib/libthr.so.3)
==81497==    by 0x20F3BC: glusterfs_sigwaiter (glusterfsd.c:2247)
==81497==    by 0x4AD1FAB: ??? (in /lib/libthr.so.3)
client stack range: [0x7FFFDFBFD000 0x7FFFDFDFCFFF] client SP: 0x7FFFDFDFCF18
valgrind stack range: [0x40509B000 0x40519AFFF] top usage: 2928 of 1048576

Thread 4: status = VgTs_Runnable syscall 232 (lwpid 101041)
==81497==    at 0x4F4DC4A: _clock_gettime (in /lib/libc.so.7)
==81497==    by 0x4909E47: timespec_now (timespec.c:31)
==81497==    by 0x6BAE4A5: copy_frame (stack.h:519)
==81497==    by 0x6BADF89: afr_copy_frame (afr-common.c:681)
==81497==    by 0x6BDCCEA: afr_lock_heal (afr-common.c:547)
==81497==    by 0x49170CA: synctask_wrap (syncop.c:335)
==81497==    by 0x4E8C448: ??? (in /lib/libc.so.7)
client stack range: ??????? client SP: 0x7912E18
valgrind stack range: [0x40519F000 0x40529EFFF] top usage: 4312 of 1048576

Thread 5: status = VgTs_WaitSys syscall 454 (lwpid 101042)
==81497==    at 0x4AE068C: ??? (in /lib/libthr.so.3)
==81497==    by 0x4AD3EAF: ??? (in /lib/libthr.so.3)
==81497==    by 0x4ADDCAA: ??? (in /lib/libthr.so.3)
==81497==    by 0x491823E: syncenv_task (syncop.c:577)
==81497==    by 0x4918ABC: syncenv_processor (syncop.c:678)
==81497==    by 0x4AD1FAB: ??? (in /lib/libthr.so.3)
client stack range: [0x7FFFDF7FB000 0x7FFFDF9FAFFF] client SP: 0x7FFFDF9FAEA8
valgrind stack range: [0x4052A3000 0x4053A2FFF] top usage: 3224 of 1048576

Thread 6: status = VgTs_WaitSys syscall  93 (lwpid 101043)
==81497==    at 0x4F9B8DA: _select (in /lib/libc.so.7)
==81497==    by 0x4AD4CB1: ??? (in /lib/libthr.so.3)
==81497==    by 0x4970314: runner (timer-wheel.c:186)
==81497==    by 0x4AD1FAB: ??? (in /lib/libthr.so.3)
client stack range: [0x7FFFDF5FA000 0x7FFFDF7F9FFF] client SP: 0x7FFFDF7F9F48
valgrind stack range: [0x4053A7000 0x4054A6FFF] top usage: 3224 of 1048576

Note: see also the FAQ in the source distribution.
It contains workarounds to several common problems.
In particular, if Valgrind aborted or crashed after
identifying problems in your program, there's a good chance
that fixing those problems will prevent Valgrind aborting or
crashing, especially if it happened in m_mallocfree.c.

If that doesn't help, please report this bug to: www.valgrind.org

In the bug report, send all the above text, the valgrind
version, and what OS and version you are using.  Thanks.

paulfloyd commented 3 years ago

Sorry, one more thing. I made a mistake in my previous post.

Could you do a run with the assert in place and with --trace-syscalls=yes and then post the log?

claude-eric-steiner commented 3 years ago

With --trace-syscalls=yes --trace-symtab=yes:

trace-symtab-mod.log(1).tar.gz

paulfloyd commented 3 years ago

Sorry, my mistake was to say symtab rather than syscalls.

claude-eric-steiner commented 3 years ago

Lucky you, shell history took --trace-symtab=yes, so the log you have should contain what you wanted. Full command executed:

valgrind --trace-symtab=yes --log-file=trace-symtab-mod.log /usr/local/sbin/glusterfs --process-name fuse --no-daemon --volfile-server=gluster2 --volfile-id=/volume1 /mnt/glusterfs

paulfloyd commented 3 years ago

My brain is really not working at the moment. It's syscalls that I need.

claude-eric-steiner commented 3 years ago

we are now on the edge of you owing me a beer :yum:

trace-syscalls-mod.log.tar.gz

(this one was taken with glusterfs 8.4, as I switched from the previously used 8.3 to 8.4; I hope this makes no difference for you)

paulfloyd commented 3 years ago

If you are ever in Grenoble ...

OK, from the log:

SYSCALL6453,5 sys_swapcontext ( 0x5ad9250, 0x5d60d90 ) --> [pre-success] NoWriteResult

swapcontext is a rather scary syscall that basically copies the saved registers into the current registers. It also copies the signal mask.

So perhaps the assert should only apply to non-swapcontext syscalls. (On Linux, swapcontext is not a syscall, except on PPC.)

claude-eric-steiner commented 3 years ago

interesting... but I don't understand enough of these matters... let me know if you need some more logging

paulfloyd commented 3 years ago

I'll try to create a small testcase to validate my hypothesis.
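
For reference, a minimal sketch of the kind of standalone testcase I have in mind (this is an illustrative assumption, not the testcase that ended up in the repo): swapcontext() replaces the caller's signal mask with the one saved by getcontext(), which is exactly the mismatch the assert complains about.

#include <signal.h>
#include <stdio.h>
#include <ucontext.h>

static ucontext_t main_ctx, func_ctx;
static char func_stack[64 * 1024];

static void report(const char *where)
{
    sigset_t cur;
    sigprocmask(SIG_BLOCK, NULL, &cur);   /* query the current signal mask */
    printf("%s: SIGUSR1 blocked = %d\n", where, sigismember(&cur, SIGUSR1));
}

static void func(void)
{
    report("in func (mask restored from getcontext)");
    swapcontext(&func_ctx, &main_ctx);    /* switch back to main */
}

int main(void)
{
    sigset_t block;

    getcontext(&func_ctx);                /* snapshot registers AND signal mask */
    func_ctx.uc_stack.ss_sp = func_stack;
    func_ctx.uc_stack.ss_size = sizeof(func_stack);
    func_ctx.uc_link = &main_ctx;
    makecontext(&func_ctx, func, 0);

    /* change the mask after the snapshot; swapcontext() should undo this */
    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);
    sigprocmask(SIG_BLOCK, &block, NULL);
    report("in main (after blocking SIGUSR1)");

    swapcontext(&main_ctx, &func_ctx);    /* a real syscall on FreeBSD */
    report("back in main (own mask restored)");
    return 0;
}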

paulfloyd commented 3 years ago

I've managed to reproduce the problem. I'm not that great at low-level app programming, so I just googled for an example and modified it. It should now be fixed with

To github.com:paulfloyd/freebsd_valgrind.git
   191f3714b..bf7dac3e1  freebsd -> freebsd