sustrik / libdill

Structured concurrency in C
MIT License

libdill-2.10 tests/threads2 fail #159

Closed progamer71 closed 6 years ago

progamer71 commented 6 years ago
====================================
   libdill 2.10: ./test-suite.log
====================================

# TOTAL: 23
# PASS:  22
# SKIP:  0
# XFAIL: 0
# FAIL:  1
# XPASS: 0
# ERROR: 0

.. contents:: :depth: 2

FAIL: tests/threads2
====================

FAIL tests/threads2 (exit status: 139)
$ cc -v
Apple LLVM version 9.1.0 (clang-902.0.39.2)
Target: x86_64-apple-darwin17.7.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
$ uname -a
Darwin MacBook-Pro.local 17.7.0 Darwin Kernel Version 17.7.0: Thu Jun 21 22:53:14 PDT 2018; root:xnu-4570.71.2~1/RELEASE_X86_64 x86_64

My system: MacBook Pro (late 2013) running macOS 10.13.6 with the latest updates.

Built with: ./configure, then make, then make check
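
For readers decoding the "exit status: 139" line in the log above: shells conventionally report a process killed by signal N as status 128 + N, so 139 would correspond to signal 11 (SIGSEGV). A minimal sketch of that arithmetic follows; the function name `signal_from_exit_status` is made up for illustration and is not part of libdill or automake.

```c
#include <assert.h>
#include <signal.h>

/* Hypothetical helper: returns the signal number encoded in a
   shell-style exit status, or 0 if the status does not look like
   "terminated by a signal" (i.e. it is not in the range 129..255). */
int signal_from_exit_status(int status) {
    assert(status >= 0);
    if (status > 128 && status < 256) return status - 128;
    return 0;
}
```

Calling `signal_from_exit_status(139)` yields 11, which matches `SIGSEGV` on macOS and on common Linux targets.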

sustrik commented 6 years ago

Hi! I see this happening a lot in CI tests, but for some unknown reason the log of the test is always empty. It only happens on OSX and I don't have a Mac. Could you run the test under a debugger and paste the backtrace of the failure?

Btw, it's non-deterministic, so please try multiple times.

progamer71 commented 6 years ago

Hi. (I have another system running FreeBSD 11.2 amd64 with all the latest updates; it does not have this problem.)

This is my first time using the lldb debugger, so please tell me if I should do something differently. I got it to fail twice. The interesting part is:

(lldb) p self
(dill_slist *) $0 = 0x00000001003004a8
(lldb) p self->next
(dill_slist *) $1 = 0x002c0001003004a8
(lldb) p self->next->next
error: Couldn't apply expression side effects : Couldn't dematerialize a result variable: couldn't read its memory

"self->next->next" points to an invalid address. I suspect some function writes the value 0x002c into the top bytes of self->next, which causes the error.

More detail:

$ ./configure --enable-debug
$ make
$ make install
$ cd tests
$ clang -Wall -g -ldill -o threads2 threads2.c
$ lldb threads2
(lldb) target create "threads2"
Current executable set to 'threads2' (x86_64).
(lldb) r
Process 28586 launched: '/Users/progamer/Downloads/libdill-2.10/tests/threads2' (x86_64)
Process 28586 exited with status = 0 (0x00000000)
(lldb) r
Process 28592 launched: '/Users/progamer/Downloads/libdill-2.10/tests/threads2' (x86_64)
Process 28592 exited with status = 0 (0x00000000)
(lldb) r
Process 28595 launched: '/Users/progamer/Downloads/libdill-2.10/tests/threads2' (x86_64)
Process 28595 exited with status = 0 (0x00000000)
(lldb) r
Process 28598 launched: '/Users/progamer/Downloads/libdill-2.10/tests/threads2' (x86_64)
Process 28598 exited with status = 0 (0x00000000)
(lldb) r
Process 28601 launched: '/Users/progamer/Downloads/libdill-2.10/tests/threads2' (x86_64)
Process 28601 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT)
    frame #0: 0x00000001000a9ca7 libdill.21.dylib`dill_slist_pop(self=0x00000001003004a8) at slist.h:62
   59   /* Pop an item from the beginning of the list. */
   60   static inline struct dill_slist *dill_slist_pop(struct dill_slist *self) {
   61       struct dill_slist *item = self->next;
-> 62       self->next = item->next;
   63       return item;
   64   }
   65
Target 0: (threads2) stopped.
(lldb) p self
(dill_slist *) $0 = 0x00000001003004a8
(lldb) p self->next
(dill_slist *) $1 = 0x002c0001003004a8
(lldb) p self->next->next
error: Couldn't apply expression side effects : Couldn't dematerialize a result variable: couldn't read its memory
(lldb) r
There is a running process, kill it and restart?: [Y/n] y
Process 28601 exited with status = 9 (0x00000009)
Process 29160 launched: '/Users/progamer/Downloads/libdill-2.10/tests/threads2' (x86_64)
Process 29160 exited with status = 0 (0x00000000)
(lldb) r
Process 29163 launched: '/Users/progamer/Downloads/libdill-2.10/tests/threads2' (x86_64)
Process 29163 exited with status = 0 (0x00000000)
(lldb) r
Process 29166 launched: '/Users/progamer/Downloads/libdill-2.10/tests/threads2' (x86_64)
Process 29166 exited with status = 0 (0x00000000)
(lldb) r
Process 29169 launched: '/Users/progamer/Downloads/libdill-2.10/tests/threads2' (x86_64)
Process 29169 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT)
    frame #0: 0x00000001000a9ca7 libdill.21.dylib`dill_slist_pop(self=0x00000001003004a8) at slist.h:62
   59   /* Pop an item from the beginning of the list. */
   60   static inline struct dill_slist *dill_slist_pop(struct dill_slist *self) {
   61       struct dill_slist *item = self->next;
-> 62       self->next = item->next;
   63       return item;
   64   }
   65
Target 0: (threads2) stopped.
(lldb) p self
(dill_slist *) $3 = 0x00000001003004a8
(lldb) p self->next
(dill_slist *) $4 = 0x002c0001003004a8
(lldb) p self->next->next
error: Couldn't apply expression side effects : Couldn't dematerialize a result variable: couldn't read its memory
(lldb)
sustrik commented 6 years ago

Type "bt" in the debugger to get the backtrace.

progamer71 commented 6 years ago
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT)
  * frame #0: 0x00000001000a9ca7 libdill.21.dylib`dill_slist_pop(self=0x00000001003004a8) at slist.h:62
    frame #1: 0x00000001000a9c4c libdill.21.dylib`dill_ctx_fd_term(ctx=0x00000001003004a0) at fd.c:52
    frame #2: 0x00000001000a89f4 libdill.21.dylib`dill_ctx_term(ptr=0x00000001003002f0) at ctx.c:100
    frame #3: 0x00007fff62a2e163 libsystem_pthread.dylib`_pthread_tsd_cleanup + 463
    frame #4: 0x00007fff62a2dee9 libsystem_pthread.dylib`_pthread_exit + 79
    frame #5: 0x00007fff62a2e945 libsystem_pthread.dylib`pthread_exit + 30
    frame #6: 0x0000000100000f09 threads2`main at threads2.c:34
    frame #7: 0x00007fff62714015 libdyld.dylib`start + 1
    frame #8: 0x00007fff62714015 libdyld.dylib`start + 1
progamer71 commented 6 years ago

Hello, I have more information. I set breakpoints at dill_slist_init, dill_slist_push and dill_slist_pop.

I found that after the second call to dill_slist_init(self=0x0000000100300438) (which means the coroutine is resumed?), the next call to dill_slist_pop(self=0x00000001003004a8) fails every time.

Hope this helps. Thanks.

$ lldb threads2
(lldb) target create "threads2"
Current executable set to 'threads2' (x86_64).
(lldb)  b dill_slist_init
Breakpoint 1: 3 locations.
(lldb)  b dill_slist_push
Breakpoint 2: 3 locations.
(lldb)  b dill_slist_pop
Breakpoint 3: 2 locations.

(lldb) r
Process 39140 launched: '/Users/progamer/Downloads/libdill-2.10/tests/threads2' (x86_64)
Process 39140 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
    frame #0: 0x00000001000a3f48 libdill.21.dylib`dill_slist_init(self=0x0000000100300318) at slist.h:39
   36
   37   /* Initialize the list. */
   38   static inline void dill_slist_init(struct dill_slist *self) {
-> 39       self->next = self;
   40   }
   41
   42   /* True if the list has no items. */
Target 0: (threads2) stopped.

(lldb) c
Process 39140 resuming
Process 39140 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
    frame #0: 0x00000001000a3f48 libdill.21.dylib`dill_slist_init(self=0x0000000100300438) at slist.h:39
   36
   37   /* Initialize the list. */
   38   static inline void dill_slist_init(struct dill_slist *self) {
-> 39       self->next = self;
   40   }
   41
   42   /* True if the list has no items. */
Target 0: (threads2) stopped.

(lldb) c
Process 39140 resuming
Process 39140 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.2
    frame #0: 0x00000001000a7e58 libdill.21.dylib`dill_slist_init(self=0x0000000100300480) at slist.h:39
   36
   37   /* Initialize the list. */
   38   static inline void dill_slist_init(struct dill_slist *self) {
-> 39       self->next = self;
   40   }
   41
   42   /* True if the list has no items. */
Target 0: (threads2) stopped.

(lldb) c
Process 39140 resuming
Process 39140 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.3
    frame #0: 0x00000001000a9c18 libdill.21.dylib`dill_slist_init(self=0x00000001003004a8) at slist.h:39
   36
   37   /* Initialize the list. */
   38   static inline void dill_slist_init(struct dill_slist *self) {
-> 39       self->next = self;
   40   }
   41
   42   /* True if the list has no items. */
Target 0: (threads2) stopped.

(lldb) c
Process 39140 resuming
Process 39140 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 2.1
    frame #0: 0x00000001000a4abc libdill.21.dylib`dill_slist_push(self=0x0000000100300438, item=0x00007ffeefbfede8) at slist.h:55
   52   /* Push the item to the beginning of the list. */
   53   static inline void dill_slist_push(struct dill_slist *self,
   54         struct dill_slist *item) {
-> 55       item->next = self->next;
   56       self->next = item;
   57   }
   58
Target 0: (threads2) stopped.

(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 2.1
  * frame #0: 0x00000001000a4abc libdill.21.dylib`dill_slist_push(self=0x0000000100300438, item=0x00007ffeefbfede8) at slist.h:55
    frame #1: 0x00000001000a3942 libdill.21.dylib`dill_waitfor(cl=0x00007ffeefbfede0, id=1, cancel=(libdill.21.dylib`dill_timer_cancel at cr.c:216)) at cr.c:430
    frame #2: 0x00000001000a3a06 libdill.21.dylib`dill_timer(tmcl=0x00007ffeefbfede0, id=1, deadline=41459345) at cr.c:231
    frame #3: 0x00000001000a56c2 libdill.21.dylib`dill_msleep(deadline=41459345) at libdill.c:38
    frame #4: 0x0000000100000e8c threads2`main at threads2.c:31
    frame #5: 0x00007fff62714015 libdyld.dylib`start + 1
    frame #6: 0x00007fff62714015 libdyld.dylib`start + 1

(lldb) c
Process 39140 resuming
Process 39140 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
    frame #0: 0x00000001000a3f48 libdill.21.dylib`dill_slist_init(self=0x0000000100300438) at slist.h:39
   36
   37   /* Initialize the list. */
   38   static inline void dill_slist_init(struct dill_slist *self) {
-> 39       self->next = self;
   40   }
   41
   42   /* True if the list has no items. */
Target 0: (threads2) stopped.

(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
  * frame #0: 0x00000001000a3f48 libdill.21.dylib`dill_slist_init(self=0x0000000100300438) at slist.h:39
    frame #1: 0x00000001000a3acb libdill.21.dylib`dill_wait at cr.c:440
    frame #2: 0x00000001000a56c7 libdill.21.dylib`dill_msleep(deadline=41459345) at libdill.c:39
    frame #3: 0x0000000100000e8c threads2`main at threads2.c:31
    frame #4: 0x00007fff62714015 libdyld.dylib`start + 1
    frame #5: 0x00007fff62714015 libdyld.dylib`start + 1

(lldb) c
Process 39140 resuming
Process 39140 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 3.2
    frame #0: 0x00000001000a9c98 libdill.21.dylib`dill_slist_pop(self=0x00000001003004a8) at slist.h:61
   58
   59   /* Pop an item from the beginning of the list. */
   60   static inline struct dill_slist *dill_slist_pop(struct dill_slist *self) {
-> 61       struct dill_slist *item = self->next;
   62       self->next = item->next;
   63       return item;
   64   }
Target 0: (threads2) stopped.

(lldb) c
Process 39140 resuming
Process 39140 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT)
    frame #0: 0x00000001000a9ca7 libdill.21.dylib`dill_slist_pop(self=0x00000001003004a8) at slist.h:62
   59   /* Pop an item from the beginning of the list. */
   60   static inline struct dill_slist *dill_slist_pop(struct dill_slist *self) {
   61       struct dill_slist *item = self->next;
-> 62       self->next = item->next;
   63       return item;
   64   }
   65
Target 0: (threads2) stopped.

(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT)
  * frame #0: 0x00000001000a9ca7 libdill.21.dylib`dill_slist_pop(self=0x00000001003004a8) at slist.h:62
    frame #1: 0x00000001000a9c4c libdill.21.dylib`dill_ctx_fd_term(ctx=0x00000001003004a0) at fd.c:52
    frame #2: 0x00000001000a89f4 libdill.21.dylib`dill_ctx_term(ptr=0x00000001003002f0) at ctx.c:100
    frame #3: 0x00007fff62a2e163 libsystem_pthread.dylib`_pthread_tsd_cleanup + 463
    frame #4: 0x00007fff62a2dee9 libsystem_pthread.dylib`_pthread_exit + 79
    frame #5: 0x00007fff62a2e945 libsystem_pthread.dylib`pthread_exit + 30
    frame #6: 0x0000000100000f09 threads2`main at threads2.c:34
    frame #7: 0x00007fff62714015 libdyld.dylib`start + 1
    frame #8: 0x00007fff62714015 libdyld.dylib`start + 1
sustrik commented 6 years ago

I suspect there's some memory overwriting happening:

(lldb) p self
(dill_slist *) $0 = 0x00000001003004a8
(lldb) p self->next
(dill_slist *) $1 = 0x002c0001003004a8

It looks like the second pointer is actually the same as the first one, except that one byte has been overwritten.
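
To make that observation concrete, the two values can be XOR-ed to see exactly which bits differ. The sketch below also checks whether an address is canonical, assuming x86-64 with 48-bit virtual addresses (bits 63..47 must all be equal). On such a system, accessing a non-canonical address raises a general-protection fault rather than a page fault, which would be consistent with the EXC_I386_GPFLT stop reason in the logs. Both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Bits that differ between two pointer values. */
uint64_t differing_bits(uint64_t a, uint64_t b) {
    return a ^ b;
}

/* Assumption: x86-64 with 48-bit virtual addresses. An address is
   canonical when bits 63..47 are all zeros or all ones. */
int is_canonical_48(uint64_t addr) {
    uint64_t top = addr >> 47;   /* the 17 most significant bits */
    assert(top <= 0x1FFFF);      /* a 17-bit value by construction */
    return top == 0 || top == 0x1FFFF;
}
```

For the values in this thread, the XOR is 0x002c000000000000: only the byte holding 0x2c differs, and the resulting pointer is not canonical.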

progamer71 commented 6 years ago

I think so. The fault often happens when the "self" value ends in 8.

For this trace I watched one such address, self=0x00000001002004a8.

The write comes from inside the system malloc, in the function "tiny_free_list_add_ptr", which changes the value from 0x00000001002004a8 to 0x002c0001002004a8.

...
Process 42534 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = step in
    frame #0: 0x00000001000a9c23 libdill.21.dylib`dill_slist_init(self=0x00000001002004a8) at slist.h:40
   37   /* Initialize the list. */
   38   static inline void dill_slist_init(struct dill_slist *self) {
   39       self->next = self;
-> 40   }
   41
   42   /* True if the list has no items. */
   43   static inline int dill_slist_empty(struct dill_slist *self) {
Target 0: (threads2) stopped.
(lldb) watchpoint set expression -w write -- self
Watchpoint created: Watchpoint 2: addr = 0x1002004a8 size = 8 state = enabled type = w
    new value: 0x00000001002004a8

....

Watchpoint 2 hit:
old value: 0x00000001002004a8
new value: 0x002c0001002004a8
Process 42534 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = watchpoint 2
    frame #0: 0x00007fff628bee8d libsystem_malloc.dylib`tiny_free_list_add_ptr + 114
libsystem_malloc.dylib`tiny_free_list_add_ptr:
->  0x7fff628bee8d <+114>: movw   %r11w, 0x10(%rdx)
    0x7fff628bee92 <+119>: jmp    0x7fff628beea0            ; <+133>
    0x7fff628bee94 <+121>: testw  %r11w, %r11w
    0x7fff628bee98 <+125>: jne    0x7fff628beea0            ; <+133>
Target 0: (threads2) stopped.
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = watchpoint 2
  * frame #0: 0x00007fff628bee8d libsystem_malloc.dylib`tiny_free_list_add_ptr + 114
    frame #1: 0x00007fff628d475a libsystem_malloc.dylib`tiny_free_no_lock + 570
    frame #2: 0x00007fff628d5256 libsystem_malloc.dylib`free_tiny + 628
    frame #3: 0x00007fff62a2e163 libsystem_pthread.dylib`_pthread_tsd_cleanup + 463
    frame #4: 0x00007fff62a2dee9 libsystem_pthread.dylib`_pthread_exit + 79
    frame #5: 0x00007fff62a2e945 libsystem_pthread.dylib`pthread_exit + 30
    frame #6: 0x0000000100000f09 threads2`main at threads2.c:34
    frame #7: 0x00007fff62714015 libdyld.dylib`start + 1
    frame #8: 0x00007fff62714015 libdyld.dylib`start + 1
(lldb)
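
The watchpoint above stops on a 16-bit store (movw). A small sketch of how a two-byte store can turn the old value into the new one, assuming a little-endian host such as x86-64. The byte offset 6 used in the usage note is inferred from the before/after values in the trace, not taken from the allocator's source, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Overwrites two bytes of an 8-byte slot, starting at byte_offset,
   the way a 16-bit store would, and returns the resulting value.
   The result depends on the host's byte order. */
uint64_t store16_into_slot(uint64_t slot, size_t byte_offset, uint16_t value) {
    unsigned char bytes[sizeof slot];
    assert(byte_offset + sizeof value <= sizeof slot);
    memcpy(bytes, &slot, sizeof slot);
    memcpy(bytes + byte_offset, &value, sizeof value);
    memcpy(&slot, bytes, sizeof slot);
    return slot;
}
```

On a little-endian machine, `store16_into_slot(0x00000001002004a8, 6, 0x002c)` gives 0x002c0001002004a8, the "new value" reported by the watchpoint.
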
sustrik commented 6 years ago

Interesting. It looks like it's caused by the ordering of operations inside pthread_exit(). Could it be that thread-local storage is deallocated before the destructor registered with pthread_key_create is called?

Can you somehow check where the block being deallocated in said free_tiny() was allocated in the first place?
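
As background for the question above, here is a minimal sketch of a per-thread context whose memory is heap-allocated and owned solely by the key destructor, so its lifetime does not depend on when the platform tears down compiler-level thread-local variables. This is not libdill's code: `demo_ctx`, `demo_key` and the functions below are hypothetical names, and whether libdill should be structured this way is not established in this thread.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical per-thread context standing in for a library's real one. */
struct demo_ctx { int initialized; };

static pthread_key_t demo_key;
static int demo_destructor_calls;

/* Called at thread exit with the non-NULL value stored under demo_key. */
static void demo_ctx_term(void *ptr) {
    struct demo_ctx *ctx = ptr;
    assert(ctx->initialized);   /* the heap block is still valid here */
    demo_destructor_calls++;
    free(ctx);
}

static void *demo_worker(void *arg) {
    (void)arg;
    struct demo_ctx *ctx = malloc(sizeof *ctx);
    if (!ctx) return NULL;
    ctx->initialized = 1;
    /* Hand ownership to the key; free it ourselves if that fails. */
    if (pthread_setspecific(demo_key, ctx) != 0) free(ctx);
    return NULL;                /* destructor runs during thread exit */
}

/* Runs one worker thread and returns how many times the destructor
   was called for it, or -1 if a pthread call failed. */
int demo_run_once(void) {
    pthread_t t;
    demo_destructor_calls = 0;
    if (pthread_key_create(&demo_key, demo_ctx_term) != 0) return -1;
    if (pthread_create(&t, NULL, demo_worker, NULL) != 0) return -1;
    if (pthread_join(t, NULL) != 0) return -1;
    pthread_key_delete(demo_key);
    return demo_destructor_calls;
}
```

The counter is read only after pthread_join(), so no extra synchronization is needed in this sketch.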

sustrik commented 6 years ago

Another interesting observation: The failure happens when trying to deallocate socket buffers. However, the test program doesn't create a single socket, so why is there any buffer at all?

If you want to have a look, socket buffers are allocated at fd.c:65

progamer71 commented 6 years ago

> Another interesting observation: The failure happens when trying to deallocate socket buffers. However, the test program doesn't create a single socket, so why is there any buffer at all?
>
> If you want to have a look, socket buffers are allocated at fd.c:65

The execution path never calls dill_fd_allocbuf(), so no fd buffer is allocated here.

The execution steps related to fd are:

1. The ctx is initialized with ctx->count = 0 and an empty ctx->cache (ctx->cache.next == &ctx->cache).

   44   int dill_ctx_fd_init(struct dill_ctx_fd *ctx) {
-> 45       ctx->count = 0;
   46       dill_slist_init(&ctx->cache);
   47       return 0;
   48   }

2. The rest of the program runs: msleep() and pthread_exit().

3. pthread_exit() has the side effect of unintentionally modifying ctx->cache.next. For example:

(lldb) p self
(dill_slist *) $0 = 0x00000001003004a8
(lldb) p self->next
(dill_slist *) $1 = 0x002c0001003004a8

As a result, the check "it == &ctx->cache" never becomes true. dill_slist_pop() dereferences the corrupted pointer (and the loop would otherwise go on to free an address that was never allocated), which ends the program with "Segmentation fault: 11".

   49
   50   void dill_ctx_fd_term(struct dill_ctx_fd *ctx) {
   51       while(1) {
-> 52           struct dill_slist *it = dill_slist_pop(&ctx->cache);
   53           if(it == &ctx->cache) break;
   54           free(it);
   55       }

I tried manually repairing the corrupted pointer inside dill_ctx_fd_term(). With that, the program reaches a normal exit() successfully.

    frame #0: 0x00000001000a9c3c libdill.21.dylib`dill_ctx_fd_term(ctx=0x00000001002004a0) at fd.c:52
   49
   50   void dill_ctx_fd_term(struct dill_ctx_fd *ctx) {
   51       while(1) {
-> 52           struct dill_slist *it = dill_slist_pop(&ctx->cache);
   53           if(it == &ctx->cache) break;
   54           free(it);
   55       }
Target 0: (threads2) stopped.

(lldb) p ctx
(dill_ctx_fd *) $19 = 0x00000001002004a0

(lldb) p *ctx
(dill_ctx_fd) $20 = {
  count = 0  /* <-- means empty cache */
  cache = {
    next = 0x002c0001002004a8 /* <-- corrupted memory address */
  }
}

(lldb) p ctx->cache.next
(dill_slist *) $22 = 0x002c0001002004a8 /* <-- corrupted memory address */

(lldb) expression ctx->cache.next = (dill_slist *)0x00000001002004a8 /* fixed, it should be empty slist */
(dill_slist *) $23 = 0x00000001002004a8

(lldb) p *ctx
(dill_ctx_fd) $24 = {
  count = 0 /* <-- means empty cache */
  cache = {
    next = 0x00000001002004a8 /* <-- fixed memory address */
  }
}

(lldb) p ctx->cache.next
(dill_slist *) $25 = 0x00000001002004a8 /* <-- fixed memory address */

(lldb) c
Process 63496 resuming
Process 63496 exited with status = 0 (0x00000000)

Summary so far: pthread_exit() has the side effect of unintentionally modifying ctx->cache.next, so dill_ctx_fd_term() follows the corrupted pointer, which leads to this fault.
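
The reasoning above depends on how the list marks "empty": the head points at itself, and popping an empty list returns the head. The sketch below re-creates the three functions shown in the lldb listings, without the dill_ prefix to make clear it is a stand-alone copy. The body of `slist_empty` is an assumption based on the "ctx->cache.next" comparison described above, since the listings show only its signature.

```c
#include <assert.h>
#include <stddef.h>

/* A circular, singly linked list whose head doubles as the
   end-of-list sentinel. */
struct slist { struct slist *next; };

void slist_init(struct slist *self) {
    assert(self != NULL);
    self->next = self;
}

/* Assumed body: empty means the head points at itself. */
int slist_empty(const struct slist *self) {
    return self->next == self;
}

void slist_push(struct slist *self, struct slist *item) {
    item->next = self->next;
    self->next = item;
}

/* Popping an empty list returns the head itself; callers detect
   "empty" by comparing the result with the head's address. */
struct slist *slist_pop(struct slist *self) {
    struct slist *item = self->next;
    self->next = item->next;   /* dereferences whatever next holds */
    return item;
}
```

Because `slist_pop` always dereferences `self->next`, a head whose `next` no longer equals its own address sends the pop through an arbitrary pointer before any emptiness check can run.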

progamer71 commented 6 years ago

I got an idea for solving this problem while driving home. :-) Just check ctx->count before calling dill_slist_pop().

In fd.c:

...
#include <assert.h>
...
void dill_ctx_fd_term(struct dill_ctx_fd *ctx) {
    struct dill_slist *it;

    /* Free exactly ctx->count cached buffers. With count == 0 the
       (possibly corrupted) list head is never dereferenced. */
    while(ctx->count > 0) {
        it = dill_slist_pop(&ctx->cache);
        assert(it != &ctx->cache);
        ctx->count--;
        free(it);
    }
/* Original loop, disabled:
    while(1) {
        struct dill_slist *it = dill_slist_pop(&ctx->cache);
        if(it == &ctx->cache) break;
        free(it);
    }
*/
}
============================================================================
Testsuite summary for libdill 2.10
============================================================================
# TOTAL: 23
# PASS:  23
# SKIP:  0
# XFAIL: 0
# FAIL:  0
# XPASS: 0
# ERROR: 0

Ran threads2 1,000 times without a problem:

for i in {1..1000}; do ./threads2; done
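
A stand-alone sketch of the count-guarded cleanup idea, for anyone who wants to try the control flow outside libdill. `struct cache`, `cache_add` and `cache_drain` are hypothetical stand-ins, not libdill's types. The sketch only shows that the loop never follows the head pointer when the count is zero; it does not explain or prevent whatever overwrote the pointer.

```c
#include <assert.h>
#include <stdlib.h>

struct node { struct node *next; };

/* Hypothetical stand-in for the fd context: a count of cached buffers
   plus a circular list head that is its own sentinel when empty. */
struct cache { int count; struct node head; };

void cache_init(struct cache *c) {
    c->count = 0;
    c->head.next = &c->head;
}

/* Allocates one buffer and caches it; returns 0 on success, -1 on failure. */
int cache_add(struct cache *c) {
    struct node *n = malloc(sizeof *n);
    if (!n) return -1;
    n->next = c->head.next;
    c->head.next = n;
    c->count++;
    return 0;
}

/* Frees exactly `count` nodes and never reads head.next when the
   count says the cache is empty. Returns the number of nodes freed. */
int cache_drain(struct cache *c) {
    int freed = 0;
    while (c->count > 0) {
        struct node *n = c->head.next;
        assert(n != &c->head);   /* count and list must agree */
        c->head.next = n->next;
        c->count--;
        free(n);
        freed++;
    }
    return freed;
}
```

With an empty cache, `cache_drain` returns 0 even if `head.next` has been damaged, because the loop condition depends only on `count`.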

Note:

sustrik commented 6 years ago

Sounds like a reasonable workaround. As long as your application is single-threaded you shouldn't see any memory leaks.

sustrik commented 6 years ago

I've been looking at the code and it's not clear what's happening. What exactly is _pthread_tsd_cleanup trying to deallocate when it corrupts the memory?

The memory that's being corrupted is allocated at ctx.c:176 with a perfectly normal malloc(). Moreover, the corrupted byte is at the very end of the allocated block.

Maybe the heap is corrupted even before that? Can you configure with --enable-debug --enable-valgrind and then run the test under valgrind?

progamer71 commented 6 years ago

I did not apply my workaround for this run. This is the result:

$ valgrind ./threads2
==52447== Memcheck, a memory error detector
==52447== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==52447== Using Valgrind-3.14.0.GIT and LibVEX; rerun with -h for copyright info
==52447== Command: ./threads2
==52447==
==52447== Invalid read of size 8
==52447==    at 0x1000C2C9C: dill_slist_pop (in /usr/local/lib/libdill.21.dylib)
==52447==    by 0x1000C2C4B: dill_ctx_fd_term (in /usr/local/lib/libdill.21.dylib)
==52447==    by 0x1000C19F3: dill_ctx_term (in /usr/local/lib/libdill.21.dylib)
==52447==    by 0x1006AF162: _pthread_tsd_cleanup (in /usr/lib/system/libsystem_pthread.dylib)
==52447==    by 0x1006AEEE8: _pthread_exit (in /usr/lib/system/libsystem_pthread.dylib)
==52447==    by 0x1006AF944: pthread_exit (in /usr/lib/system/libsystem_pthread.dylib)
==52447==    by 0x100000F08: main (threads2.c:34)
==52447==  Address 0x100e33ee8 is 456 bytes inside a block of size 464 free'd
==52447==    at 0x1000AC9AB: free (in /usr/local/Cellar/valgrind/HEAD-0375954/lib/valgrind/vgpreload_memcheck-amd64-darwin.so)
==52447==    by 0x1006AF162: _pthread_tsd_cleanup (in /usr/lib/system/libsystem_pthread.dylib)
==52447==    by 0x1006AEEE8: _pthread_exit (in /usr/lib/system/libsystem_pthread.dylib)
==52447==    by 0x1006AF944: pthread_exit (in /usr/lib/system/libsystem_pthread.dylib)
==52447==    by 0x100000F08: main (threads2.c:34)
==52447==  Block was alloc'd at
==52447==    at 0x1000AC5C6: malloc (in /usr/local/Cellar/valgrind/HEAD-0375954/lib/valgrind/vgpreload_memcheck-amd64-darwin.so)
==52447==    by 0x10026A969: tlv_allocate_and_initialize_for_key (in /usr/lib/system/libdyld.dylib)
==52447==    by 0x10026B0EB: tlv_get_addr (in /usr/lib/system/libdyld.dylib)
==52447==    by 0x1000BEA00: dill_now (in /usr/local/lib/libdill.21.dylib)
==52447==    by 0x100000E83: main (threads2.c:31)
==52447==
==52447== Invalid read of size 8
==52447==    at 0x1000C2CA7: dill_slist_pop (in /usr/local/lib/libdill.21.dylib)
==52447==    by 0x1000C2C4B: dill_ctx_fd_term (in /usr/local/lib/libdill.21.dylib)
==52447==    by 0x1000C19F3: dill_ctx_term (in /usr/local/lib/libdill.21.dylib)
==52447==    by 0x1006AF162: _pthread_tsd_cleanup (in /usr/lib/system/libsystem_pthread.dylib)
==52447==    by 0x1006AEEE8: _pthread_exit (in /usr/lib/system/libsystem_pthread.dylib)
==52447==    by 0x1006AF944: pthread_exit (in /usr/lib/system/libsystem_pthread.dylib)
==52447==    by 0x100000F08: main (threads2.c:34)
==52447==  Address 0x100e33ee8 is 456 bytes inside a block of size 464 free'd
==52447==    at 0x1000AC9AB: free (in /usr/local/Cellar/valgrind/HEAD-0375954/lib/valgrind/vgpreload_memcheck-amd64-darwin.so)
==52447==    by 0x1006AF162: _pthread_tsd_cleanup (in /usr/lib/system/libsystem_pthread.dylib)
==52447==    by 0x1006AEEE8: _pthread_exit (in /usr/lib/system/libsystem_pthread.dylib)
==52447==    by 0x1006AF944: pthread_exit (in /usr/lib/system/libsystem_pthread.dylib)
==52447==    by 0x100000F08: main (threads2.c:34)
==52447==  Block was alloc'd at
==52447==    at 0x1000AC5C6: malloc (in /usr/local/Cellar/valgrind/HEAD-0375954/lib/valgrind/vgpreload_memcheck-amd64-darwin.so)
==52447==    by 0x10026A969: tlv_allocate_and_initialize_for_key (in /usr/lib/system/libdyld.dylib)
==52447==    by 0x10026B0EB: tlv_get_addr (in /usr/lib/system/libdyld.dylib)
==52447==    by 0x1000BEA00: dill_now (in /usr/local/lib/libdill.21.dylib)
==52447==    by 0x100000E83: main (threads2.c:31)
==52447==
==52447== Invalid write of size 8
==52447==    at 0x1000C2CAE: dill_slist_pop (in /usr/local/lib/libdill.21.dylib)
==52447==    by 0x1000C2C4B: dill_ctx_fd_term (in /usr/local/lib/libdill.21.dylib)
==52447==    by 0x1000C19F3: dill_ctx_term (in /usr/local/lib/libdill.21.dylib)
==52447==    by 0x1006AF162: _pthread_tsd_cleanup (in /usr/lib/system/libsystem_pthread.dylib)
==52447==    by 0x1006AEEE8: _pthread_exit (in /usr/lib/system/libsystem_pthread.dylib)
==52447==    by 0x1006AF944: pthread_exit (in /usr/lib/system/libsystem_pthread.dylib)
==52447==    by 0x100000F08: main (threads2.c:34)
==52447==  Address 0x100e33ee8 is 456 bytes inside a block of size 464 free'd
==52447==    at 0x1000AC9AB: free (in /usr/local/Cellar/valgrind/HEAD-0375954/lib/valgrind/vgpreload_memcheck-amd64-darwin.so)
==52447==    by 0x1006AF162: _pthread_tsd_cleanup (in /usr/lib/system/libsystem_pthread.dylib)
==52447==    by 0x1006AEEE8: _pthread_exit (in /usr/lib/system/libsystem_pthread.dylib)
==52447==    by 0x1006AF944: pthread_exit (in /usr/lib/system/libsystem_pthread.dylib)
==52447==    by 0x100000F08: main (threads2.c:34)
==52447==  Block was alloc'd at
==52447==    at 0x1000AC5C6: malloc (in /usr/local/Cellar/valgrind/HEAD-0375954/lib/valgrind/vgpreload_memcheck-amd64-darwin.so)
==52447==    by 0x10026A969: tlv_allocate_and_initialize_for_key (in /usr/lib/system/libdyld.dylib)
==52447==    by 0x10026B0EB: tlv_get_addr (in /usr/lib/system/libdyld.dylib)
==52447==    by 0x1000BEA00: dill_now (in /usr/local/lib/libdill.21.dylib)
==52447==    by 0x100000E83: main (threads2.c:31)
==52447==
==52447== Invalid read of size 4
==52447==    at 0x1000BECE0: dill_ctx_pollset_term (in /usr/local/lib/libdill.21.dylib)
==52447==    by 0x1000C1A03: dill_ctx_term (in /usr/local/lib/libdill.21.dylib)
==52447==    by 0x1006AF162: _pthread_tsd_cleanup (in /usr/lib/system/libsystem_pthread.dylib)
==52447==    by 0x1006AEEE8: _pthread_exit (in /usr/lib/system/libsystem_pthread.dylib)
==52447==    by 0x1006AF944: pthread_exit (in /usr/lib/system/libsystem_pthread.dylib)
==52447==    by 0x100000F08: main (threads2.c:34)
==52447==  Address 0x100e33ec8 is 424 bytes inside a block of size 464 free'd
==52447==    at 0x1000AC9AB: free (in /usr/local/Cellar/valgrind/HEAD-0375954/lib/valgrind/vgpreload_memcheck-amd64-darwin.so)
==52447==    by 0x1006AF162: _pthread_tsd_cleanup (in /usr/lib/system/libsystem_pthread.dylib)
==52447==    by 0x1006AEEE8: _pthread_exit (in /usr/lib/system/libsystem_pthread.dylib)
==52447==    by 0x1006AF944: pthread_exit (in /usr/lib/system/libsystem_pthread.dylib)
==52447==    by 0x100000F08: main (threads2.c:34)
==52447==  Block was alloc'd at
==52447==    at 0x1000AC5C6: malloc (in /usr/local/Cellar/valgrind/HEAD-0375954/lib/valgrind/vgpreload_memcheck-amd64-darwin.so)
==52447==    by 0x10026A969: tlv_allocate_and_initialize_for_key (in /usr/lib/system/libdyld.dylib)
==52447==    by 0x10026B0EB: tlv_get_addr (in /usr/lib/system/libdyld.dylib)
==52447==    by 0x1000BEA00: dill_now (in /usr/local/lib/libdill.21.dylib)
==52447==    by 0x100000E83: main (threads2.c:31)
==52447==
==52447== Invalid read of size 8
==52447==    at 0x1000BECEB: dill_ctx_pollset_term (in /usr/local/lib/libdill.21.dylib)
==52447==    by 0x1000C1A03: dill_ctx_term (in /usr/local/lib/libdill.21.dylib)
==52447==    by 0x1006AF162: _pthread_tsd_cleanup (in /usr/lib/system/libsystem_pthread.dylib)
==52447==    by 0x1006AEEE8: _pthread_exit (in /usr/lib/system/libsystem_pthread.dylib)
==52447==    by 0x1006AF944: pthread_exit (in /usr/lib/system/libsystem_pthread.dylib)
==52447==    by 0x100000F08: main (threads2.c:34)
==52447==  Address 0x100e33ed0 is 432 bytes inside a block of size 464 free'd
==52447==    at 0x1000AC9AB: free (in /usr/local/Cellar/valgrind/HEAD-0375954/lib/valgrind/vgpreload_memcheck-amd64-darwin.so)
==52447==    by 0x1006AF162: _pthread_tsd_cleanup (in /usr/lib/system/libsystem_pthread.dylib)
==52447==    by 0x1006AEEE8: _pthread_exit (in /usr/lib/system/libsystem_pthread.dylib)
==52447==    by 0x1006AF944: pthread_exit (in /usr/lib/system/libsystem_pthread.dylib)
==52447==    by 0x100000F08: main (threads2.c:34)
==52447==  Block was alloc'd at
==52447==    at 0x1000AC5C6: malloc (in /usr/local/Cellar/valgrind/HEAD-0375954/lib/valgrind/vgpreload_memcheck-amd64-darwin.so)
==52447==    by 0x10026A969: tlv_allocate_and_initialize_for_key (in /usr/lib/system/libdyld.dylib)
==52447==    by 0x10026B0EB: tlv_get_addr (in /usr/lib/system/libdyld.dylib)
==52447==    by 0x1000BEA00: dill_now (in /usr/local/lib/libdill.21.dylib)
==52447==    by 0x100000E83: main (threads2.c:31)
==52447==
==52447== Invalid read of size 8
==52447==    at 0x1000C0F7C: dill_slist_pop (in /usr/local/lib/libdill.21.dylib)
==52447==    by 0x1000C0E8B: dill_ctx_stack_term (in /usr/local/lib/libdill.21.dylib)
==52447==    by 0x1000C1A13: dill_ctx_term (in /usr/local/lib/libdill.21.dylib)
==52447==    by 0x1006AF162: _pthread_tsd_cleanup (in /usr/lib/system/libsystem_pthread.dylib)
==52447==    by 0x1006AEEE8: _pthread_exit (in /usr/lib/system/libsystem_pthread.dylib)
==52447==    by 0x1006AF944: pthread_exit (in /usr/lib/system/libsystem_pthread.dylib)
==52447==    by 0x100000F08: main (threads2.c:34)
==52447==  Address 0x100e33ec0 is 416 bytes inside a block of size 464 free'd
==52447==    at 0x1000AC9AB: free (in /usr/local/Cellar/valgrind/HEAD-0375954/lib/valgrind/vgpreload_memcheck-amd64-darwin.so)
==52447==    by 0x1006AF162: _pthread_tsd_cleanup (in /usr/lib/system/libsystem_pthread.dylib)
==52447==    by 0x1006AEEE8: _pthread_exit (in /usr/lib/system/libsystem_pthread.dylib)
==52447==    by 0x1006AF944: pthread_exit (in /usr/lib/system/libsystem_pthread.dylib)
==52447==    by 0x100000F08: main (threads2.c:34)
==52447==  Block was alloc'd at
==52447==    at 0x1000AC5C6: malloc (in /usr/local/Cellar/valgrind/HEAD-0375954/lib/valgrind/vgpreload_memcheck-amd64-darwin.so)
==52447==    by 0x10026A969: tlv_allocate_and_initialize_for_key (in /usr/lib/system/libdyld.dylib)
==52447==    by 0x10026B0EB: tlv_get_addr (in /usr/lib/system/libdyld.dylib)
==52447==    by 0x1000BEA00: dill_now (in /usr/local/lib/libdill.21.dylib)
==52447==    by 0x100000E83: main (threads2.c:31)
==52447==
==52447== Invalid read of size 8
==52447==    at 0x1000C0F87: dill_slist_pop (in /usr/local/lib/libdill.21.dylib)
==52447==    by 0x1000C0E8B: dill_ctx_stack_term (in /usr/local/lib/libdill.21.dylib)
==52447==    by 0x1000C1A13: dill_ctx_term (in /usr/local/lib/libdill.21.dylib)
==52447==    by 0x1006AF162: _pthread_tsd_cleanup (in /usr/lib/system/libsystem_pthread.dylib)
==52447==    by 0x1006AEEE8: _pthread_exit (in /usr/lib/system/libsystem_pthread.dylib)
==52447==    by 0x1006AF944: pthread_exit (in /usr/lib/system/libsystem_pthread.dylib)
==52447==    by 0x100000F08: main (threads2.c:34)
==52447==  Address 0x100e33ec0 is 416 bytes inside a block of size 464 free'd
==52447==    at 0x1000AC9AB: free (in /usr/local/Cellar/valgrind/HEAD-0375954/lib/valgrind/vgpreload_memcheck-amd64-darwin.so)
==52447==    by 0x1006AF162: _pthread_tsd_cleanup (in /usr/lib/system/libsystem_pthread.dylib)
==52447==    by 0x1006AEEE8: _pthread_exit (in /usr/lib/system/libsystem_pthread.dylib)
==52447==    by 0x1006AF944: pthread_exit (in /usr/lib/system/libsystem_pthread.dylib)
==52447==    by 0x100000F08: main (threads2.c:34)
==52447==  Block was alloc'd at
==52447==    at 0x1000AC5C6: malloc (in /usr/local/Cellar/valgrind/HEAD-0375954/lib/valgrind/vgpreload_memcheck-amd64-darwin.so)
==52447==    by 0x10026A969: tlv_allocate_and_initialize_for_key (in /usr/lib/system/libdyld.dylib)
==52447==    by 0x10026B0EB: tlv_get_addr (in /usr/lib/system/libdyld.dylib)
==52447==    by 0x1000BEA00: dill_now (in /usr/local/lib/libdill.21.dylib)
==52447==    by 0x100000E83: main (threads2.c:31)
==52447==
==52447== Invalid write of size 8
==52447==    at 0x1000C0F8E: dill_slist_pop (in /usr/local/lib/libdill.21.dylib)
==52447==    by 0x1000C0E8B: dill_ctx_stack_term (in /usr/local/lib/libdill.21.dylib)
==52447==    by 0x1000C1A13: dill_ctx_term (in /usr/local/lib/libdill.21.dylib)
==52447==    by 0x1006AF162: _pthread_tsd_cleanup (in /usr/lib/system/libsystem_pthread.dylib)
==52447==    by 0x1006AEEE8: _pthread_exit (in /usr/lib/system/libsystem_pthread.dylib)
==52447==    by 0x1006AF944: pthread_exit (in /usr/lib/system/libsystem_pthread.dylib)
==52447==    by 0x100000F08: main (threads2.c:34)
==52447==  Address 0x100e33ec0 is 416 bytes inside a block of size 464 free'd
==52447==    at 0x1000AC9AB: free (in /usr/local/Cellar/valgrind/HEAD-0375954/lib/valgrind/vgpreload_memcheck-amd64-darwin.so)
==52447==    by 0x1006AF162: _pthread_tsd_cleanup (in /usr/lib/system/libsystem_pthread.dylib)
==52447==    by 0x1006AEEE8: _pthread_exit (in /usr/lib/system/libsystem_pthread.dylib)
==52447==    by 0x1006AF944: pthread_exit (in /usr/lib/system/libsystem_pthread.dylib)
==52447==    by 0x100000F08: main (threads2.c:34)
==52447==  Block was alloc'd at
==52447==    at 0x1000AC5C6: malloc (in /usr/local/Cellar/valgrind/HEAD-0375954/lib/valgrind/vgpreload_memcheck-amd64-darwin.so)
==52447==    by 0x10026A969: tlv_allocate_and_initialize_for_key (in /usr/lib/system/libdyld.dylib)
==52447==    by 0x10026B0EB: tlv_get_addr (in /usr/lib/system/libdyld.dylib)
==52447==    by 0x1000BEA00: dill_now (in /usr/local/lib/libdill.21.dylib)
==52447==    by 0x100000E83: main (threads2.c:31)
==52447==
==52447== Invalid read of size 8
==52447==    at 0x1000BDDD0: dill_ctx_handle_term (in /usr/local/lib/libdill.21.dylib)
==52447==    by 0x1000C1A23: dill_ctx_term (in /usr/local/lib/libdill.21.dylib)
==52447==    by 0x1006AF162: _pthread_tsd_cleanup (in /usr/lib/system/libsystem_pthread.dylib)
==52447==    by 0x1006AEEE8: _pthread_exit (in /usr/lib/system/libsystem_pthread.dylib)
==52447==    by 0x1006AF944: pthread_exit (in /usr/lib/system/libsystem_pthread.dylib)
==52447==    by 0x100000F08: main (threads2.c:34)
==52447==  Address 0x100e33ea0 is 384 bytes inside a block of size 464 free'd
==52447==    at 0x1000AC9AB: free (in /usr/local/Cellar/valgrind/HEAD-0375954/lib/valgrind/vgpreload_memcheck-amd64-darwin.so)
==52447==    by 0x1006AF162: _pthread_tsd_cleanup (in /usr/lib/system/libsystem_pthread.dylib)
==52447==    by 0x1006AEEE8: _pthread_exit (in /usr/lib/system/libsystem_pthread.dylib)
==52447==    by 0x1006AF944: pthread_exit (in /usr/lib/system/libsystem_pthread.dylib)
==52447==    by 0x100000F08: main (threads2.c:34)
==52447==  Block was alloc'd at
==52447==    at 0x1000AC5C6: malloc (in /usr/local/Cellar/valgrind/HEAD-0375954/lib/valgrind/vgpreload_memcheck-amd64-darwin.so)
==52447==    by 0x10026A969: tlv_allocate_and_initialize_for_key (in /usr/lib/system/libdyld.dylib)
==52447==    by 0x10026B0EB: tlv_get_addr (in /usr/lib/system/libdyld.dylib)
==52447==    by 0x1000BEA00: dill_now (in /usr/local/lib/libdill.21.dylib)
==52447==    by 0x100000E83: main (threads2.c:31)
==52447==
==52447==
==52447== HEAP SUMMARY:
==52447==     in use at exit: 19,483 bytes in 166 blocks
==52447==   total heap usage: 189 allocs, 23 frees, 159,083 bytes allocated
==52447==
==52447== LEAK SUMMARY:
==52447==    definitely lost: 1,664 bytes in 26 blocks
==52447==    indirectly lost: 0 bytes in 0 blocks
==52447==      possibly lost: 72 bytes in 3 blocks
==52447==    still reachable: 200 bytes in 6 blocks
==52447==         suppressed: 17,547 bytes in 131 blocks
==52447== Rerun with --leak-check=full to see details of leaked memory
==52447==
==52447== For counts of detected and suppressed errors, rerun with: -v
==52447== ERROR SUMMARY: 9 errors from 9 contexts (suppressed: 4 from 4)
sustrik commented 6 years ago

Hm. That wasn't useful. The first memory error that valgrind is able to detect happens at the same place that crashes the process.

I've kept staring at the code, but there's no obvious error there.

The question is what is _pthread_tsd_cleanup() trying to deallocate when it overwrites the byte in question. I've found source code for the function here:

https://opensource.apple.com/source/Libc/Libc-498/pthreads/pthread_tsd.c.auto.html

but there's no deallocation there. There's not even a single function call except for the user-supplied destructors. (Maybe I am looking at the wrong version of the code?)

Anyway, let's try one more thing. Let's check whether the destructor is called only once. If it were called twice, it would try to deallocate the same memory chunk twice, and we would see something similar to what we are seeing now.

Can you please add a breakpoint to dill_ctx_term (ctx.c:151) and check whether the code gets there once or twice?
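The same question (does a thread-specific destructor run once or twice?) can also be checked without a debugger by counting calls. A minimal sketch using plain pthreads; it is not libdill code, and `counting_dtor`, `worker` and `count_dtor_calls_for_one_thread` are hypothetical names chosen for illustration:

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

static pthread_key_t demo_key;
static atomic_int dtor_calls;

/* Hypothetical destructor: counts the call, then frees the per-thread value. */
static void counting_dtor(void *value) {
    atomic_fetch_add(&dtor_calls, 1);
    free(value);
}

/* Worker thread: stores a heap value under the key and exits. */
static void *worker(void *arg) {
    (void)arg;
    void *value = malloc(16);
    assert(value != NULL);
    int rc = pthread_setspecific(demo_key, value);
    assert(rc == 0);
    return NULL;
}

/* Runs one worker thread and returns how many times the destructor ran. */
int count_dtor_calls_for_one_thread(void) {
    pthread_t t;
    atomic_store(&dtor_calls, 0);
    int rc = pthread_key_create(&demo_key, counting_dtor);
    assert(rc == 0);
    rc = pthread_create(&t, NULL, worker, NULL);
    assert(rc == 0);
    rc = pthread_join(t, NULL);   /* destructors run before the thread terminates */
    assert(rc == 0);
    rc = pthread_key_delete(demo_key);
    assert(rc == 0);
    return atomic_load(&dtor_calls);
}
```

A result other than 1 for a single thread would point at double invocation; a result of 1 means the problem lies elsewhere.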

progamer71 commented 6 years ago

The question is what is _pthread_tsd_cleanup() trying to deallocate when it overwrites the byte in question. I've found source code for the function here:

https://opensource.apple.com/source/Libc/Libc-498/pthreads/pthread_tsd.c.auto.html

but there's no deallocation there. There's not even a single function call except for the user-supplied destructors. (Maybe I am looking at a wrong version of the code?)

I found the list of libpthread releases here: https://opensource.apple.com/source/libpthread/ The latest one published is libpthread-301.30.1 (used in macOS 10.13.3).

But the libpthread version on my system (macOS 10.13.6) is libpthread-301.50.1, judging by this source path embedded in the library: /BuildRoot/Library/Caches/com.apple.xbs/Sources/libpthread/libpthread-301.50.1/src/pthread.c

* thread #1, queue = 'com.apple.main-thread', stop reason = instruction step over
    frame #0: 0x00007fff60d39940 libsystem_pthread.dylib`pthread_exit + 25
libsystem_pthread.dylib`pthread_exit:
->  0x7fff60d39940 <+25>: callq  0x7fff60d38e9a            ; _pthread_exit
    0x7fff60d39945 <+30>: leaq   0x3d7c(%rip), %rdi        ; "%s:%s:%u: pthread_exit() may only be called against threads created via pthread_create()"
    0x7fff60d3994c <+37>: leaq   0x3dce(%rip), %rsi        ; "/BuildRoot/Library/Caches/com.apple.xbs/Sources/libpthread/libpthread-301.50.1/src/pthread.c"
    0x7fff60d39953 <+44>: leaq   0x3e24(%rip), %rdx        ; "pthread_exit"
Target 0: (threads2) stopped.

libpthread-301.30.1 is an older release than the libpthread-301.50.1 running on my system, so the published source may not match it exactly.

https://opensource.apple.com/source/libpthread/libpthread-301.30.1/src/pthread_tsd.c.auto.html I have read its source code. The cleanup finally happens here:

_pthread_tsd_cleanup_key(pthread_t self, pthread_key_t key)
{
    void (*destructor)(void *);
    if (_pthread_key_get_destructor(key, &destructor)) {
        void **ptr = &self->tsd[key];
        void *value = *ptr;
        if (value) {
            *ptr = NULL;
            if (destructor) {
                destructor(value);
            }
        }
    }
}

Is destructor(value) effectively a call to free(value)?

sustrik commented 6 years ago

No. destructor is the function pointer to dill_ctx_term; value is the pointer to the memory to deallocate.
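To make the dispatch concrete, here is a small single-threaded model of the cleanup function quoted earlier. It is only a sketch: the slot table, the registry and every `demo_*` name are hypothetical stand-ins, not libpthread or libdill code. It shows that the destructor is whatever function was registered for the key, and that the slot is cleared before the call, so a second cleanup pass over the same key does nothing.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_KEYS 4

/* Hypothetical stand-ins for the per-thread slots and the destructor registry. */
static void *demo_tsd[DEMO_KEYS];
static void (*demo_dtors[DEMO_KEYS])(void *);

static void *demo_last_value;  /* what the destructor was handed */
static int demo_dtor_runs;

/* Plays the role of dill_ctx_term: user code, not free(). */
static void demo_ctx_term(void *value) {
    demo_last_value = value;
    demo_dtor_runs++;
}

/* Same shape as the quoted function: clear the slot, then call the registered function. */
static void demo_cleanup_key(int key) {
    void *value = demo_tsd[key];
    if (value) {
        demo_tsd[key] = NULL;
        if (demo_dtors[key]) {
            demo_dtors[key](value);
        }
    }
}

/* Registers a destructor, stores a value, runs cleanup twice.
   Returns the number of destructor calls. */
int demo_cleanup_twice(void) {
    static int ctx;                   /* stands in for the thread's context */
    demo_dtor_runs = 0;
    demo_dtors[1] = demo_ctx_term;
    demo_tsd[1] = &ctx;
    demo_cleanup_key(1);
    assert(demo_last_value == &ctx);  /* destructor received the stored value */
    assert(demo_tsd[1] == NULL);      /* slot was cleared */
    demo_cleanup_key(1);              /* second pass finds NULL and does nothing */
    return demo_dtor_runs;
}
```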

Can you check whether the destructor is called twice, as suggested above, just in case?

progamer71 commented 6 years ago

Destructor is the function pointer to dill_ctx_term, value is the pointer to the memory to deallocate.

Can you check whether the destructor is called twice, as suggested above, just in case?

I set breakpoints on dill_ctx_term and _pthread_tsd_cleanup and ran the test twice. The first run finished without error; the second run crashed. In both runs dill_ctx_term was called once.

$ lldb ./threads2
(lldb) target create "./threads2"
Current executable set to './threads2' (x86_64).
(lldb) b dill_ctx_term
Breakpoint 1: where = libdill.21.dylib`dill_ctx_term + 12 at ctx.c:99, address = 0x00000000000079dc
(lldb) b _pthread_tsd_cleanup
Breakpoint 2: where = libsystem_pthread.dylib`_pthread_tsd_cleanup, address = 0x0000000000004f94
(lldb) r
Process 6562 launched: './threads2' (x86_64)
Process 6562 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 2.1
    frame #0: 0x00007fff60d38f94 libsystem_pthread.dylib`_pthread_tsd_cleanup
libsystem_pthread.dylib`_pthread_tsd_cleanup:
->  0x7fff60d38f94 <+0>: pushq  %rbp
    0x7fff60d38f95 <+1>: movq   %rsp, %rbp
    0x7fff60d38f98 <+4>: pushq  %r15
    0x7fff60d38f9a <+6>: pushq  %r14
Target 0: (threads2) stopped.
(lldb) c
Process 6562 resuming
Process 6562 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
    frame #0: 0x00000001000a89dc libdill.21.dylib`dill_ctx_term(ptr=0x0000000100102160) at ctx.c:99
   96   static void *dill_main = NULL;
   97
   98   static void dill_ctx_term(void *ptr) {
-> 99       struct dill_ctx *ctx = ptr;
   100      dill_ctx_fd_term(&ctx->fd);
   101      dill_ctx_pollset_term(&ctx->pollset);
   102      dill_ctx_stack_term(&ctx->stack);
Target 0: (threads2) stopped.
(lldb) c
Process 6562 resuming
Process 6562 exited with status = 0 (0x00000000)

(lldb) r
Process 6574 launched: './threads2' (x86_64)
Process 6574 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 2.1
    frame #0: 0x00007fff60d38f94 libsystem_pthread.dylib`_pthread_tsd_cleanup
libsystem_pthread.dylib`_pthread_tsd_cleanup:
->  0x7fff60d38f94 <+0>: pushq  %rbp
    0x7fff60d38f95 <+1>: movq   %rsp, %rbp
    0x7fff60d38f98 <+4>: pushq  %r15
    0x7fff60d38f9a <+6>: pushq  %r14
Target 0: (threads2) stopped.
(lldb) c
Process 6574 resuming
Process 6574 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
    frame #0: 0x00000001000a89dc libdill.21.dylib`dill_ctx_term(ptr=0x00000001003002f0) at ctx.c:99
   96   static void *dill_main = NULL;
   97
   98   static void dill_ctx_term(void *ptr) {
-> 99       struct dill_ctx *ctx = ptr;
   100      dill_ctx_fd_term(&ctx->fd);
   101      dill_ctx_pollset_term(&ctx->pollset);
   102      dill_ctx_stack_term(&ctx->stack);
Target 0: (threads2) stopped.
(lldb) c
Process 6574 resuming
Process 6574 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT)
    frame #0: 0x00000001000a9ca7 libdill.21.dylib`dill_slist_pop(self=0x00000001003004b8) at slist.h:62
   59   /* Pop an item from the beginning of the list. */
   60   static inline struct dill_slist *dill_slist_pop(struct dill_slist *self) {
   61       struct dill_slist *item = self->next;
-> 62       self->next = item->next;
   63       return item;
   64   }
   65
Target 0: (threads2) stopped.
sustrik commented 6 years ago

Ok, that's our culprit then. The destructor is supposed to run once only.

And I think I know what the problem is now. Let me try to fix it.

sustrik commented 6 years ago

The way the code works now is based on how Linux works. Apparently, the destructor is called for every thread except for the main thread. Therefore, for the main thread we register an atexit() function to do the cleanup.

My theory is that on OSX both the destructor and the atexit function are called, resulting in a duplicate deallocation.

To test that, I've created the "osx" branch, which doesn't do atexit() at all. Can you please get it and run the test again? Do you see dill_ctx_term being called twice or once only?
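The scheme described above (a pthread key destructor for ordinary threads plus an atexit() handler for the main thread) can be sketched as follows. This is an illustration, not libdill's code: all `demo_*` names are hypothetical, and the `initialized` guard is an addition that turns a second teardown call into a no-op.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

struct demo_ctx {
    int initialized;
    int term_calls;   /* how many times the real teardown ran */
};

static pthread_key_t demo_key;
static pthread_once_t demo_once = PTHREAD_ONCE_INIT;
static struct demo_ctx demo_main_ctx;

/* Teardown, guarded so that a second call does nothing. */
void demo_ctx_term(void *ptr) {
    struct demo_ctx *ctx = ptr;
    if (!ctx->initialized) return;
    ctx->initialized = 0;
    ctx->term_calls++;
}

/* Cleanup path for the main thread, run at process exit. */
static void demo_atexit(void) {
    demo_ctx_term(&demo_main_ctx);
}

/* One-time setup: key destructor for threads, atexit() for the main thread. */
static void demo_make_key(void) {
    int rc = pthread_key_create(&demo_key, demo_ctx_term);
    assert(rc == 0);
    rc = atexit(demo_atexit);
    assert(rc == 0);
}

/* Initializes the main thread's context and registers both cleanup paths. */
struct demo_ctx *demo_ctx_init_main(void) {
    int rc = pthread_once(&demo_once, demo_make_key);
    assert(rc == 0);
    demo_main_ctx.initialized = 1;
    rc = pthread_setspecific(demo_key, &demo_main_ctx);
    assert(rc == 0);
    return &demo_main_ctx;
}
```

Note that the guard only helps while the context's storage is still valid. Here the context is a plain static object. In the valgrind reports in this thread, the block being accessed was allocated by tlv_allocate_and_initialize_for_key and had already been freed inside _pthread_tsd_cleanup, in which case even reading a guard flag would touch freed memory.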

sustrik commented 6 years ago

Ok, never mind. After a lot of experimenting on Travis CI, I gave up. Instead, I've made it use a slightly less efficient way of storing thread-local data on OSX. Get the latest version from GitHub and try it. It should work now.
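The actual change is not shown in this thread. One common alternative to a compiler-level thread-local variable (which the tlv_get_addr frames in the valgrind output suggest was in use) is to allocate the context on the heap and look it up with pthread_getspecific(). The sketch below assumes that approach; every `demo_*` name is hypothetical. The lookup costs a function call per access, but the key's destructor is the only code that frees the block.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

struct demo_ctx { int counter; };

struct demo_args {
    struct demo_ctx *main_ctx;  /* context of the creating thread */
    int distinct;               /* set to 1 if the worker got its own context */
};

static pthread_key_t demo_key;
static pthread_once_t demo_once = PTHREAD_ONCE_INIT;

/* The destructor owns the allocation, so nothing else frees it first. */
static void demo_ctx_free(void *ptr) {
    free(ptr);
}

static void demo_make_key(void) {
    int rc = pthread_key_create(&demo_key, demo_ctx_free);
    assert(rc == 0);
}

/* Returns the calling thread's context, allocating it on first use. */
struct demo_ctx *demo_getctx(void) {
    int rc = pthread_once(&demo_once, demo_make_key);
    assert(rc == 0);
    struct demo_ctx *ctx = pthread_getspecific(demo_key);
    if (ctx == NULL) {
        ctx = calloc(1, sizeof(*ctx));
        assert(ctx != NULL);
        rc = pthread_setspecific(demo_key, ctx);
        assert(rc == 0);
    }
    return ctx;
}

static void *demo_worker(void *p) {
    struct demo_args *args = p;
    args->distinct = (demo_getctx() != args->main_ctx);
    return NULL;
}

/* Returns 1 if a second thread received a context distinct from the caller's. */
int demo_contexts_are_per_thread(void) {
    pthread_t t;
    struct demo_args args = { demo_getctx(), 0 };
    int rc = pthread_create(&t, NULL, demo_worker, &args);
    assert(rc == 0);
    rc = pthread_join(t, NULL);
    assert(rc == 0);
    return args.distinct;
}
```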

progamer71 commented 6 years ago

Sorry, I just woke up; my local time is GMT+7. This is the result from the osx branch:

./autogen.sh
./configure --enable-debug --enable-valgrind
make
make install
cd tests
clang -Wall -g -ldill -O0 -o threads2 threads2.c

$ ./threads2
Segmentation fault: 11

$ valgrind ./threads2
==28392== Memcheck, a memory error detector
==28392== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==28392== Using Valgrind-3.14.0.GIT and LibVEX; rerun with -h for copyright info
==28392== Command: ./threads2
==28392==
==28392== Invalid read of size 8
==28392==    at 0x1000C2CBC: dill_slist_pop (in /usr/local/lib/libdill.21.dylib)
==28392==    by 0x1000C2C6B: dill_ctx_fd_term (in /usr/local/lib/libdill.21.dylib)
==28392==    by 0x1000C1A3B: dill_ctx_term_ (in /usr/local/lib/libdill.21.dylib)
==28392==    by 0x1000C1A0C: dill_ctx_term (in /usr/local/lib/libdill.21.dylib)
==28392==    by 0x1006AF162: _pthread_tsd_cleanup (in /usr/lib/system/libsystem_pthread.dylib)
==28392==    by 0x1006AEEE8: _pthread_exit (in /usr/lib/system/libsystem_pthread.dylib)
==28392==    by 0x1006AF944: pthread_exit (in /usr/lib/system/libsystem_pthread.dylib)
==28392==    by 0x100000F08: main (threads2.c:34)
==28392==  Address 0x100e33ee8 is 456 bytes inside a block of size 464 free'd
==28392==    at 0x1000AC9AB: free (in /usr/local/Cellar/valgrind/HEAD-0375954/lib/valgrind/vgpreload_memcheck-amd64-darwin.so)
==28392==    by 0x1006AF162: _pthread_tsd_cleanup (in /usr/lib/system/libsystem_pthread.dylib)
==28392==    by 0x1006AEEE8: _pthread_exit (in /usr/lib/system/libsystem_pthread.dylib)
==28392==    by 0x1006AF944: pthread_exit (in /usr/lib/system/libsystem_pthread.dylib)
==28392==    by 0x100000F08: main (threads2.c:34)
==28392==  Block was alloc'd at
==28392==    at 0x1000AC5C6: malloc (in /usr/local/Cellar/valgrind/HEAD-0375954/lib/valgrind/vgpreload_memcheck-amd64-darwin.so)
==28392==    by 0x10026A969: tlv_allocate_and_initialize_for_key (in /usr/lib/system/libdyld.dylib)
==28392==    by 0x10026B0EB: tlv_get_addr (in /usr/lib/system/libdyld.dylib)
==28392==    by 0x1000BEAD0: dill_now (in /usr/local/lib/libdill.21.dylib)
==28392==    by 0x100000E83: main (threads2.c:31)
==28392==
==28392== Invalid read of size 8
==28392==    at 0x1000C2CC7: dill_slist_pop (in /usr/local/lib/libdill.21.dylib)
==28392==    by 0x1000C2C6B: dill_ctx_fd_term (in /usr/local/lib/libdill.21.dylib)
==28392==    by 0x1000C1A3B: dill_ctx_term_ (in /usr/local/lib/libdill.21.dylib)
==28392==    by 0x1000C1A0C: dill_ctx_term (in /usr/local/lib/libdill.21.dylib)
==28392==    by 0x1006AF162: _pthread_tsd_cleanup (in /usr/lib/system/libsystem_pthread.dylib)
==28392==    by 0x1006AEEE8: _pthread_exit (in /usr/lib/system/libsystem_pthread.dylib)
==28392==    by 0x1006AF944: pthread_exit (in /usr/lib/system/libsystem_pthread.dylib)
==28392==    by 0x100000F08: main (threads2.c:34)
==28392==  Address 0x100e33ee8 is 456 bytes inside a block of size 464 free'd
==28392==    at 0x1000AC9AB: free (in /usr/local/Cellar/valgrind/HEAD-0375954/lib/valgrind/vgpreload_memcheck-amd64-darwin.so)
==28392==    by 0x1006AF162: _pthread_tsd_cleanup (in /usr/lib/system/libsystem_pthread.dylib)
==28392==    by 0x1006AEEE8: _pthread_exit (in /usr/lib/system/libsystem_pthread.dylib)
==28392==    by 0x1006AF944: pthread_exit (in /usr/lib/system/libsystem_pthread.dylib)
==28392==    by 0x100000F08: main (threads2.c:34)
==28392==  Block was alloc'd at
==28392==    at 0x1000AC5C6: malloc (in /usr/local/Cellar/valgrind/HEAD-0375954/lib/valgrind/vgpreload_memcheck-amd64-darwin.so)
==28392==    by 0x10026A969: tlv_allocate_and_initialize_for_key (in /usr/lib/system/libdyld.dylib)
==28392==    by 0x10026B0EB: tlv_get_addr (in /usr/lib/system/libdyld.dylib)
==28392==    by 0x1000BEAD0: dill_now (in /usr/local/lib/libdill.21.dylib)
==28392==    by 0x100000E83: main (threads2.c:31)
==28392==
==28392== Invalid write of size 8
==28392==    at 0x1000C2CCE: dill_slist_pop (in /usr/local/lib/libdill.21.dylib)
==28392==    by 0x1000C2C6B: dill_ctx_fd_term (in /usr/local/lib/libdill.21.dylib)
==28392==    by 0x1000C1A3B: dill_ctx_term_ (in /usr/local/lib/libdill.21.dylib)
==28392==    by 0x1000C1A0C: dill_ctx_term (in /usr/local/lib/libdill.21.dylib)
==28392==    by 0x1006AF162: _pthread_tsd_cleanup (in /usr/lib/system/libsystem_pthread.dylib)
==28392==    by 0x1006AEEE8: _pthread_exit (in /usr/lib/system/libsystem_pthread.dylib)
==28392==    by 0x1006AF944: pthread_exit (in /usr/lib/system/libsystem_pthread.dylib)
==28392==    by 0x100000F08: main (threads2.c:34)
==28392==  Address 0x100e33ee8 is 456 bytes inside a block of size 464 free'd
==28392==    at 0x1000AC9AB: free (in /usr/local/Cellar/valgrind/HEAD-0375954/lib/valgrind/vgpreload_memcheck-amd64-darwin.so)
==28392==    by 0x1006AF162: _pthread_tsd_cleanup (in /usr/lib/system/libsystem_pthread.dylib)
==28392==    by 0x1006AEEE8: _pthread_exit (in /usr/lib/system/libsystem_pthread.dylib)
==28392==    by 0x1006AF944: pthread_exit (in /usr/lib/system/libsystem_pthread.dylib)
==28392==    by 0x100000F08: main (threads2.c:34)
==28392==  Block was alloc'd at
==28392==    at 0x1000AC5C6: malloc (in /usr/local/Cellar/valgrind/HEAD-0375954/lib/valgrind/vgpreload_memcheck-amd64-darwin.so)
==28392==    by 0x10026A969: tlv_allocate_and_initialize_for_key (in /usr/lib/system/libdyld.dylib)
==28392==    by 0x10026B0EB: tlv_get_addr (in /usr/lib/system/libdyld.dylib)
==28392==    by 0x1000BEAD0: dill_now (in /usr/local/lib/libdill.21.dylib)
==28392==    by 0x100000E83: main (threads2.c:31)
==28392==
==28392== Invalid read of size 4
==28392==    at 0x1000BEDB0: dill_ctx_pollset_term (in /usr/local/lib/libdill.21.dylib)
==28392==    by 0x1000C1A4B: dill_ctx_term_ (in /usr/local/lib/libdill.21.dylib)
==28392==    by 0x1000C1A0C: dill_ctx_term (in /usr/local/lib/libdill.21.dylib)
==28392==    by 0x1006AF162: _pthread_tsd_cleanup (in /usr/lib/system/libsystem_pthread.dylib)
==28392==    by 0x1006AEEE8: _pthread_exit (in /usr/lib/system/libsystem_pthread.dylib)
==28392==    by 0x1006AF944: pthread_exit (in /usr/lib/system/libsystem_pthread.dylib)
==28392==    by 0x100000F08: main (threads2.c:34)
==28392==  Address 0x100e33ec8 is 424 bytes inside a block of size 464 free'd
==28392==    at 0x1000AC9AB: free (in /usr/local/Cellar/valgrind/HEAD-0375954/lib/valgrind/vgpreload_memcheck-amd64-darwin.so)
==28392==    by 0x1006AF162: _pthread_tsd_cleanup (in /usr/lib/system/libsystem_pthread.dylib)
==28392==    by 0x1006AEEE8: _pthread_exit (in /usr/lib/system/libsystem_pthread.dylib)
==28392==    by 0x1006AF944: pthread_exit (in /usr/lib/system/libsystem_pthread.dylib)
==28392==    by 0x100000F08: main (threads2.c:34)
==28392==  Block was alloc'd at
==28392==    at 0x1000AC5C6: malloc (in /usr/local/Cellar/valgrind/HEAD-0375954/lib/valgrind/vgpreload_memcheck-amd64-darwin.so)
==28392==    by 0x10026A969: tlv_allocate_and_initialize_for_key (in /usr/lib/system/libdyld.dylib)
==28392==    by 0x10026B0EB: tlv_get_addr (in /usr/lib/system/libdyld.dylib)
==28392==    by 0x1000BEAD0: dill_now (in /usr/local/lib/libdill.21.dylib)
==28392==    by 0x100000E83: main (threads2.c:31)
==28392==
==28392== Invalid read of size 8
==28392==    at 0x1000BEDBB: dill_ctx_pollset_term (in /usr/local/lib/libdill.21.dylib)
==28392==    by 0x1000C1A4B: dill_ctx_term_ (in /usr/local/lib/libdill.21.dylib)
==28392==    by 0x1000C1A0C: dill_ctx_term (in /usr/local/lib/libdill.21.dylib)
==28392==    by 0x1006AF162: _pthread_tsd_cleanup (in /usr/lib/system/libsystem_pthread.dylib)
==28392==    by 0x1006AEEE8: _pthread_exit (in /usr/lib/system/libsystem_pthread.dylib)
==28392==    by 0x1006AF944: pthread_exit (in /usr/lib/system/libsystem_pthread.dylib)
==28392==    by 0x100000F08: main (threads2.c:34)
==28392==  Address 0x100e33ed0 is 432 bytes inside a block of size 464 free'd
==28392==    at 0x1000AC9AB: free (in /usr/local/Cellar/valgrind/HEAD-0375954/lib/valgrind/vgpreload_memcheck-amd64-darwin.so)
==28392==    by 0x1006AF162: _pthread_tsd_cleanup (in /usr/lib/system/libsystem_pthread.dylib)
==28392==    by 0x1006AEEE8: _pthread_exit (in /usr/lib/system/libsystem_pthread.dylib)
==28392==    by 0x1006AF944: pthread_exit (in /usr/lib/system/libsystem_pthread.dylib)
==28392==    by 0x100000F08: main (threads2.c:34)
==28392==  Block was alloc'd at
==28392==    at 0x1000AC5C6: malloc (in /usr/local/Cellar/valgrind/HEAD-0375954/lib/valgrind/vgpreload_memcheck-amd64-darwin.so)
==28392==    by 0x10026A969: tlv_allocate_and_initialize_for_key (in /usr/lib/system/libdyld.dylib)
==28392==    by 0x10026B0EB: tlv_get_addr (in /usr/lib/system/libdyld.dylib)
==28392==    by 0x1000BEAD0: dill_now (in /usr/local/lib/libdill.21.dylib)
==28392==    by 0x100000E83: main (threads2.c:31)
==28392==
==28392== Invalid read of size 8
==28392==    at 0x1000C104C: dill_slist_pop (in /usr/local/lib/libdill.21.dylib)
==28392==    by 0x1000C0F5B: dill_ctx_stack_term (in /usr/local/lib/libdill.21.dylib)
==28392==    by 0x1000C1A5B: dill_ctx_term_ (in /usr/local/lib/libdill.21.dylib)
==28392==    by 0x1000C1A0C: dill_ctx_term (in /usr/local/lib/libdill.21.dylib)
==28392==    by 0x1006AF162: _pthread_tsd_cleanup (in /usr/lib/system/libsystem_pthread.dylib)
==28392==    by 0x1006AEEE8: _pthread_exit (in /usr/lib/system/libsystem_pthread.dylib)
==28392==    by 0x1006AF944: pthread_exit (in /usr/lib/system/libsystem_pthread.dylib)
==28392==    by 0x100000F08: main (threads2.c:34)
==28392==  Address 0x100e33ec0 is 416 bytes inside a block of size 464 free'd
==28392==    at 0x1000AC9AB: free (in /usr/local/Cellar/valgrind/HEAD-0375954/lib/valgrind/vgpreload_memcheck-amd64-darwin.so)
==28392==    by 0x1006AF162: _pthread_tsd_cleanup (in /usr/lib/system/libsystem_pthread.dylib)
==28392==    by 0x1006AEEE8: _pthread_exit (in /usr/lib/system/libsystem_pthread.dylib)
==28392==    by 0x1006AF944: pthread_exit (in /usr/lib/system/libsystem_pthread.dylib)
==28392==    by 0x100000F08: main (threads2.c:34)
==28392==  Block was alloc'd at
==28392==    at 0x1000AC5C6: malloc (in /usr/local/Cellar/valgrind/HEAD-0375954/lib/valgrind/vgpreload_memcheck-amd64-darwin.so)
==28392==    by 0x10026A969: tlv_allocate_and_initialize_for_key (in /usr/lib/system/libdyld.dylib)
==28392==    by 0x10026B0EB: tlv_get_addr (in /usr/lib/system/libdyld.dylib)
==28392==    by 0x1000BEAD0: dill_now (in /usr/local/lib/libdill.21.dylib)
==28392==    by 0x100000E83: main (threads2.c:31)
==28392==
==28392== Invalid read of size 8
==28392==    at 0x1000C1057: dill_slist_pop (in /usr/local/lib/libdill.21.dylib)
==28392==    by 0x1000C0F5B: dill_ctx_stack_term (in /usr/local/lib/libdill.21.dylib)
==28392==    by 0x1000C1A5B: dill_ctx_term_ (in /usr/local/lib/libdill.21.dylib)
==28392==    by 0x1000C1A0C: dill_ctx_term (in /usr/local/lib/libdill.21.dylib)
==28392==    by 0x1006AF162: _pthread_tsd_cleanup (in /usr/lib/system/libsystem_pthread.dylib)
==28392==    by 0x1006AEEE8: _pthread_exit (in /usr/lib/system/libsystem_pthread.dylib)
==28392==    by 0x1006AF944: pthread_exit (in /usr/lib/system/libsystem_pthread.dylib)
==28392==    by 0x100000F08: main (threads2.c:34)
==28392==  Address 0x100e33ec0 is 416 bytes inside a block of size 464 free'd
==28392==    at 0x1000AC9AB: free (in /usr/local/Cellar/valgrind/HEAD-0375954/lib/valgrind/vgpreload_memcheck-amd64-darwin.so)
==28392==    by 0x1006AF162: _pthread_tsd_cleanup (in /usr/lib/system/libsystem_pthread.dylib)
==28392==    by 0x1006AEEE8: _pthread_exit (in /usr/lib/system/libsystem_pthread.dylib)
==28392==    by 0x1006AF944: pthread_exit (in /usr/lib/system/libsystem_pthread.dylib)
==28392==    by 0x100000F08: main (threads2.c:34)
==28392==  Block was alloc'd at
==28392==    at 0x1000AC5C6: malloc (in /usr/local/Cellar/valgrind/HEAD-0375954/lib/valgrind/vgpreload_memcheck-amd64-darwin.so)
==28392==    by 0x10026A969: tlv_allocate_and_initialize_for_key (in /usr/lib/system/libdyld.dylib)
==28392==    by 0x10026B0EB: tlv_get_addr (in /usr/lib/system/libdyld.dylib)
==28392==    by 0x1000BEAD0: dill_now (in /usr/local/lib/libdill.21.dylib)
==28392==    by 0x100000E83: main (threads2.c:31)
==28392==
==28392== Invalid write of size 8
==28392==    at 0x1000C105E: dill_slist_pop (in /usr/local/lib/libdill.21.dylib)
==28392==    by 0x1000C0F5B: dill_ctx_stack_term (in /usr/local/lib/libdill.21.dylib)
==28392==    by 0x1000C1A5B: dill_ctx_term_ (in /usr/local/lib/libdill.21.dylib)
==28392==    by 0x1000C1A0C: dill_ctx_term (in /usr/local/lib/libdill.21.dylib)
==28392==    by 0x1006AF162: _pthread_tsd_cleanup (in /usr/lib/system/libsystem_pthread.dylib)
==28392==    by 0x1006AEEE8: _pthread_exit (in /usr/lib/system/libsystem_pthread.dylib)
==28392==    by 0x1006AF944: pthread_exit (in /usr/lib/system/libsystem_pthread.dylib)
==28392==    by 0x100000F08: main (threads2.c:34)
==28392==  Address 0x100e33ec0 is 416 bytes inside a block of size 464 free'd
==28392==    at 0x1000AC9AB: free (in /usr/local/Cellar/valgrind/HEAD-0375954/lib/valgrind/vgpreload_memcheck-amd64-darwin.so)
==28392==    by 0x1006AF162: _pthread_tsd_cleanup (in /usr/lib/system/libsystem_pthread.dylib)
==28392==    by 0x1006AEEE8: _pthread_exit (in /usr/lib/system/libsystem_pthread.dylib)
==28392==    by 0x1006AF944: pthread_exit (in /usr/lib/system/libsystem_pthread.dylib)
==28392==    by 0x100000F08: main (threads2.c:34)
==28392==  Block was alloc'd at
==28392==    at 0x1000AC5C6: malloc (in /usr/local/Cellar/valgrind/HEAD-0375954/lib/valgrind/vgpreload_memcheck-amd64-darwin.so)
==28392==    by 0x10026A969: tlv_allocate_and_initialize_for_key (in /usr/lib/system/libdyld.dylib)
==28392==    by 0x10026B0EB: tlv_get_addr (in /usr/lib/system/libdyld.dylib)
==28392==    by 0x1000BEAD0: dill_now (in /usr/local/lib/libdill.21.dylib)
==28392==    by 0x100000E83: main (threads2.c:31)
==28392==
==28392== Invalid read of size 8
==28392==    at 0x1000BDEA0: dill_ctx_handle_term (in /usr/local/lib/libdill.21.dylib)
==28392==    by 0x1000C1A6B: dill_ctx_term_ (in /usr/local/lib/libdill.21.dylib)
==28392==    by 0x1000C1A0C: dill_ctx_term (in /usr/local/lib/libdill.21.dylib)
==28392==    by 0x1006AF162: _pthread_tsd_cleanup (in /usr/lib/system/libsystem_pthread.dylib)
==28392==    by 0x1006AEEE8: _pthread_exit (in /usr/lib/system/libsystem_pthread.dylib)
==28392==    by 0x1006AF944: pthread_exit (in /usr/lib/system/libsystem_pthread.dylib)
==28392==    by 0x100000F08: main (threads2.c:34)
==28392==  Address 0x100e33ea0 is 384 bytes inside a block of size 464 free'd
==28392==    at 0x1000AC9AB: free (in /usr/local/Cellar/valgrind/HEAD-0375954/lib/valgrind/vgpreload_memcheck-amd64-darwin.so)
==28392==    by 0x1006AF162: _pthread_tsd_cleanup (in /usr/lib/system/libsystem_pthread.dylib)
==28392==    by 0x1006AEEE8: _pthread_exit (in /usr/lib/system/libsystem_pthread.dylib)
==28392==    by 0x1006AF944: pthread_exit (in /usr/lib/system/libsystem_pthread.dylib)
==28392==    by 0x100000F08: main (threads2.c:34)
==28392==  Block was alloc'd at
==28392==    at 0x1000AC5C6: malloc (in /usr/local/Cellar/valgrind/HEAD-0375954/lib/valgrind/vgpreload_memcheck-amd64-darwin.so)
==28392==    by 0x10026A969: tlv_allocate_and_initialize_for_key (in /usr/lib/system/libdyld.dylib)
==28392==    by 0x10026B0EB: tlv_get_addr (in /usr/lib/system/libdyld.dylib)
==28392==    by 0x1000BEAD0: dill_now (in /usr/local/lib/libdill.21.dylib)
==28392==    by 0x100000E83: main (threads2.c:31)
==28392==
==28392==
==28392== HEAP SUMMARY:
==28392==     in use at exit: 19,483 bytes in 166 blocks
==28392==   total heap usage: 189 allocs, 23 frees, 159,083 bytes allocated
==28392==
==28392== LEAK SUMMARY:
==28392==    definitely lost: 1,664 bytes in 26 blocks
==28392==    indirectly lost: 0 bytes in 0 blocks
==28392==      possibly lost: 72 bytes in 3 blocks
==28392==    still reachable: 200 bytes in 6 blocks
==28392==         suppressed: 17,547 bytes in 131 blocks
==28392== Rerun with --leak-check=full to see details of leaked memory
==28392==
==28392== For counts of detected and suppressed errors, rerun with: -v
==28392== ERROR SUMMARY: 9 errors from 9 contexts (suppressed: 4 from 4)

$ lldb ./threads2
(lldb) target create "./threads2"
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/copy.py", line 52, in <module>
    import weakref
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/weakref.py", line 14, in <module>
    from _weakref import (
ImportError: cannot import name _remove_dead_weakref
Current executable set to './threads2' (x86_64).
(lldb) r
Process 29049 launched: './threads2' (x86_64)
Process 29049 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT)
    frame #0: 0x00000001000a9cc7 libdill.21.dylib`dill_slist_pop(self=0x00000001002004b8) at slist.h:62
   59   /* Pop an item from the beginning of the list. */
   60   static inline struct dill_slist *dill_slist_pop(struct dill_slist *self) {
   61       struct dill_slist *item = self->next;
-> 62       self->next = item->next;
   63       return item;
   64   }
   65
Target 0: (threads2) stopped.
(lldb) p self
(dill_slist *) $0 = 0x00000001002004b8
(lldb) p self->next
(dill_slist *) $1 = 0x002d0001002004b8
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT)
  * frame #0: 0x00000001000a9cc7 libdill.21.dylib`dill_slist_pop(self=0x00000001002004b8) at slist.h:62
    frame #1: 0x00000001000a9c6c libdill.21.dylib`dill_ctx_fd_term(ctx=0x00000001002004b0) at fd.c:52
    frame #2: 0x00000001000a8a3c libdill.21.dylib`dill_ctx_term_(ctx=0x00000001002002f0) at ctx.c:42
    frame #3: 0x00000001000a8a0d libdill.21.dylib`dill_ctx_term(ptr=0x00000001002002f0) at ctx.c:85
    frame #4: 0x00007fff6756a163 libsystem_pthread.dylib`_pthread_tsd_cleanup + 463
    frame #5: 0x00007fff67569ee9 libsystem_pthread.dylib`_pthread_exit + 79
    frame #6: 0x00007fff6756a945 libsystem_pthread.dylib`pthread_exit + 30
    frame #7: 0x0000000100000f09 threads2`main at threads2.c:34
    frame #8: 0x00007fff67250015 libdyld.dylib`start + 1
    frame #9: 0x00007fff67250015 libdyld.dylib`start + 1
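A note on the values above: self->next matches self except for the upper 16 bits, which hold 0x002d. On x86-64 with 48-bit virtual addresses (an assumption here, though typical for this hardware), such a pointer is non-canonical, and dereferencing it raises a general protection fault, which matches the EXC_I386_GPFLT stop reason. A minimal sketch of the check, with hypothetical function names:

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if bits 63..47 of addr are all equal (48-bit canonical form). */
int demo_is_canonical48(uint64_t addr) {
    uint64_t top = addr >> 47;   /* the upper 17 bits */
    return top == 0 || top == 0x1FFFF;
}

/* Checks the two pointers printed by lldb above. */
void demo_check_lldb_values(void) {
    assert(demo_is_canonical48(0x00000001002004b8ULL));    /* self */
    assert(!demo_is_canonical48(0x002d0001002004b8ULL));   /* self->next */
}
```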
progamer71 commented 6 years ago

Hi, I believe that threads2 does not leak memory.

Here is my experiment. I compiled a trivial program that cannot leak memory:

#include <stdio.h>

int main(void) {
    printf("hello\n");
    return 0;
}

Then I ran it under valgrind:

$ valgrind ./hello
==29271== Memcheck, a memory error detector
==29271== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==29271== Using Valgrind-3.14.0.GIT and LibVEX; rerun with -h for copyright info
==29271== Command: ./hello
==29271==
hello
==29271==
==29271== HEAP SUMMARY:
==29271==     in use at exit: 23,579 bytes in 167 blocks
==29271==   total heap usage: 188 allocs, 21 frees, 32,027 bytes allocated
==29271==
==29271== LEAK SUMMARY:
==29271==    definitely lost: 1,664 bytes in 26 blocks
==29271==    indirectly lost: 0 bytes in 0 blocks
==29271==      possibly lost: 72 bytes in 3 blocks
==29271==    still reachable: 200 bytes in 6 blocks
==29271==         suppressed: 21,643 bytes in 132 blocks
==29271== Rerun with --leak-check=full to see details of leaked memory
==29271==
==29271== For counts of detected and suppressed errors, rerun with: -v
==29271== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 4 from 4)

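The leaks reported by valgrind most likely come from the system runtime libraries rather than from the program's own code; the stacks in the full leak check below point into libobjc and dyld.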

$ valgrind --leak-check=full ./hello
==29417== Memcheck, a memory error detector
==29417== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==29417== Using Valgrind-3.14.0.GIT and LibVEX; rerun with -h for copyright info
==29417== Command: ./hello
==29417==
hello
==29417==
==29417== HEAP SUMMARY:
==29417==     in use at exit: 23,579 bytes in 167 blocks
==29417==   total heap usage: 188 allocs, 21 frees, 32,027 bytes allocated
==29417==
==29417== 64 bytes in 1 blocks are definitely lost in loss record 24 of 46
==29417==    at 0x1000ACC32: calloc (in /usr/local/Cellar/valgrind/HEAD-0375954/lib/valgrind/vgpreload_memcheck-amd64-darwin.so)
==29417==    by 0x1007A2BA4: realizeClass(objc_class*) (in /usr/lib/libobjc.A.dylib)
==29417==    by 0x1007A2C5A: realizeClass(objc_class*) (in /usr/lib/libobjc.A.dylib)
==29417==    by 0x1007A2C5A: realizeClass(objc_class*) (in /usr/lib/libobjc.A.dylib)
==29417==    by 0x1007A1363: _read_images (in /usr/lib/libobjc.A.dylib)
==29417==    by 0x10079FAC4: map_images_nolock (in /usr/lib/libobjc.A.dylib)
==29417==    by 0x1007B27DA: objc_object::sidetable_retainCount() (in /usr/lib/libobjc.A.dylib)
==29417==    by 0x100007C64: dyld::notifyBatchPartial(dyld_image_states, bool, char const* (*)(dyld_image_states, unsigned int, dyld_image_info const*), bool, bool) (in /usr/lib/dyld)
==29417==    by 0x100007E39: dyld::registerObjCNotifiers(void (*)(unsigned int, char const* const*, mach_header const* const*), void (*)(char const*, mach_header const*), void (*)(char const*, mach_header const*)) (in /usr/lib/dyld)
==29417==    by 0x10026A71D: _dyld_objc_notify_register (in /usr/lib/system/libdyld.dylib)
==29417==    by 0x10079F075: _objc_init (in /usr/lib/libobjc.A.dylib)
==29417==    by 0x1001F4B34: _os_object_init (in /usr/lib/system/libdispatch.dylib)
==29417==
==29417== 64 bytes in 1 blocks are definitely lost in loss record 25 of 46
==29417==    at 0x1000ACC32: calloc (in /usr/local/Cellar/valgrind/HEAD-0375954/lib/valgrind/vgpreload_memcheck-amd64-darwin.so)
==29417==    by 0x1007A2BA4: realizeClass(objc_class*) (in /usr/lib/libobjc.A.dylib)
==29417==    by 0x1007A2C72: realizeClass(objc_class*) (in /usr/lib/libobjc.A.dylib)
==29417==    by 0x1007A2C5A: realizeClass(objc_class*) (in /usr/lib/libobjc.A.dylib)
==29417==    by 0x1007A2C5A: realizeClass(objc_class*) (in /usr/lib/libobjc.A.dylib)
==29417==    by 0x1007A1363: _read_images (in /usr/lib/libobjc.A.dylib)
==29417==    by 0x10079FAC4: map_images_nolock (in /usr/lib/libobjc.A.dylib)
==29417==    by 0x1007B27DA: objc_object::sidetable_retainCount() (in /usr/lib/libobjc.A.dylib)
==29417==    by 0x100007C64: dyld::notifyBatchPartial(dyld_image_states, bool, char const* (*)(dyld_image_states, unsigned int, dyld_image_info const*), bool, bool) (in /usr/lib/dyld)
==29417==    by 0x100007E39: dyld::registerObjCNotifiers(void (*)(unsigned int, char const* const*, mach_header const* const*), void (*)(char const*, mach_header const*), void (*)(char const*, mach_header const*)) (in /usr/lib/dyld)
==29417==    by 0x10026A71D: _dyld_objc_notify_register (in /usr/lib/system/libdyld.dylib)
==29417==    by 0x10079F075: _objc_init (in /usr/lib/libobjc.A.dylib)
==29417==
==29417== 72 bytes in 3 blocks are possibly lost in loss record 26 of 46
==29417==    at 0x1000ACC32: calloc (in /usr/local/Cellar/valgrind/HEAD-0375954/lib/valgrind/vgpreload_memcheck-amd64-darwin.so)
==29417==    by 0x10079F7E2: map_images_nolock (in /usr/lib/libobjc.A.dylib)
==29417==    by 0x1007B27DA: objc_object::sidetable_retainCount() (in /usr/lib/libobjc.A.dylib)
==29417==    by 0x100007C64: dyld::notifyBatchPartial(dyld_image_states, bool, char const* (*)(dyld_image_states, unsigned int, dyld_image_info const*), bool, bool) (in /usr/lib/dyld)
==29417==    by 0x100007E39: dyld::registerObjCNotifiers(void (*)(unsigned int, char const* const*, mach_header const* const*), void (*)(char const*, mach_header const*), void (*)(char const*, mach_header const*)) (in /usr/lib/dyld)
==29417==    by 0x10026A71D: _dyld_objc_notify_register (in /usr/lib/system/libdyld.dylib)
==29417==    by 0x10079F075: _objc_init (in /usr/lib/libobjc.A.dylib)
==29417==    by 0x1001F4B34: _os_object_init (in /usr/lib/system/libdispatch.dylib)
==29417==    by 0x1001F4B1B: libdispatch_init (in /usr/lib/system/libdispatch.dylib)
==29417==    by 0x1001039C2: libSystem_initializer (in /usr/lib/libSystem.B.dylib)
==29417==    by 0x100019AC5: ImageLoaderMachO::doModInitFunctions(ImageLoader::LinkContext const&) (in /usr/lib/dyld)
==29417==    by 0x100019CF5: ImageLoaderMachO::doInitialization(ImageLoader::LinkContext const&) (in /usr/lib/dyld)
==29417==
==29417== 128 bytes in 2 blocks are definitely lost in loss record 29 of 46
==29417==    at 0x1000ACC32: calloc (in /usr/local/Cellar/valgrind/HEAD-0375954/lib/valgrind/vgpreload_memcheck-amd64-darwin.so)
==29417==    by 0x1007A2BA4: realizeClass(objc_class*) (in /usr/lib/libobjc.A.dylib)
==29417==    by 0x1007A2C5A: realizeClass(objc_class*) (in /usr/lib/libobjc.A.dylib)
==29417==    by 0x1007A1363: _read_images (in /usr/lib/libobjc.A.dylib)
==29417==    by 0x10079FAC4: map_images_nolock (in /usr/lib/libobjc.A.dylib)
==29417==    by 0x1007B27DA: objc_object::sidetable_retainCount() (in /usr/lib/libobjc.A.dylib)
==29417==    by 0x100007C64: dyld::notifyBatchPartial(dyld_image_states, bool, char const* (*)(dyld_image_states, unsigned int, dyld_image_info const*), bool, bool) (in /usr/lib/dyld)
==29417==    by 0x100007E39: dyld::registerObjCNotifiers(void (*)(unsigned int, char const* const*, mach_header const* const*), void (*)(char const*, mach_header const*), void (*)(char const*, mach_header const*)) (in /usr/lib/dyld)
==29417==    by 0x10026A71D: _dyld_objc_notify_register (in /usr/lib/system/libdyld.dylib)
==29417==    by 0x10079F075: _objc_init (in /usr/lib/libobjc.A.dylib)
==29417==    by 0x1001F4B34: _os_object_init (in /usr/lib/system/libdispatch.dylib)
==29417==    by 0x1001F4B1B: libdispatch_init (in /usr/lib/system/libdispatch.dylib)
==29417==
==29417== 128 bytes in 2 blocks are definitely lost in loss record 30 of 46
==29417==    at 0x1000ACC32: calloc (in /usr/local/Cellar/valgrind/HEAD-0375954/lib/valgrind/vgpreload_memcheck-amd64-darwin.so)
==29417==    by 0x1007A2BA4: realizeClass(objc_class*) (in /usr/lib/libobjc.A.dylib)
==29417==    by 0x1007A2C72: realizeClass(objc_class*) (in /usr/lib/libobjc.A.dylib)
==29417==    by 0x1007A2C5A: realizeClass(objc_class*) (in /usr/lib/libobjc.A.dylib)
==29417==    by 0x1007A1363: _read_images (in /usr/lib/libobjc.A.dylib)
==29417==    by 0x10079FAC4: map_images_nolock (in /usr/lib/libobjc.A.dylib)
==29417==    by 0x1007B27DA: objc_object::sidetable_retainCount() (in /usr/lib/libobjc.A.dylib)
==29417==    by 0x100007C64: dyld::notifyBatchPartial(dyld_image_states, bool, char const* (*)(dyld_image_states, unsigned int, dyld_image_info const*), bool, bool) (in /usr/lib/dyld)
==29417==    by 0x100007E39: dyld::registerObjCNotifiers(void (*)(unsigned int, char const* const*, mach_header const* const*), void (*)(char const*, mach_header const*), void (*)(char const*, mach_header const*)) (in /usr/lib/dyld)
==29417==    by 0x10026A71D: _dyld_objc_notify_register (in /usr/lib/system/libdyld.dylib)
==29417==    by 0x10079F075: _objc_init (in /usr/lib/libobjc.A.dylib)
==29417==    by 0x1001F4B34: _os_object_init (in /usr/lib/system/libdispatch.dylib)
==29417==
==29417== 1,280 bytes in 20 blocks are definitely lost in loss record 40 of 46
==29417==    at 0x1000ACC32: calloc (in /usr/local/Cellar/valgrind/HEAD-0375954/lib/valgrind/vgpreload_memcheck-amd64-darwin.so)
==29417==    by 0x1007A2BA4: realizeClass(objc_class*) (in /usr/lib/libobjc.A.dylib)
==29417==    by 0x1007A2C72: realizeClass(objc_class*) (in /usr/lib/libobjc.A.dylib)
==29417==    by 0x1007A1363: _read_images (in /usr/lib/libobjc.A.dylib)
==29417==    by 0x10079FAC4: map_images_nolock (in /usr/lib/libobjc.A.dylib)
==29417==    by 0x1007B27DA: objc_object::sidetable_retainCount() (in /usr/lib/libobjc.A.dylib)
==29417==    by 0x100007C64: dyld::notifyBatchPartial(dyld_image_states, bool, char const* (*)(dyld_image_states, unsigned int, dyld_image_info const*), bool, bool) (in /usr/lib/dyld)
==29417==    by 0x100007E39: dyld::registerObjCNotifiers(void (*)(unsigned int, char const* const*, mach_header const* const*), void (*)(char const*, mach_header const*), void (*)(char const*, mach_header const*)) (in /usr/lib/dyld)
==29417==    by 0x10026A71D: _dyld_objc_notify_register (in /usr/lib/system/libdyld.dylib)
==29417==    by 0x10079F075: _objc_init (in /usr/lib/libobjc.A.dylib)
==29417==    by 0x1001F4B34: _os_object_init (in /usr/lib/system/libdispatch.dylib)
==29417==    by 0x1001F4B1B: libdispatch_init (in /usr/lib/system/libdispatch.dylib)
==29417==
==29417== LEAK SUMMARY:
==29417==    definitely lost: 1,664 bytes in 26 blocks
==29417==    indirectly lost: 0 bytes in 0 blocks
==29417==      possibly lost: 72 bytes in 3 blocks
==29417==    still reachable: 200 bytes in 6 blocks
==29417==         suppressed: 21,643 bytes in 132 blocks
==29417== Reachable blocks (those to which a pointer was found) are not shown.
==29417== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==29417==
==29417== For counts of detected and suppressed errors, rerun with: -v
==29417== ERROR SUMMARY: 6 errors from 6 contexts (suppressed: 11 from 11)

If threads2 does not report values larger than the following numbers, there is no memory leak.

==29417==    definitely lost: 1,664 bytes in 26 blocks
==29417==    indirectly lost: 0 bytes in 0 blocks
==29417==      possibly lost: 72 bytes in 3 blocks
==29417==    still reachable: 200 bytes in 6 blocks
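
The comparison against the ./hello baseline can be scripted. This is only a minimal sketch: `definitely_lost` and `leak_within_baseline` are hypothetical helper names, and it assumes the `definitely lost:` line has the same layout as in the valgrind output pasted above.

```shell
# Extract the "definitely lost" byte count from valgrind output read on stdin.
# Assumes a line like: ==29417==    definitely lost: 1,664 bytes in 26 blocks
definitely_lost() {
    sed -n 's/^==[0-9]*== *definitely lost: \([0-9,]*\) bytes.*/\1/p' \
        | tr -d ',' \
        | head -n 1
}

# Succeed when the measured byte count ($2) does not exceed the baseline ($1).
leak_within_baseline() {
    [ "$2" -le "$1" ]
}

# Usage (valgrind writes its report to stderr, hence 2>&1):
#   baseline=$(valgrind ./hello 2>&1 | definitely_lost)
#   measured=$(valgrind ./threads2 2>&1 | definitely_lost)
#   leak_within_baseline "$baseline" "$measured" && echo "no extra leak"
```

Only the "definitely lost" figure is compared here; the other summary lines could be checked the same way.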

sustrik commented 6 years ago

Now that's weird because travis OSX tests succeed (except for one unrelated test): https://travis-ci.org/sustrik/libdill/builds/431309278

Let's double check whether we are looking at the same thing: branch: master, version: eaf71e5542a33fadd1aeacc1e43d43affec6cd61

progamer71 commented 6 years ago

Congratulations. There are no more errors with branch: master, version: eaf71e5

$ for i in {1..100}; do ./threads2; done
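
The loop above keeps running after a failure and does not say which run failed. A minimal sketch of a variant that stops at the first failing run and prints its exit status (`run_until_fail` is a hypothetical helper name, not part of libdill):

```shell
# Run a command up to N times; stop at the first non-zero exit status.
# $1 is the number of runs, the remaining arguments are the command.
run_until_fail() {
    count=$1
    shift
    i=1
    while [ "$i" -le "$count" ]; do
        "$@"
        status=$?
        if [ "$status" -ne 0 ]; then
            echo "run $i failed with exit status $status"
            return "$status"
        fi
        i=$((i + 1))
    done
    echo "all $count runs passed"
    return 0
}

# Usage (from the tests directory):
#   run_until_fail 100 ./threads2
```

A status of 139, as in the original report, is how the shell reports termination by signal 11 (128 + 11, SIGSEGV).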

$ lldb ./threads2
(lldb) target create "./threads2"
Traceback (most recent call last):
Current executable set to './threads2' (x86_64).
(lldb) r
Process 21671 launched: './threads2' (x86_64)
Process 21671 exited with status = 0 (0x00000000)
(lldb) r
Process 21675 launched: './threads2' (x86_64)
Process 21675 exited with status = 0 (0x00000000)
(lldb) r
Process 21678 launched: './threads2' (x86_64)
Process 21678 exited with status = 0 (0x00000000)
(lldb) r
Process 21681 launched: './threads2' (x86_64)
Process 21681 exited with status = 0 (0x00000000)
(lldb) r
Process 21684 launched: './threads2' (x86_64)
Process 21684 exited with status = 0 (0x00000000)
(lldb) r
Process 21687 launched: './threads2' (x86_64)
Process 21687 exited with status = 0 (0x00000000)
(lldb) r
Process 21690 launched: './threads2' (x86_64)
Process 21690 exited with status = 0 (0x00000000)
(lldb) r
Process 21693 launched: './threads2' (x86_64)
Process 21693 exited with status = 0 (0x00000000)
(lldb) r
Process 21698 launched: './threads2' (x86_64)
Process 21698 exited with status = 0 (0x00000000)
(lldb) r
Process 21701 launched: './threads2' (x86_64)
Process 21701 exited with status = 0 (0x00000000)
(lldb) r
Process 21704 launched: './threads2' (x86_64)
Process 21704 exited with status = 0 (0x00000000)
(lldb) r
Process 21708 launched: './threads2' (x86_64)
Process 21708 exited with status = 0 (0x00000000)
(lldb) r
Process 21711 launched: './threads2' (x86_64)
Process 21711 exited with status = 0 (0x00000000)
(lldb) r
Process 21714 launched: './threads2' (x86_64)
Process 21714 exited with status = 0 (0x00000000)
(lldb) ^D

$ valgrind ./threads2
==21721== Memcheck, a memory error detector
==21721== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==21721== Using Valgrind-3.14.0.GIT and LibVEX; rerun with -h for copyright info
==21721== Command: ./threads2
==21721==
==21721==
==21721== HEAP SUMMARY:
==21721==     in use at exit: 19,355 bytes in 165 blocks
==21721==   total heap usage: 188 allocs, 23 frees, 158,955 bytes allocated
==21721==
==21721== LEAK SUMMARY:
==21721==    definitely lost: 1,664 bytes in 26 blocks
==21721==    indirectly lost: 0 bytes in 0 blocks
==21721==      possibly lost: 72 bytes in 3 blocks
==21721==    still reachable: 200 bytes in 6 blocks
==21721==         suppressed: 17,419 bytes in 130 blocks
==21721== Rerun with --leak-check=full to see details of leaked memory
==21721==
==21721== For counts of detected and suppressed errors, rerun with: -v
==21721== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 4 from 4)

sustrik commented 6 years ago

Great! If you are happy you can close the issue.

Btw, are you seeing tests/go5 failing? (I sometimes see it in Travis.) If so, can you get a stack trace?

progamer71 commented 6 years ago

Yes, I also see tests/go5 fail. I will close this one and open a new one.