helgrind/tests/tls_threads is failing

paulfloyd commented 4 years ago

Not sure about the feasibility of fixing this one.

This test was added along with a hack on Linux that reads the stack_cache_actsize variable when loading the thread library. This data is then used to avoid tls-related false positives in Helgrind.

Doing something for FreeBSD will require some knowledge of thread memory layout. thr_stack.c looks like a good place to start, it does have some nice ASCII-art stack drawings at least.

Here is the code / test submit

https://github.com/paulfloyd/freebsd_valgrind/commit/8e605f14c58a740d05f888508e1e4cf0cddf70b8

paulfloyd commented 2 years ago

The DRD version of this also fails intermittently (issue #103)

paulfloyd commented 2 years ago

More on how TLS works on Linux https://chao-tic.github.io/blog/2018/12/25/tls and also the blog post referenced there.

paulfloyd commented 2 years ago

A few words in the problem itself.

main() creates 3 threads, level1() Each level1() thread creates 10 more, level2()

level2() just calls 2 functions that increment respectively a TLS static and global.

as always there is no guarantee with the scheduling the first level2() threads finish before the last of them are created that results in their stacks being recycled

On Linux the hack is to set the stack_cache_actsize to a high value so that whenever a thread exits the stack gets freed rather than being added to the cache. Subsequent threads always get a new stack.

Without the option

==8178== ERROR SUMMARY: 234 errors from 20 contexts (suppressed: 829 from 144)

FreeBSD has two stack caches, one for regular sized and one for user defined sized. Already that slightly complicates matters.

_thr_stack_alloc

looks in these caches (linked lists, just looks at first for default sized, runs over list for user sized)

freeing seems to be done with GC (?)

thread data doesn't look like it gets freed, just the use count set to zero

then there is

struct pthread *
_thr_alloc(struct pthread *curthread)
{
    struct pthread  *thread = NULL;
    struct tcb  *tcb;

    if (curthread != NULL) {
        if (GC_NEEDED())
            _thr_gc(curthread);

GC_NEEDED is _gc_count > 5 or st like that.

stuff like _gc_count is marked hidden.

paulfloyd commented 2 years ago

If I can find a way to hack

free_thread_count

so that it is > 100 at startup

then thr_exit will delete the stack rather than cache it

which might fix this

paulfloyd commented 2 years ago

I tried the above with gdb and fiddling with _gc_count and free_thread_count but that didn't seem to help

paulfloyd commented 2 years ago

Some more info with a debug build of libc

==40355== Possible data race during read of size 8 at 0x58E7130 by thread #12
==40355== Locks held: none
==40355==    at 0x201DE4: local_false_positive (tls_threads.c:28)
==40355==    by 0x201DB4: level2 (tls_threads.c:48)
==40355==    by 0x485A8D6: mythread_wrapper (hg_intercepts.c:406)
==40355==    by 0x486D959: ??? (lib/libthr/thread/thr_create.c:292)
==40355== 
==40355== This conflicts with a previous write of size 8 by thread #8
==40355== Locks held: none
==40355==    at 0x400CDE6: ??? (in /libexec/ld-elf.so.1)
==40355==    by 0x400D1D8: _rtld_allocate_tls (in /libexec/ld-elf.so.1)
==40355==    by 0x486D9C6: _tcb_ctor (lib/libthr/thread/thr_ctrdtr.c:45)
==40355==    by 0x487031A: _thr_alloc (lib/libthr/thread/thr_list.c:173)
==40355==    by 0x486D0EE: pthread_create (lib/libthr/thread/thr_create.c:80)
==40355==    by 0x48539D4: pthread_create_WRK (hg_intercepts.c:445)
==40355==    by 0x4853888: pthread_create (hg_intercepts.c:484)
==40355==    by 0x201D46: level1 (tls_threads.c:69)
==40355==  Address 0x58e7130 is in a rw- anonymous segment

Repeating a bit, but here there's a hazard on the TLS of thread 12 which seems top be recycling the stack and TLS of thread 8.

thjr_list.c:

    if (curthread != NULL) {
        THR_LOCK_ACQUIRE(curthread, &tcb_lock);
        tcb = _tcb_ctor(thread, 0 /* not initial tls */);
        THR_LOCK_RELEASE(curthread, &tcb_lock);
    } else {
        tcb = _tcb_ctor(thread, 1 /* initial tls */);
    }

thr_ctrdtr.c:

struct tcb *
_tcb_ctor(struct pthread *thread, int initial)
{
    struct tcb *tcb;

    if (initial)
        tcb = _tcb_get();
    else
        tcb = _rtld_allocate_tls(NULL, sizeof(struct tcb), 16);
    if (tcb)
        tcb->tcb_thread = thread;
    return (tcb);
}

paulfloyd commented 2 years ago

_rtld_allocate_tls just calls

ret = allocate_tls(globallist_curr(TAILQ_FIRST(&obj_list)), oldtls,tcbsize, tcbalign);

The first error is

void *
allocate_tls(Obj_Entry *objs, void *oldtls, size_t tcbsize, size_t tcbalign)
{
    Obj_Entry *obj;
    size_t size, ralign;
    char *tls;
    Elf_Addr *dtv, *olddtv;
    Elf_Addr segbase, oldsegbase, addr;
    size_t i;

    ralign = tcbalign;
    if (tls_static_max_align > ralign)
        ralign = tls_static_max_align;
    size = roundup(tls_static_space, ralign) + roundup(tcbsize, ralign);

    assert(tcbsize >= 2*sizeof(Elf_Addr));
    tls = malloc_aligned(size, ralign, 0 /* XXX */);
    dtv = xcalloc(tls_max_index + 2, sizeof(Elf_Addr));

    segbase = (Elf_Addr)(tls + roundup(tls_static_space, ralign));
    ((Elf_Addr*)segbase)[0] = segbase; // ERROR HERE

==40357== Possible data race during read of size 8 at 0x58EA130 by thread #13
==40357== Locks held: none
==40357==    at 0x201DE4: local_false_positive (tls_threads.c:28)
==40357==    by 0x201DB4: level2 (tls_threads.c:48)
==40357==    by 0x485A8D6: mythread_wrapper (hg_intercepts.c:406)
==40357==    by 0x486D8AA: ??? (lib/libthr/thread/thr_create.c:292)
==40357== 
==40357== This conflicts with a previous write of size 8 by thread #2
==40357== Locks held: none
==40357==    at 0x400D016: allocate_tls (libexec/rtld-elf/rtld.c:5058)
==40357==    by 0x400D408: _rtld_allocate_tls (libexec/rtld-elf/rtld.c:5231)
==40357==    by 0x486D8F6: _tcb_ctor (lib/libthr/thread/thr_ctrdtr.c:45)
==40357==    by 0x487011B: _thr_alloc (lib/libthr/thread/thr_list.c:173)
==40357==    by 0x486D02E: pthread_create (lib/libthr/thread/thr_create.c:80)
==40357==    by 0x48539D4: pthread_create_WRK (hg_intercepts.c:445)
==40357==    by 0x4853888: pthread_create (hg_intercepts.c:484)
==40357==    by 0x201D46: level1 (tls_threads.c:69)
==40357==  Address 0x58ea130 is in a rw- anonymous segment

So it looks like thread 2 allocated this TLS, ended, then it got recycled and used for thread13

Why is thread13 not seeing that address as being zero intiialized?

paulfloyd commented 1 year ago

Not that it helps much here, but on Linux GLIBC the stack cache has changed and I've implemented a different technique that uses tunables to set the cache size to 0. Not yet pushed, waiting for Mark W to take a look.

paulfloyd commented 1 year ago

I'v disabled this test on FreeBSD.

paulfloyd / freebsd_valgrind

helgrind/tests/tls_threads is failing #113