plasma-umass / coz

Coz: Causal Profiling

Segmentation Fault, jemalloc interaction? #98

Open mjmeehan opened 6 years ago

mjmeehan commented 6 years ago

It appears Coz interacts poorly with jemalloc, causing the stack to explode. This makes it hard to use on Rust programs.

To reproduce: There's probably an easier way, but...

$ git clone https://github.com/TeXitoi/benchmarksgame-rs

...install deps as required

$ make
$ ulimit -c unlimited # so you get a core
$ bin/spectralnorm
1.274219991
$ coz run --- ./bin/spectralnorm
$ gdb ./bin/spectralnorm -c core
(gdb) bt

#0 0x00007efefa94857d in pthread_mutex_lock (mutex=0x561e8f1d40f0 ) at libcoz.cpp:268
#1 0x0000561e8ef94f63 in je_malloc_mutex_lock (tsdn=, mutex=)
    at /checkout/src/liballoc_jemalloc/../jemalloc/include/jemalloc/internal/mutex.h:101
#2 malloc_init_hard () at /checkout/src/liballoc_jemalloc/../jemalloc/src/jemalloc.c:1486
#3 malloc_init () at /checkout/src/liballoc_jemalloc/../jemalloc/src/jemalloc.c:317
#4 ialloc_body (zero=255, slow_path=255, size=, tsdn=,
    usize=<optimized out>) at /checkout/src/liballoc_jemalloc/../jemalloc/src/jemalloc.c:1583
#5 calloc (num=, size=32)
    at /checkout/src/liballoc_jemalloc/../jemalloc/src/jemalloc.c:1824
#6 0x00007efefa72e7e5 in ?? () from /lib/x86_64-linux-gnu/libdl.so.2
#7 0x00007efefa72e166 in dlsym () from /lib/x86_64-linux-gnu/libdl.so.2
#8 0x00007efefa94f8d4 in resolve_pthread_mutex_lock (mutex=0x561e8f1d40f0 )
    at real.cpp:169
#9 0x00007efefa94857f in pthread_mutex_lock (mutex=0x561e8f1d40f0 ) at libcoz.cpp:268
#10 0x0000561e8ef94f63 in je_malloc_mutex_lock (tsdn=, mutex=)
    at /checkout/src/liballoc_jemalloc/../jemalloc/include/jemalloc/internal/mutex.h:101
#11 malloc_init_hard () at /checkout/src/liballoc_jemalloc/../jemalloc/src/jemalloc.c:1486
#12 malloc_init () at /checkout/src/liballoc_jemalloc/../jemalloc/src/jemalloc.c:317
#13 ialloc_body (zero=255, slow_path=255, size=, tsdn=,
    usize=<optimized out>) at /checkout/src/liballoc_jemalloc/../jemalloc/src/jemalloc.c:1583
#14 calloc (num=, size=32)
    at /checkout/src/liballoc_jemalloc/../jemalloc/src/jemalloc.c:1824
#15 0x00007efefa72e7e5 in ?? () from /lib/x86_64-linux-gnu/libdl.so.2
#16 0x00007efefa72e166 in dlsym () from /lib/x86_64-linux-gnu/libdl.so.2
#17 0x00007efefa94f8d4 in resolve_pthread_mutex_lock (mutex=0x561e8f1d40f0 )
    at real.cpp:169
...

jadbox commented 4 years ago

I don't know much about coz yet, but @mjmeehan, how did you add progress markers to a Rust program for measuring component segments? Or were you not able to get that far?

sstadick commented 4 years ago

I have the same question as jadbox: is it possible to add markers to a Rust program?

llogiq commented 4 years ago

https://github.com/alexcrichton/coz-rs

bobby-stripe commented 4 years ago

If someone can reproduce this, can you post the bottom few dozen frames of the stack?

The problem here is recursion between the dynamic linker, jemalloc, and Coz's interposition code. The dynamic linker is trying to resolve a symbol, so it calls calloc, and calloc needs pthread_mutex_lock. Coz interposes on pthread_mutex_lock, so the first time its wrapper is called it has to ask the dynamic linker where the libc version of the function is. That lookup calls calloc again, which brings us to our infinite loop, where we quickly blow the stack and crash.

Knowing what the top of the stack looks like might help figure out what sort of workaround is needed.
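
For anyone following along, the cycle described in this comment can be modeled in a few lines of C++. This is only a sketch under stated assumptions, not Coz's or jemalloc's actual code: `model_dlsym`, `model_calloc`, `real_mutex_lock`, and `interposed_mutex_lock` are hypothetical stand-ins for `dlsym`, jemalloc's `calloc`, libc's `pthread_mutex_lock`, and Coz's wrapper. It shows one possible workaround, a thread-local re-entrancy guard, which is an assumption about what a fix could look like rather than what the project does.

```cpp
#include <cassert>

// Hypothetical model of the recursion; none of these are real Coz/jemalloc symbols.
using lock_fn = int (*)(void*);

static int g_depth = 0;           // current nesting depth of the wrapper
static int g_max_depth = 0;       // deepest nesting observed
static int g_real_calls = 0;      // calls that reached the "real" lock
static int g_fallback_calls = 0;  // re-entrant calls short-circuited by the guard

static lock_fn g_real = nullptr;               // cached "libc" function pointer
static thread_local bool t_resolving = false;  // true while this thread is inside model_dlsym

static int interposed_mutex_lock(void* mutex);

// Stand-in for libc's pthread_mutex_lock.
static int real_mutex_lock(void*) {
  ++g_real_calls;
  return 0;
}

// Stand-in for an allocator that takes a mutex on first use
// (compare malloc_init_hard in the backtrace).
static void model_calloc() {
  static int allocator_mutex;
  interposed_mutex_lock(&allocator_mutex);
}

// Stand-in for dlsym: it allocates before returning the symbol address.
static lock_fn model_dlsym() {
  model_calloc();
  return real_mutex_lock;
}

// Stand-in for the interposed wrapper, with a re-entrancy guard added.
static int interposed_mutex_lock(void* mutex) {
  ++g_depth;
  if (g_depth > g_max_depth) g_max_depth = g_depth;

  int rc = 0;
  if (g_real != nullptr) {
    rc = g_real(mutex);  // fast path: symbol already resolved
  } else if (t_resolving) {
    // Re-entered from inside the lookup. ASSUMPTION: skipping the lock is
    // acceptable here because this happens during early, single-threaded init.
    ++g_fallback_calls;
  } else {
    t_resolving = true;
    lock_fn resolved = model_dlsym();  // may call back into this wrapper
    t_resolving = false;
    g_real = resolved;
    rc = g_real(mutex);
  }

  --g_depth;
  return rc;
}
```

Calling `interposed_mutex_lock` twice on the same object exercises both paths: the first call resolves the symbol and is re-entered once, and the second goes straight to the cached pointer. Without the `t_resolving` branch, the first call in this model would recurse until the stack overflowed, which is the shape of the backtrace above. Whether skipping the lock during resolution is safe in the real library depends on how early and on which threads the lookup happens.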

matthewfl commented 4 years ago

I think I am having the same issue when using coz with a C program that links jemalloc. For me, simply adding -ljemalloc to any of the included demo programs in the benchmarks folder is enough to make it crash during startup.

kormang commented 2 years ago

This can be closed as a duplicate of the more general issue #176.