plasma-umass / coz

Coz: Causal Profiling

coz fails silently on my rust program #180

Open asg0451 opened 3 years ago

asg0451 commented 3 years ago

When I run my program under coz, it exits without doing anything:

$ cargo b --release && coz run --- ./target/release/cov-breaker >/dev/null

[libcoz.cpp:100] bootstrapping coz
[libcoz.cpp:128] Including MAIN, which is /home/miles/rust/cov-breaker/target/release/cov-breaker
[inspect.cpp:325] /usr/lib/coz-profiler/libcoz.so is not in scope
[inspect.cpp:325] /usr/lib/x86_64-linux-gnu/ld-2.31.so is not in scope
[inspect.cpp:325] /usr/lib/x86_64-linux-gnu/libm-2.31.so is not in scope
[inspect.cpp:325] /usr/lib/x86_64-linux-gnu/libdl-2.31.so is not in scope
[inspect.cpp:325] /usr/lib/x86_64-linux-gnu/libpthread-2.31.so is not in scope
[inspect.cpp:325] /usr/lib/x86_64-linux-gnu/libdwarf++.so.0 is not in scope
[inspect.cpp:325] /usr/lib/x86_64-linux-gnu/libgcc_s.so.1 is not in scope
[inspect.cpp:325] /usr/lib/x86_64-linux-gnu/libelf++.so.0 is not in scope
[inspect.cpp:325] /usr/lib/x86_64-linux-gnu/libc-2.31.so is not in scope
[inspect.cpp:325] /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.28 is not in scope
[inspect.cpp:325] /usr/lib/x86_64-linux-gnu/librt-2.31.so is not in scope
[inspect.cpp:509] Included source file /home/miles/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/cmp.rs
[inspect.cpp:509] Included source file /home/miles/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/intrinsics.rs
... like a thousand of these ...
[inspect.cpp:317] Including lines from executable /home/miles/rust/cov-breaker/target/release/cov-breaker
[profiler.cpp:75] Starting profiler thread
(exits immediately)
$ echo $?
245

(somewhat) minimal repro: https://github.com/asg0451/rust-coz-breaker

Anecdotally, coz worked fine in my actual program, until I added rayon & channels & parallelism.

Ubuntu 20.04, coz from apt (I don't see a --version flag)

The repro requires cargo to build, as do most Rust programs.

EDIT: It seems to me that rayon is the issue. Replacing

lines.par_iter().for_each_with(tx, |tx, &j| {

with

lines.iter().for_each(|&j| {

(that is, switching from rayon to a regular single-threaded iterator) results in coz working.

EDIT 2: It turns out that removing rayon still results in an abrupt exit with code 245; it's just not immediate.
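To help separate "rayon" from "threads and channels in general", a std-only stand-in with the same shape as the snippets above can be profiled instead. This is only a sketch, not the code from the linked repro: `parallel_sum`, `serial_sum`, the squaring workload, and the worker count are all hypothetical.

```rust
use std::sync::mpsc;
use std::thread;

/// Squares every element on worker threads and sends each result over a channel,
/// mirroring the `par_iter().for_each_with(tx, ...)` shape without rayon.
fn parallel_sum(lines: &[u64], workers: usize) -> u64 {
    let workers = workers.max(1);
    // Round up so every element lands in a chunk; `chunks` panics on a length of 0.
    let chunk_len = ((lines.len() + workers - 1) / workers).max(1);
    let (tx, rx) = mpsc::channel::<u64>();

    // Scoped threads may borrow `lines`; all of them are joined when `scope` returns.
    thread::scope(|scope| {
        for chunk in lines.chunks(chunk_len) {
            let tx = tx.clone();
            scope.spawn(move || {
                for &j in chunk {
                    tx.send(j * j).expect("receiver is alive");
                }
            });
        }
    });

    // Drop the last sender so the receiving iterator terminates.
    drop(tx);
    rx.iter().sum()
}

/// Single-threaded counterpart, mirroring `lines.iter().for_each(...)`.
fn serial_sum(lines: &[u64]) -> u64 {
    lines.iter().map(|&j| j * j).sum()
}

fn main() {
    let lines: Vec<u64> = (1..=1000).collect();
    println!("parallel: {}", parallel_sum(&lines, 4));
    println!("serial:   {}", serial_sum(&lines));
}
```

If the std-only variant also exits with code 245 under `coz run`, that would point at thread creation rather than at rayon itself.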

colinwm commented 3 years ago

+1, I see this too.

[profiler.cpp:75] Starting profiler thread --> abrupt exit with code 245

rafibaum commented 3 years ago

I had this same issue, but was able to fix it with the coz::thread_init() fix mentioned in the README.
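A minimal sketch of what applying that fix might look like: run an initialization hook at the very start of each spawned thread. The `spawn_with_init` helper is hypothetical (it is not part of coz), and a no-op closure stands in for the hook so the sketch builds without the `coz` crate; in a real program one would pass `coz::thread_init` instead.

```rust
use std::thread::{self, JoinHandle};

/// Spawns a thread that runs `init` before doing any of `work`.
fn spawn_with_init<I, F, T>(init: I, work: F) -> JoinHandle<T>
where
    I: FnOnce() + Send + 'static,
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    thread::spawn(move || {
        init(); // per-thread setup runs first, on the new thread
        work()
    })
}

fn main() {
    // With the `coz` crate as a dependency, the call would look like:
    //     spawn_with_init(coz::thread_init, || expensive_work());
    // A no-op hook is used here so the example has no external dependencies.
    let handle = spawn_with_init(|| {}, || (1..=10u64).sum::<u64>());
    println!("worker result: {}", handle.join().expect("worker panicked"));
}
```

For thread pools that spawn threads on the program's behalf, the same idea applies through whatever start-of-thread callback the pool exposes; rayon's `ThreadPoolBuilder::start_handler` is one such hook.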

antoyo commented 2 years ago

I'm not sure if it's the same issue, but I had a similar issue that would trigger the following output:

[libcoz/profiler.h:123] Thread state not found
Aborted!
  0: /usr/bin/../lib64/libcoz.so(_ZN8profiler8on_errorEiP9siginfo_tPv+0x69) [0x7f45682e84a9]
  1: /usr/lib/libc.so.6(+0x42560) [0x7f45680c4560]
  2: /usr/lib/libc.so.6(+0x8f34c) [0x7f456811134c]
  3: /usr/lib/libc.so.6(raise+0x18) [0x7f45680c44b8]
  4: /usr/lib/libc.so.6(abort+0xd3) [0x7f45680ae534]
  5: /usr/bin/../lib64/libcoz.so(pthread_create+0x199) [0x7f45682e5479]
  6: /usr/bin/../lib64/libcoz.so(_ZN8profiler7startupERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEP4lineib+0x213) [0x7f45682e9553]
  7: /usr/bin/../lib64/libcoz.so(_Z8init_cozv+0xdbb) [0x7f45682e491b]
  8: /usr/bin/../lib64/libcoz.so(+0x185dc) [0x7f45682e55dc]
  9: /usr/lib/libc.so.6(+0x2d310) [0x7f45680af310]
  10: /usr/lib/libc.so.6(__libc_start_main+0x81) [0x7f45680af3c1]
  11: target/release/examples/toy(+0x7b85) [0x5619ab95db85]

And I remember having seen this 245 exit code in strace or something. I debugged it, and it turned out that this happens with Rust programs that are not linked against pthread. I'm not sure how Rust programs that use threads work, but it seems that pthread sometimes isn't linked (it doesn't show up in ldd).

The solution I found was to remove | RTLD_NOLOAD from this line.

@llogiq Pinging you since you wrote an article about this issue.
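For readers unfamiliar with the flag: according to the dlopen(3) man page, RTLD_NOLOAD makes dlopen return a handle only if the library is already resident in the process, and NULL otherwise. That would be consistent with the failure described above when pthread is not linked. The following is a rough sketch of that behavior, not coz's actual code: the `probe` helper is hypothetical, the flag values are the glibc/Linux ones, and `libpthread.so.0` is assumed as the library name.

```rust
use std::ffi::CString;
use std::os::raw::{c_char, c_int, c_void};

// Declared by hand so the sketch needs no external crates.
extern "C" {
    fn dlopen(filename: *const c_char, flags: c_int) -> *mut c_void;
    fn dlclose(handle: *mut c_void) -> c_int;
}

// Values from glibc's <dlfcn.h> on Linux (an assumption for other platforms).
const RTLD_LAZY: c_int = 0x1;
const RTLD_NOLOAD: c_int = 0x4;

/// Returns true if `dlopen(name, flags)` yields a non-NULL handle.
fn probe(name: &str, flags: c_int) -> bool {
    let c_name = CString::new(name).expect("library name contains a NUL byte");
    let handle = unsafe { dlopen(c_name.as_ptr(), flags) };
    if handle.is_null() {
        false
    } else {
        // Release the reference taken by the successful dlopen call.
        unsafe { dlclose(handle) };
        true
    }
}

fn main() {
    let lib = "libpthread.so.0";
    // With RTLD_NOLOAD: succeeds only if the library is already loaded.
    println!("already resident:   {}", probe(lib, RTLD_LAZY | RTLD_NOLOAD));
    // Without RTLD_NOLOAD: the dynamic loader may load it on demand.
    println!("loadable on demand: {}", probe(lib, RTLD_LAZY));
}
```

The output depends on the system and on how the binary was linked, so no expected result is shown.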

kalcutter commented 2 years ago

I was having a similar issue:

[profiler.cpp:75] Starting profiler thread
[libcoz.cpp:96] init_coz in progress, do not recurse
[profiler.h:123] Thread state not found

I could fix the issue by removing the RTLD_NOLOAD flag, as @antoyo also noted. Another workaround that doesn't involve recompiling is to preload pthread with LD_PRELOAD, e.g. LD_PRELOAD=/usr/lib/libpthread.so.0.
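If the preload workaround needs to be scripted (for example from a build helper), it could be wrapped roughly as follows. This is a sketch under assumptions: `coz_command` is a hypothetical helper, the target path is the one from the original report, and the location of the pthread library differs between distributions.

```rust
use std::process::Command;

/// Builds `coz run --- <target>` with LD_PRELOAD set to `preload`.
fn coz_command(target: &str, preload: &str) -> Command {
    let mut cmd = Command::new("coz");
    cmd.args(&["run", "---", target]).env("LD_PRELOAD", preload);
    cmd
}

fn main() {
    // Both paths are examples; adjust them to the local system.
    let mut cmd = coz_command(
        "./target/release/cov-breaker",
        "/usr/lib/libpthread.so.0",
    );
    match cmd.status() {
        // An exit code of 245 would indicate the failure discussed in this thread.
        Ok(status) => println!("coz exited with {:?}", status.code()),
        Err(err) => eprintln!("could not launch coz: {}", err),
    }
}
```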

hugolm84 commented 2 years ago

(Quoting @antoyo's comment above.)

Trying out coz for the first time; the fix quoted above solved my Thread state not found issue when running the benchmark tests. Thanks :D

viluon commented 1 year ago

coz fails even more silently for my Rust program, quitting with the 245 error code with no output whatsoever.

EDIT: What I've found out so far:

0mhu commented 1 year ago

I have a similar problem with a C executable:

$ coz run --- ./test
[ some lines I cut out]
[inspect.cpp:316] Including lines from executable /tmp/build/test
[profiler.cpp:75] Starting profiler thread
[libcoz.cpp:96] init_coz in progress, do not recurse
[profiler.h:123] Thread state not found
Aborted!
  0: /usr/bin/../lib64/coz-profiler/libcoz.so(_ZN8profiler8on_errorEiP9siginfo_tPv+0x6c) [0x7fbde5f040ec]
  1: /usr/lib/libc.so.6(+0x389e0) [0x7fbde5b739e0]
  2: /usr/lib/libc.so.6(+0x8864c) [0x7fbde5bc364c]
  3: /usr/lib/libc.so.6(gsignal+0x18) [0x7fbde5b73938]
  4: /usr/lib/libc.so.6(abort+0xd7) [0x7fbde5b5d53d]
  5: /usr/bin/../lib64/coz-profiler/libcoz.so(pthread_create+0x18e) [0x7fbde5f015fe]
  6: /usr/bin/../lib64/coz-profiler/libcoz.so(_ZN8profiler7startupERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEP4lineib+0x20a) [0x7fbde5f0498a]
  7: /usr/bin/../lib64/coz-profiler/libcoz.so(_Z8init_cozv+0xf68) [0x7fbde5f00aa8]
  8: /usr/bin/../lib64/coz-profiler/libcoz.so(+0x1944b) [0x7fbde5f0144b]
  9: /usr/lib/libc.so.6(+0x23290) [0x7fbde5b5e290]
  10: /usr/lib/libc.so.6(__libc_start_main+0x8a) [0x7fbde5b5e34a]
  11: ./test(+0x85d5) [0x563d593a85d5]

So far I haven't been able to get coz to work on anything. My compile parameters are: -Wall -Wextra -Wold-style-declaration -Wuninitialized -Wmaybe-uninitialized -Wunused-parameter -g1 -gdwarf-3

Are there any suggestions on how to get this working with a plain C program?