koute / bytehound

A memory profiler for Linux.

Segmentation fault when running with LD_PRELOAD #93

Open freemanliu opened 1 year ago

freemanliu commented 1 year ago

The version of bytehound-preload is "0.10.0". The crash happens with both debug and release builds of bytehound-preload.

How to debug this?

koute commented 1 year ago

The first step should be enabling core dumps, getting it to dump one, and then open it in GDB and see where exactly it crashed.

freemanliu commented 1 year ago

Hi, Koute,

Thanks for your quick response. Here are the stack traces of all the threads. It seems the clone caused the problem (thread 1). Can you please take a look? Let me know if further info is needed.

Thread 9 (Thread 0x7fd3e5a7d000 (LWP 301271)): warning: Section `.reg-xstate/301271' in core file too small.

0 0x00007fd49d89515e in ?? () from /lib/x86_64-linux-gnu/libgomp.so.1

1 0x00007fd49d892768 in ?? () from /lib/x86_64-linux-gnu/libgomp.so.1

2 0x00007fd4f2191b11 in nwind_ret_trampoline_start () from /home/f/codes/bytehound/target/release/libbytehound.so

Thread 8 (Thread 0x7fd3e627e000 (LWP 301270)): warning: Section `.reg-xstate/301270' in core file too small.

0 0x00007fd49d89515e in ?? () from /lib/x86_64-linux-gnu/libgomp.so.1

1 0x00007fd49d892768 in ?? () from /lib/x86_64-linux-gnu/libgomp.so.1

2 0x00007fd4f2191b11 in nwind_ret_trampoline_start () from /home/f/codes/bytehound/target/release/libbytehound.so

Thread 7 (Thread 0x7fd3f5030000 (LWP 306248)): warning: Section `.reg-xstate/306248' in core file too small.

0 0x00007fd49d89515e in ?? () from /lib/x86_64-linux-gnu/libgomp.so.1

1 0x00007fd49d892768 in ?? () from /lib/x86_64-linux-gnu/libgomp.so.1

2 0x00007fd4f2191b11 in nwind_ret_trampoline_start () from /home/f/codes/bytehound/target/release/libbytehound.so

Thread 6 (Thread 0x7fd47e0e9000 (LWP 301264)): warning: Section `.reg-xstate/301264' in core file too small.

0 0x00007fd49d60b73d in syscall () from /lib/x86_64-linux-gnu/libc.so.6

1 0x0000558df5f54725 in std::sys::unix::futex::futex_wait () at library/std/src/sys/unix/futex.rs:62

2 0x0000558df5f49b39 in std::sys_common::thread_parker::futex::Parker::park () at library/std/src/sys_common/thread_parker/futex.rs:52

3 std::thread::park () at library/std/src/thread/mod.rs:942

4 0x0000558df5f4fedf in std::sync::mpsc::blocking::WaitToken::wait () at library/std/src/sync/mpsc/blocking.rs:67

5 0x0000558df5eaca6c in std::sync::mpsc::oneshot::Packet::recv () at library/std/src/sys_common/thread_info.rs:28

6 0x00007fd4f2191b11 in nwind_ret_trampoline_start () from /home/f/codes/bytehound/target/release/libbytehound.so

Thread 5 (Thread 0x7fd3e6a7f000 (LWP 301269)): warning: Section `.reg-xstate/301269' in core file too small.

0 0x00007fd49d89515e in ?? () from /lib/x86_64-linux-gnu/libgomp.so.1

1 0x00007fd49d892768 in ?? () from /lib/x86_64-linux-gnu/libgomp.so.1

2 0x00007fd4f2191b11 in nwind_ret_trampoline_start () from /home/f/codes/bytehound/target/release/libbytehound.so

Thread 4 (Thread 0x7fd3e84e0000 (LWP 301266)): warning: Section `.reg-xstate/301266' in core file too small.

0 0x00007fd49d60b73d in syscall () from /lib/x86_64-linux-gnu/libc.so.6

1 0x0000558df5f54725 in std::sys::unix::futex::futex_wait () at library/std/src/sys/unix/futex.rs:62

2 0x0000558df5f49b39 in std::sys_common::thread_parker::futex::Parker::park () at library/std/src/sys_common/thread_parker/futex.rs:52

3 std::thread::park () at library/std/src/thread/mod.rs:942

4 0x0000558df5f4fedf in std::sync::mpsc::blocking::WaitToken::wait () at library/std/src/sync/mpsc/blocking.rs:67

5 0x0000558df5e903f2 in std::sync::mpsc::shared::Packet::recv () at library/std/src/sys_common/thread_info.rs:28

6 0x00007fd4f2191b11 in nwind_ret_trampoline_start () from /home/f/codes/bytehound/target/release/libbytehound.so

Thread 3 (Thread 0x7fd3e82df000 (LWP 301267)): warning: Section `.reg-xstate/301267' in core file too small.

0 0x00007fd4f2080253 in bytehound::allocation_tracker::on_allocation () from /home/f/codes/bytehound/target/release/libbytehound.so

1 0x00007fd4f2070266 in malloc () from /home/f/codes/bytehound/target/release/libbytehound.so

2 0x00007fd4f2191b11 in nwind_ret_trampoline_start () from /home/f/codes/bytehound/target/release/libbytehound.so

Thread 2 (Thread 0x7fd3fdaff000 (LWP 301265)): warning: Section `.reg-xstate/301265' in core file too small.

0 0x00007fd4f209fd10 in core::str::pattern::TwoWaySearcher::next () from /home/f/codes/bytehound/target/release/libbytehound.so

1 0x00007fd4f209d1dd in ::next_match () from /home/f/codes/bytehound/target/release/libbytehound.so

2 0x00007fd4f2096efb in bytehound::smaps::update_smaps () from /home/f/codes/bytehound/target/release/libbytehound.so

3 0x00007fd4f20aefeb in bytehound::processing_thread::thread_main () from /home/f/codes/bytehound/target/release/libbytehound.so

4 0x00007fd4f2191b11 in nwind_ret_trampoline_start () from /home/f/codes/bytehound/target/release/libbytehound.so

Thread 1 (Thread 0x7fd3e7dff000 (LWP 304234)):

0 0x00007fd4f219144f in _rjem_mp_je_tsd_state_set () from /home/f/codes/bytehound/target/release/libbytehound.so

1 0x00007fd4f21915ac in _rjem_mp_je_tsd_cleanup () from /home/f/codes/bytehound/target/release/libbytehound.so

2 0x00007fd49d8415a1 in __nptl_deallocate_tsd.part.0 () from /lib/x86_64-linux-gnu/libpthread.so.0

3 0x00007fd49d84262a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0

4 0x00007fd49d612133 in clone () from /lib/x86_64-linux-gnu/libc.so.6

(gdb)

koute commented 1 year ago

Looks like it crashed when one of the threads in your program finishes and the jemalloc inside of Bytehound tries to deallocate its thread local storage.

Can you tell me more about your program?

  1. What does it do?
  2. Does it load any shared objects at runtime? (e.g. through dlopen)
  3. Do you use jemalloc?

freemanliu commented 1 year ago

  1. It uses libtorch for machine learning.
  2. No, but I'm not sure whether libtorch does that.
  3. Same as 2.

Question: why doesn't the thread-local storage cleanup trigger a segmentation fault without Bytehound?

koute commented 1 year ago

Would it be possible for you to create a minimal example using libtorch which does something similar to what your program's doing (it doesn't need to do anything useful; just trigger the same codepaths) and see if it'll crash with it too?

freemanliu commented 1 year ago

It will be hard because there is a lot of code and it takes a long time to run. But it is not urgent. I can restart it once it is killed. Thanks for your help anyway. I like the idea of Bytehound.

Cheers, Freeman

On Wed, 12 Oct 2022, 20:43 Koute, @.***> wrote:

Would it be possible for you to create a minimal example using libtorch which does something similar to what your program's doing (it doesn't need to do anything useful; just trigger the same codepaths) and see if it'll crash with it too?


koute commented 1 year ago

It will be hard because there is a lot of code and it takes a long time to run.

Well, that's the point of a minimal reproduction. (: Try to cut down what the program's doing and see if it still reproduces. (If it's an ML algorithm then e.g. try to delete 99% of the training data, significantly reduce the network size, delete as much code as you can, etc.)