Open freemanliu opened 1 year ago
The first step should be enabling core dumps, getting it to dump one, and then opening it in GDB to see exactly where it crashed.
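As a rough sketch of that first step on Linux, a small Python launcher can raise the core-dump size limit and then start the program, so that a crash leaves a core file behind. The function names below are made up for illustration and are not part of Bytehound. Where the core file ends up depends on the system's `kernel.core_pattern` setting.

```python
import resource
import subprocess


def enable_core_dumps():
    """Raise the soft core-file size limit to the hard limit for this process."""
    _soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    # Return the limits now in effect so the caller can check them.
    return resource.getrlimit(resource.RLIMIT_CORE)


def run_with_core_dumps(argv, env=None):
    """Run argv as a child process; the child inherits the raised limit."""
    enable_core_dumps()
    # A negative return code means the child was killed by a signal
    # (for example -11 for SIGSEGV).
    return subprocess.run(argv, env=env).returncode
```

Once a core file exists, loading it with `gdb <program> <core file>` and running `thread apply all bt` prints a backtrace for every thread. If the hard limit is already 0, the soft limit cannot be raised this way and has to be changed by whoever controls the limits for the session.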
Hi, Koute,
Thanks for your quick response. Here is the stack trace of all the threads. It seems the clone caused the problem (thread 1). Can you please take a look? Let me know if further info is needed.
Thread 9 (Thread 0x7fd3e5a7d000 (LWP 301271)): warning: Section `.reg-xstate/301271' in core file too small.
Thread 8 (Thread 0x7fd3e627e000 (LWP 301270)): warning: Section `.reg-xstate/301270' in core file too small.
Thread 7 (Thread 0x7fd3f5030000 (LWP 306248)): warning: Section `.reg-xstate/306248' in core file too small.
Thread 6 (Thread 0x7fd47e0e9000 (LWP 301264)): warning: Section `.reg-xstate/301264' in core file too small.
Thread 5 (Thread 0x7fd3e6a7f000 (LWP 301269)): warning: Section `.reg-xstate/301269' in core file too small.
Thread 4 (Thread 0x7fd3e84e0000 (LWP 301266)): warning: Section `.reg-xstate/301266' in core file too small.
Thread 2 (Thread 0x7fd3fdaff000 (LWP 301265)): warning: Section `.reg-xstate/301265' in core file too small.
Thread 1 (Thread 0x7fd3e7dff000 (LWP 304234)):
(gdb)
Looks like it crashed when one of the threads in your program finishes and the jemalloc inside of Bytehound tries to deallocate its thread local storage.
Can you tell me more about your program?
dlopen
Question: why does the thread local storage not trigger a segmentation fault without Bytehound?
Would it be possible for you to create a minimal example using libtorch which does something similar to what your program's doing (it doesn't need to do anything useful; just trigger the same codepaths) and see if it'll crash with it too?
It will be hard because there is a lot of code and it takes a long time to run. But it is not urgent. I can restart it once it is killed. Thanks for your help anyway. I like the idea of Bytehound.
Cheers, Freeman
It will be hard because there is a lot of code and it takes a long time to run.
Well, that's the point of a minimal reproduction. (: Try to cut down what the program's doing and see if it still reproduces. (If it's an ML algorithm then e.g. try to delete 99% of the training data, significantly reduce the network size, delete as much code as you can, etc.)
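To give an idea of how small such a reproduction can start out, here is a hypothetical stand-in that only does the one thing the backtrace points at: threads that allocate some memory and then exit. It does not use libtorch or `dlopen`, the names and sizes are made up, and whether it hits the same crash is unknown. It is just a starting shape to grow toward the real program.

```python
import threading


def worker(n_allocs, size):
    """Allocate some buffers, then return so that the thread exits."""
    buffers = [bytearray(size) for _ in range(n_allocs)]
    # The buffers are dropped when the function returns; thread teardown follows.
    return len(buffers)


def churn_threads(n_threads=64, n_allocs=100, size=4096):
    """Start and join short-lived threads one after another; return how many ran."""
    finished = 0
    for _ in range(n_threads):
        t = threading.Thread(target=worker, args=(n_allocs, size))
        t.start()
        t.join()  # wait until the thread has fully exited
        finished += 1
    return finished


if __name__ == "__main__":
    print(churn_threads())
```

Assuming the usual `LD_PRELOAD` launch of `libbytehound.so`, running this script under Bytehound and comparing it against the real program's behavior can help narrow down which ingredient (the threads themselves, `dlopen`, or libtorch) is needed for the crash. For a C++ program the same shape carries over with native threads.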
The version of bytehound-preload is "0.10.0". This happens for both debug and release builds of bytehound-preload.
How can I debug this?