ut-parla / Parla.py

A Python based programming system for heterogeneous computing

Debug Symbols in VECs #74

Open insertinterestingnamehere opened 3 years ago

insertinterestingnamehere commented 3 years ago

This is somewhat aspirational, but currently there aren't any debug symbols available in VECs. No backtrace library gives meaningful results and gdb doesn't work. This makes debugging a nastier business than it already would be for something so unorthodox as VECs. We need to find some way to actually be able to have and use debug symbols in VECs. This will likely require a decent bit of hacking on the underlying libraries/tools, but it also seems like the kind of thing that people would really appreciate upstream. We're definitely not the first people to run into dlmopen's linker namespaces not working with gdb.

insertinterestingnamehere commented 3 years ago

See https://stackoverflow.com/questions/51592455/debugging-strategies-for-libraries-open-with-dlmopen.

insertinterestingnamehere commented 3 years ago

The best we've been able to do as of this writing is attempt to print out the stack with libunwind. See https://github.com/ut-parla/Parla.py/blob/d82e6d573a5cd5b1e492a99a0106cd90e632961e/parla/vec_backtrace.h#L13. When exactly it works is inconsistent, and symbols may be left out if libunwind can't find the debug info. Even getting our libunwind solution to work reliably would already be a win in this case. Full GDB support may be a time-consuming undertaking. I don't know.

insertinterestingnamehere commented 3 years ago

Specifically worth noting: I think the libunwind trace printing has succeeded in finding symbols for some functions within a VEC. For example, the segfaults in https://github.com/ut-parla/Parla.py/issues/68 include what appear to be mostly full traces. On the other hand, we've seen the libunwind stack tracing outright skip printing function names or only print out a small portion of the stack. I'm not clear on why this happens. Perhaps the debug symbols are available somewhere, but we need a more reliable way of accessing them? glibc's own backtrace functions choke on linker namespaces, though.

arthurp commented 3 years ago

> On the other hand we've seen the libunwind stack tracing outright skip printing function names or only print out a small portion of the stack.

I remember similar behavior when using gdb (attached to the running process), so I don't think this is just a libunwind thing. I wonder if there is a debug table that is shared between namespaces, so that in some cases the table contains the right things for a given VEC, but not in other cases. And the table search may assume that the entry is present in the table, so that it fails in unpredictable ways when the entry isn't present (maybe because it was overwritten by another VEC).
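One way to probe this hypothesis without relying on the linker's own tables might be to ask the kernel which file backs a given frame address. The sketch below is an illustration only, not Parla code. It assumes Linux and text in the format of /proc/<pid>/maps; the names `parse_maps`, `find_mapping`, `instance_base`, and the sample paths and addresses are all hypothetical. Libraries loaded into separate dlmopen namespaces are typically mapped as separate copies, so the same path can show up at several addresses. The sketch tells the copies apart by `start - offset`, which is assumed to approximate the load base for the executable segments of an ordinary shared object.

```python
from typing import List, NamedTuple, Optional


class Mapping(NamedTuple):
    start: int    # first address of the mapping
    end: int      # one past the last address
    perms: str    # e.g. "r-xp"
    offset: int   # offset into the backing file
    path: str     # backing file, "[stack]"-style label, or "" if anonymous


def parse_maps(maps_text: str) -> List[Mapping]:
    """Parse text in the /proc/<pid>/maps format into Mapping records."""
    mappings = []
    for line in maps_text.splitlines():
        # Fields: address range, perms, offset, dev, inode, optional pathname.
        fields = line.split(None, 5)
        if len(fields) < 5:
            continue
        start, end = (int(part, 16) for part in fields[0].split("-"))
        path = fields[5].strip() if len(fields) > 5 else ""
        mappings.append(Mapping(start, end, fields[1], int(fields[2], 16), path))
    return mappings


def find_mapping(address: int, mappings: List[Mapping]) -> Optional[Mapping]:
    """Return the mapping that contains address, or None if there is none."""
    for m in mappings:
        if m.start <= address < m.end:
            return m
    return None


def instance_base(m: Mapping) -> int:
    """Approximate load base of the object instance this mapping belongs to.

    Assumption: the segment's file offset equals its offset from the load
    base, which usually holds for executable segments but is not guaranteed.
    """
    return m.start - m.offset


# Hypothetical sample: one library mapped twice, as two namespaces might do.
SAMPLE_MAPS = """\
7f0000000000-7f0000020000 r-xp 00000000 08:01 100 /opt/lib/libkernel.so
7f0000020000-7f0000021000 rw-p 00020000 08:01 100 /opt/lib/libkernel.so
7f1000000000-7f1000020000 r-xp 00000000 08:01 100 /opt/lib/libkernel.so
7f2000000000-7f2000004000 rw-p 00000000 00:00 0
7ffd00000000-7ffd00021000 rw-p 00000000 00:00 0 [stack]
"""

if __name__ == "__main__":
    maps = parse_maps(SAMPLE_MAPS)
    for pc in (0x7F0000001234, 0x7F1000001234):
        m = find_mapping(pc, maps)
        print(hex(pc), m.path, "instance base", hex(instance_base(m)))
```

To try it on a live process, the sample text could be replaced with the contents of `/proc/self/maps` (or `/proc/<pid>/maps` for another process). If frames that libunwind or gdb fail to name still resolve to a library mapping this way, that would suggest the code is mapped as expected and the lookup tables are what is missing entries.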

insertinterestingnamehere commented 3 years ago

I think the breaks in the traces usually correspond to places where a function in one VEC calls a function in another. Maybe the issue is jumping between tables?
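The observation that breaks line up with cross-VEC calls could be checked mechanically on a captured trace. The following is a minimal sketch under stated assumptions, not something from the Parla code base: it assumes each frame has already been reduced to a program counter plus the symbol name the unwinder printed (or `None` when the name was skipped), and that the address range of each VEC's loaded code is known. The names `owner`, `crossing_report`, `summarize`, and all addresses, labels, and symbols in the sample data are hypothetical.

```python
from typing import List, Optional, Tuple

Frame = Tuple[int, Optional[str]]   # (program counter, symbol name or None)
Region = Tuple[int, int, str]       # (start, end, label) for one VEC's code


def owner(pc: int, regions: List[Region]) -> Optional[str]:
    """Return the label of the region containing pc, or None."""
    for start, end, label in regions:
        if start <= pc < end:
            return label
    return None


def crossing_report(frames: List[Frame], regions: List[Region]):
    """Annotate each frame with its owner and whether it borders a crossing.

    A frame borders a crossing when the frame before or after it in the
    trace belongs to a different region.
    """
    owners = [owner(pc, regions) for pc, _ in frames]
    report = []
    for i, (pc, name) in enumerate(frames):
        prev_differs = i > 0 and owners[i - 1] != owners[i]
        next_differs = i + 1 < len(frames) and owners[i + 1] != owners[i]
        report.append((pc, name, owners[i], prev_differs or next_differs))
    return report


def summarize(report) -> Tuple[int, int]:
    """Return (unnamed frames, unnamed frames that border a crossing)."""
    unnamed = [entry for entry in report if entry[1] is None]
    at_crossing = [entry for entry in unnamed if entry[3]]
    return len(unnamed), len(at_crossing)


# Hypothetical data: two VECs and a four-frame trace, innermost frame first.
REGIONS: List[Region] = [(0x1000, 0x2000, "vec0"), (0x5000, 0x6000, "vec1")]
FRAMES: List[Frame] = [
    (0x1100, "kernel_a"),
    (0x1200, None),        # name missing in the printed trace
    (0x5100, "dispatch"),
    (0x5200, "run_task"),
]

if __name__ == "__main__":
    for pc, name, vec, crossing in crossing_report(FRAMES, REGIONS):
        print(hex(pc), name or "<unnamed>", vec, "crossing" if crossing else "")
    print(summarize(crossing_report(FRAMES, REGIONS)))
```

If most unnamed frames in real traces border a crossing, that would be consistent with the idea that the lookup goes wrong when the unwinder moves from one VEC's tables to another's. This only covers skipped names; a trace that stops early would need a separate check.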