oracle / dtrace-utils

DTrace-utils contains the DTrace port to Linux

Using debuginfo for better backtraces #96

Status: Open. thesamesam opened this issue 3 weeks ago.

thesamesam commented 3 weeks ago

This is maybe a better example of the kind of thing I was talking about in https://github.com/oracle/dtrace-utils/issues/84.

With splitdebug (built with -ggdb3, the debug info split out into /usr/lib/debug, and the less binary stripped), ustack() output is not very friendly:

$ sudo dtrace -n 'syscall::fsync*:return,syscall::sync*:return { ustack(); }'
[...]
  6 119674                     fsync:return
              libc.so.6`fsync+0x10
              less`0x5ea421eafd4d
              0x5ea421eb85bd
              0x5ea421eafa30
              0x7796ac9e5407
              0x7ffc1f5cecc3
              0x2f65686361632f72

In this case, I genuinely didn't know that less would ever call fsync, so I was curious where it was being called from! But the backtrace doesn't help much there.
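For context, a symbolizer turns each return address into module`sym+0xoff form by searching the module's symbol table for the symbol whose range contains the address; with no symtab it can only print the raw address, as in the less`0x5ea421eafd4d frame above. A minimal Python sketch of that lookup, not DTrace's actual code, assuming a symtab already loaded as hypothetical (start, size, name) tuples sorted by start, with starts relative to the module's load base:

```python
import bisect
from typing import List, Tuple

# Hypothetical symbol record: (start offset relative to the load base, size in bytes, name)
Symbol = Tuple[int, int, str]


def format_frame(module: str, base: int, pc: int, symtab: List[Symbol]) -> str:
    """Render pc as module`sym+0xoff, falling back to module`0xpc."""
    off = pc - base
    starts = [start for start, _size, _name in symtab]  # symtab must be sorted by start
    i = bisect.bisect_right(starts, off) - 1             # last symbol starting at or before off
    if i >= 0:
        start, size, name = symtab[i]
        if off < start + size:                           # pc lies inside that symbol
            return f"{module}`{name}+{off - start:#x}"
    return f"{module}`{pc:#x}"                           # no covering symbol (e.g. stripped binary)
```

With an empty symtab every frame degrades to the raw-address form, which is what a stripped main program without any other symbol source looks like.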

I get better output if I disable stripping and use -fno-omit-frame-pointer:

$ sudo dtrace -n 'syscall::fsync*:return,syscall::sync*:return { ustack(); }'
 25 119674                     fsync:return
              libc.so.6`fsync+0x10
              less`quit+0x5d
              less`commands+0x83d
              0x59897f425a30
              0x78803df45407
              0x7ffd79035cc3
              0x2f65686361632f72

It's not perfect, but it's more than enough for me to pin down what's going on.
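The difference frame pointers make can be sketched as a walk of the saved-frame-pointer chain. This is a simplified model, not the kernel's unwinder: it assumes the x86-64 frame layout (caller's saved frame pointer at [fp], return address at [fp + 8]) and uses a hypothetical dict standing in for the target's stack memory. When some function in the chain was built without frame pointers, the walk reads whatever happens to sit in that slot. That may be why both traces above end in 0x2f65686361632f72: in little-endian byte order that value is the ASCII text "r/cache/", i.e. string data rather than a return address.

```python
from typing import Dict, List


def walk_frame_pointers(memory: Dict[int, int], pc: int, fp: int,
                        max_frames: int = 64) -> List[int]:
    """Collect return addresses by following saved frame pointers.

    Assumes x86-64 with -fno-omit-frame-pointer: [fp] holds the caller's
    frame pointer and [fp + 8] the return address. `memory` is a stand-in
    for reading 8-byte words from the traced process's stack.
    """
    frames = [pc]
    while fp and len(frames) < max_frames:
        ret = memory.get(fp + 8)
        next_fp = memory.get(fp)
        if ret is None or next_fp is None:
            break                  # unreadable stack: stop rather than guess
        frames.append(ret)
        if next_fp <= fp:          # stack grows down, so callers' frames must sit higher
            break
        fp = next_fp
    return frames
```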

Could DTrace learn to read DWARF (elfutils should be able to do this, including understanding splitdebug and so on) for backtraces?
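As background on how split debuginfo is usually located: GDB documents a lookup by build ID under the global debug directory, and elfutils' libdwfl searches the same .build-id layout alongside .gnu_debuglink. A small Python sketch of the path construction, assuming the build ID has already been read from the binary's NT_GNU_BUILD_ID note and that the debug root is the common /usr/lib/debug:

```python
def build_id_debug_path(build_id: bytes, debug_root: str = "/usr/lib/debug") -> str:
    """Conventional location of the split debug file for a given build ID.

    The first byte of the ID (as two hex digits) names a subdirectory; the
    remaining bytes, in hex, name the file, with a ".debug" suffix.
    """
    if len(build_id) < 2:
        raise ValueError("build ID too short")
    hex_id = build_id.hex()
    return f"{debug_root}/.build-id/{hex_id[:2]}/{hex_id[2:]}.debug"
```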

kvanhees commented 2 weeks ago

We can certainly look at supporting it as an optional feature: if debuginfo is available, it would make sense to use it, as long as it does not negatively impact trace processing. Anything that improves backtraces without adding to the runtime dependencies in general is good.

nickalcock commented 2 weeks ago

There are two distinct issues here. DTrace wants backtrace info for reliable stack traces (which has to be something the kernel can understand; hopefully SFrame will do this in the future), and DTrace's userspace wants a symbol table for symbol lookups. Even the latter only works for longer-running traces, where the process hasn't already died before userspace gets its hands on the trace, and even then it is troublesome for main programs, which are routinely stripped.

Solaris implemented an .ldynsym section for just this, but the Linux approach seems to have been quite different: a section containing a compressed ELF executable (!!) which only has symbol table sections in it. We do not yet handle this crazy thing, and in my last trials relatively few binaries were built with it at all. We do need a symtab from somewhere.
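Why timing matters can be seen from what a userspace symbolizer needs before it can even consult a symtab: the traced process's memory map, to work out which file each address belongs to. A generic Python sketch of that step, not how dtrace-utils is structured, assuming the text of /proc/<pid>/maps was read while the process was still alive:

```python
from typing import List, NamedTuple, Optional, Tuple


class Mapping(NamedTuple):
    start: int   # first address of the mapping
    end: int     # one past the last address
    perms: str   # e.g. "r-xp"
    offset: int  # file offset corresponding to `start`
    path: str    # backing file, a pseudo-name like "[stack]", or ""


def parse_maps(text: str) -> List[Mapping]:
    """Parse the text of /proc/<pid>/maps into Mapping records."""
    maps = []
    for line in text.splitlines():
        # address perms offset dev inode [pathname]; pathname may contain spaces
        fields = line.split(maxsplit=5)
        if len(fields) < 5:
            continue
        lo, hi = (int(x, 16) for x in fields[0].split("-"))
        path = fields[5] if len(fields) == 6 else ""
        maps.append(Mapping(lo, hi, fields[1], int(fields[2], 16), path))
    return maps


def resolve(maps: List[Mapping], addr: int) -> Optional[Tuple[str, int]]:
    """Translate a runtime address to (file path, file offset) if file-backed."""
    for m in maps:
        if m.start <= addr < m.end and m.path.startswith("/"):
            return m.path, addr - m.start + m.offset
    return None
```

Once the process has exited, its maps file is gone, so an address like 0x5ea421eafd4d can no longer be tied to a file at all, let alone to a symbol in it.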

I'd be happy to add some sort of symbol server support, but I don't think Linux has any such thing either...

thesamesam commented 2 weeks ago

> a section containing a compressed ELF executable

I'm pretty sure this is MiniDebugInfo (.gnu_debugdata). It looks like only Fedora ships with it by default (?) but I'd be open to us doing it in Gentoo.
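For reference, GDB documents MiniDebugInfo as an xz (LZMA) compressed ELF object stored in the .gnu_debugdata section. A minimal Python sketch of the unpacking step, assuming the section's raw bytes have already been extracted from the outer binary (that part, and parsing the inner ELF's .symtab, are omitted):

```python
import lzma

ELF_MAGIC = b"\x7fELF"


def unpack_minidebuginfo(section: bytes) -> bytes:
    """Decompress the raw contents of a .gnu_debugdata section.

    The result is itself an ELF object (mostly a .symtab); reading symbols
    from it is a separate step.
    """
    inner = lzma.decompress(section)  # default FORMAT_AUTO detects the .xz container
    if inner[:4] != ELF_MAGIC:
        raise ValueError(".gnu_debugdata did not decompress to an ELF object")
    return inner
```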

One question is whether we want to try to lead some standardisation of making it a proper compressed section. But that would delay things substantially.
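One possible reading of "a proper compressed section" is the ELF gABI SHF_COMPRESSED mechanism, where a section's data begins with an Elf64_Chdr header (ch_type, ch_reserved, ch_size, ch_addralign) followed by the compressed bytes; binutils and elfutils already handle this for debug sections, but whether tools would accept it on .symtab is exactly the kind of thing standardisation would have to settle. A Python sketch of decoding such a section, assuming a little-endian ELF64 file and zlib compression (ELFCOMPRESS_ZLIB):

```python
import struct
import zlib

ELFCOMPRESS_ZLIB = 1
# Elf64_Chdr, little-endian: ch_type, ch_reserved, ch_size, ch_addralign
ELF64_CHDR = struct.Struct("<IIQQ")


def decompress_section(data: bytes) -> bytes:
    """Decompress the data of an ELF64 section that has SHF_COMPRESSED set."""
    ch_type, _reserved, ch_size, _align = ELF64_CHDR.unpack_from(data)
    if ch_type != ELFCOMPRESS_ZLIB:
        raise NotImplementedError(f"unsupported ch_type {ch_type}")
    out = zlib.decompress(data[ELF64_CHDR.size:])
    if len(out) != ch_size:
        raise ValueError("decompressed size does not match ch_size")
    return out
```

Unlike MiniDebugInfo, this keeps the symbols in an ordinary section of the original file rather than in a nested ELF object.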

> I'd be happy to add some sort of symbol server support, but I don't think Linux has any such thing either...

Isn't that debuginfod? What am I missing?
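For background, debuginfod serves debuginfo and executables by build ID over HTTP, and elfutils ships libdebuginfod (e.g. debuginfod_find_debuginfo()) for C clients. A minimal Python sketch of the request URLs a client builds, assuming the documented /buildid/BUILDID/debuginfo layout and the space-separated DEBUGINFOD_URLS variable; it only constructs URLs and does not fetch, verify, or cache anything:

```python
import os
from typing import List, Optional


def debuginfod_urls(build_id_hex: str, artifact: str = "debuginfo",
                    servers: Optional[str] = None) -> List[str]:
    """Candidate URLs for an artifact, one per configured debuginfod server.

    `artifact` is e.g. "debuginfo" or "executable"; `servers` defaults to
    the space-separated DEBUGINFOD_URLS environment variable.
    """
    if servers is None:
        servers = os.environ.get("DEBUGINFOD_URLS", "")
    return [f"{s.rstrip('/')}/buildid/{build_id_hex}/{artifact}"
            for s in servers.split()]
```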