DWARFless Debugging

brenns10 commented 2 years ago

Last updated: 2024-03-28

This issue tracks support for non-DWARF sources of debugging information: specifically for the Linux kernel, but hopefully including userspace as we go. I'm editing this initial issue comment as the project takes shape, so hopefully this can provide at-a-glance status information.

Overview

Drgn needs several kinds of information to understand and debug programs. It uses type information, symbols, "object finding", unwind info, file+line number mappings, and probably other kinds of info I missed. Currently, all of this information comes from DWARF debug information (except symbols, which are just from the ELF symbol table). Some of these kinds of information have extensible APIs (such as object finders and type finders), but others, such as symbol tables, aren't extensible.

This issue tracks the work necessary to support non-DWARF sources of debugging information. While DWARF is the major contender in this area, it has been criticized for being a bit "heavy" (large file sizes). As a result, the DWARF information is typically stripped from binaries in common Linux distributions, and commonly packaged in a separate package (e.g. "foo-debuginfo" for package "foo"). Sometimes, distributions offer debuginfod, which is a way to serve the relevant debuginfo files only when necessary, without the need to install the debuginfo packages. Drgn has support for this (if built from source, not installed from pip), and the support will be improving soon. Debuginfod is great, but sadly not an option for many.

In some cases, it may not be possible to install DWARF debuginfo: maybe it was never generated in the first place, maybe it was stripped and not placed in a debuginfo package or debuginfod server, or maybe there's no internet access, lack of disk space, or a strict policy against installing debuginfo packages (yes, really, I've had to deal with that). Then, you might want to use a more compact format, such as the Compact Type Format (CTF), or BPF Type Format (BTF).

The Linux kernel frequently comes with BTF data built in (depending on config). Some Linux distributions (Oracle Linux with the UEK kernel) come with CTF data packaged in the normal kernel package. The Linux kernel also comes with a symbol table, kallsyms (again, depending on config). For stack unwinding, the kernel frequently uses either frame pointers or an unwinding information format called ORC. If all these pieces could be combined, then most of the features of drgn could be usable without the DWARF information. That's what this issue is about.
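To make the kallsyms piece concrete, here is a minimal Python sketch that parses the text format exposed by /proc/kallsyms (address, type character, name, optional `[module]`) and resolves an address to the nearest preceding symbol. It is only an illustration, not the implementation in the kallsyms_finder branch; the names `KSym`, `parse_kallsyms`, and `lookup_address` are hypothetical, and the sample addresses are invented.

```python
import bisect
from typing import List, NamedTuple, Optional


class KSym(NamedTuple):
    address: int
    kind: str              # nm-style type character, e.g. "T" or "d"
    name: str
    module: Optional[str]  # None for symbols built into vmlinux


def parse_kallsyms(text: str) -> List[KSym]:
    """Parse /proc/kallsyms-style text into a list sorted by address."""
    syms = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        module = fields[3].strip("[]") if len(fields) > 3 else None
        syms.append(KSym(int(fields[0], 16), fields[1], fields[2], module))
    syms.sort(key=lambda s: s.address)
    return syms


def lookup_address(syms: List[KSym], address: int) -> Optional[KSym]:
    """Return the symbol with the greatest address <= the given address."""
    addrs = [s.address for s in syms]
    i = bisect.bisect_right(addrs, address)
    return syms[i - 1] if i else None


# Sample data: addresses are invented, not taken from a real kernel.
SAMPLE = """\
ffffffff81000000 T _text
ffffffff81001000 T start_kernel
ffffffffc0000000 t helper_fn\t[example_mod]
"""

syms = parse_kallsyms(SAMPLE)
print(lookup_address(syms, 0xffffffff81001010).name)
```

Note that kallsyms carries no symbol sizes in this text form, so the lookup can only report the closest symbol at or below the address, not whether the address really falls inside it.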

Objectives

The currently agreed-upon end goal here is to get a pluggable symbol finder and vmlinux kallsyms implementation merged and available by default for drgn. These are useful in and of themselves:

Even without type info, a vmlinux kallsyms implementation would allow people to look up symbols, a major improvement over the current state of things (in which only Program.read* functionality would be available).
It's relatively common to have vmlinux type info, but not module info. A pluggable system would allow a module symbol implementation that uses the module kallsyms and module exported symbols.
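The idea of a pluggable system can be sketched in a few lines of Python: several independent finders are registered, and a lookup tries each in turn until one knows the symbol. This is not drgn's API; `SymbolRegistry`, `register`, `lookup`, and `table_finder` are hypothetical names, and the addresses are invented stand-ins for vmlinux and module symbol data.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A finder maps a symbol name to an address, or returns None if it
# does not know the symbol.
Finder = Callable[[str], Optional[int]]


class SymbolRegistry:
    """Hypothetical registry that tries each finder in registration order."""

    def __init__(self) -> None:
        self._finders: List[Tuple[str, Finder]] = []

    def register(self, label: str, finder: Finder) -> None:
        self._finders.append((label, finder))

    def lookup(self, name: str) -> Tuple[str, int]:
        for label, finder in self._finders:
            address = finder(name)
            if address is not None:
                return label, address
        raise LookupError(f"symbol not found: {name}")


def table_finder(table: Dict[str, int]) -> Finder:
    """Build a finder backed by a plain name -> address dictionary."""
    return table.get


# Invented addresses standing in for vmlinux and module kallsyms data.
registry = SymbolRegistry()
registry.register("vmlinux", table_finder({"start_kernel": 0xffffffff81001000}))
registry.register("modules", table_finder({"helper_fn": 0xffffffffc0000000}))

print(registry.lookup("helper_fn"))
```

The point of the design is that the module finder can be added or replaced without touching the vmlinux one, which matches the case described above where vmlinux info is present but module info is not.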

Once these are available, we will get a basic CTF implementation for the Linux kernel merged. The basic ground rules here are that it will be disabled at compile-time in the PyPI wheel distributions, and won't muddle into the internals of drgn. Essentially, there should be some build-system-related changes, a file named ctf.c, and maybe some Python wrappers, and that's it.

Here are some non-goals at the moment. They may be revisited in time.

CTF support for userspace programs. Any program compiled with -gctf has a .ctf section. Compare that to the kernel implementations, which now just create a vmlinux.ctfa (ctfa = CTF Archive) file. As of now, the CTF implementation does support very simple userspace cores in order to add simple unit tests. However, proper support will need better integration with the drgn debug info system, so it won't be officially supported for the initial step.
Automatic detection and use of CTF data. This would require tangling the CTF implementation into the core a bit more than we want to do initially. Also, it might conflict with some of the work on the Module API.
Enabling CTF support in PyPI wheels.

Roadmap

[x] First pull request: #316 Add VMCOREINFO to the Linux special objects. This is not controversial, it's just a nice piece of information to expose to Python helpers. The current design of my helpers does need this, but even if it didn't, it would be a useful piece.
[x] Second pull request: #241 Pluggable Symbol finder API. Allows C and Python to register "symbol finders".
[ ] Third pull request: #388 Add symbol finders for kallsyms (vmlinux & module)
[ ] Fourth pull request: Adding CTF implementation. (see ctf branch).
[ ] Possible fifth pull request: Support ORC unwinding without ELF files. (see ctf branch).
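As an illustration of why VMCOREINFO is handy for Python helpers: it is newline-separated KEY=VALUE text, so a helper can turn it into a dictionary with very little code. The sketch below assumes the raw note is already available as bytes; `parse_vmcoreinfo` is a hypothetical helper name, and the sample values are invented rather than taken from a real kernel.

```python
from typing import Dict


def parse_vmcoreinfo(raw: bytes) -> Dict[str, str]:
    """Split VMCOREINFO's newline-separated KEY=VALUE text into a dict."""
    info = {}
    for line in raw.decode("ascii", errors="replace").splitlines():
        key, sep, value = line.partition("=")
        if sep:  # ignore lines without an "="
            info[key] = value
    return info


# Invented sample in the usual VMCOREINFO layout; values are not real.
SAMPLE = (
    b"OSRELEASE=5.15.0-example\n"
    b"PAGESIZE=4096\n"
    b"SYMBOL(swapper_pg_dir)=ffffffff82e0a000\n"
)

info = parse_vmcoreinfo(SAMPLE)
page_size = int(info["PAGESIZE"])
swapper = int(info["SYMBOL(swapper_pg_dir)"], 16)
print(info["OSRELEASE"], page_size, hex(swapper))
```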

Current branches

These are links to branches that contain my current work, and they roughly correspond to different points in the roadmap above. They are stacked, each one building on the prior one. They are subject to being rebased and force-pushed at any time.

~symbol_finder - this branch adds the pluggable symbol finder API.~
kallsyms_finder - this branch adds the kallsyms implementation
ctf - this branch adds the CTF implementation. It also has the necessary plumbing to use ORC for unwinding, without needing to read it from ELF debuginfo files.
btf_2024 - this branch adds a small BTF implementation. Due to the (current) nature of BTF on Linux, type definitions are provided only for functions and some percpu variables. The branch has some workaround helpers for this, but it will be some separate work to add variable definitions into BTF in the upstream Linux kernel.

If you take the latest branch (ctf) on an Oracle Linux 9 machine using UEK, then you should be able to build it and install it against a local kernel without installing any debuginfo packages!

sudo dnf config-manager --enable ol9_developer_EPEL
sudo dnf config-manager --enable ol9_developer
sudo dnf config-manager --enable ol9_appstream
sudo dnf config-manager --enable ol9_codeready_builder

sudo dnf install -y make autoconf automake libtool gcc-c++ git \
                    python3-devel elfutils-devel binutils-devel \
                    bzip2-devel zlib-devel xz-devel \
                    libkdumpfile-devel

cd drgn
python setup.py build_ext -i
sudo python -m drgn
# tada

Future Work

BTF: I have a recently updated branch with some minimal BTF support. It could use some improvement. I'm also considering using libbtf in the future, rather than hand-coding the format support.
Userspace CTF support needs some improvement. It exists in the current CTF branch in a very limited way (which helps in unit testing).
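For a sense of what hand-coding BTF format support involves, here is a minimal Python sketch that decodes the fixed-size BTF header (`struct btf_header` from the kernel's UAPI headers) and checks its magic number. It assumes little-endian data and does nothing beyond the header; it is not the code in the btf_2024 branch, and `BTFHeader` and `parse_btf_header` are hypothetical names.

```python
import struct
from typing import NamedTuple

BTF_MAGIC = 0xEB9F
_HEADER = struct.Struct("<HBBIIIII")  # little-endian struct btf_header


class BTFHeader(NamedTuple):
    magic: int
    version: int
    flags: int
    hdr_len: int
    type_off: int  # offsets are relative to the end of the header
    type_len: int
    str_off: int
    str_len: int


def parse_btf_header(data: bytes) -> BTFHeader:
    """Decode and sanity-check the fixed-size BTF header."""
    if len(data) < _HEADER.size:
        raise ValueError("buffer too small for a BTF header")
    header = BTFHeader(*_HEADER.unpack_from(data))
    if header.magic != BTF_MAGIC:
        raise ValueError(f"bad BTF magic: {header.magic:#x}")
    return header


# Synthetic header for demonstration; on a kernel built with
# CONFIG_DEBUG_INFO_BTF the real data is at /sys/kernel/btf/vmlinux.
sample = _HEADER.pack(BTF_MAGIC, 1, 0, _HEADER.size, 0, 16, 16, 8)
header = parse_btf_header(sample)
print(header.hdr_len, header.type_len, header.str_len)
```

The type and string sections that follow the header are where the real work lies, which is part of the appeal of delegating to an existing library instead.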

Old Branches & Work

I have created a few prototype branches on older drgn versions. Only the ones mentioned above are actively maintained and developed. The ones below are older and no longer maintained. For the most part, the commits in these branches were used as the basis for more recent branches, so it's not like the work is lost. The below list is from oldest to newest.

btf_debuginfo - this was my original attempt at alternative debuginfo formats. It didn't have any symbol table support. I submitted #162 for this, and closed it as I worked on newer code.
kallsyms_plus_btf - builds on the above with kallsyms support
kallsyms_vmlinux - this contains the kallsyms implementation, without any BTF code. This went into its own pull request (#177) which I am about to close as well since the branch is outdated.
kallsyms_ctf - this is the old kallsyms implementation, with a preliminary, very hacky CTF implementation.