volatilityfoundation / dwarf2json

convert ELF/DWARF symbol and type information into vol3's intermediate JSON

Handle Rust bindings in the Linux kernel #63

Open Abyss-W4tcher opened 3 months ago

Abyss-W4tcher commented 3 months ago

Hi,

While investigating #57, I noticed that the issue started appearing around the time Rust was integrated into the Linux kernel. After a bit more debugging, I was able to confirm that some Rust bindings are processed by dwarf2json in the same pool as C struct names:

$ llvm-dwarfdump vmlinux-6.5.0-14-generic --name fs_struct --show-children | grep 'DW_TAG_member' -A 2
0x063c0c3e:   DW_TAG_member
                DW_AT_name      ("_unused")
                DW_AT_type      (0x063ca2a5 "u8[0]")
--
0x063d404b:   DW_TAG_member
                DW_AT_name      ("_unused")
                DW_AT_type      (0x063d52f5 "u8[0]")

$ llvm-dwarfdump vmlinux-6.5.0-14-generic --debug-info=0x063c0c3e --show-parents
vmlinux-6.5.0-14-generic:       file format elf64-x86-64

.debug_info contents:

0x063b59a4: DW_TAG_compile_unit
              DW_AT_producer    ("clang LLVM (rustc version 1.68.2 (9eb3afe9e 2023-03-27) (built from a source tarball))")
              DW_AT_language    (DW_LANG_Rust)
              DW_AT_name        ("/build/linux-SXblTa/linux-6.5.0/rust/bindings/lib.rs/@/bindings.04c8d523-cgu.0")
              DW_AT_stmt_list   (0x00d8f3e1)
              DW_AT_comp_dir    ("/build/linux-SXblTa/linux-6.5.0/debian/build/build-generic")
              DW_AT_GNU_pubnames        (true)
              DW_AT_low_pc      (0xffffffff818060b0)
              DW_AT_high_pc     (0xffffffff81808cf0)

0x063be675:   DW_TAG_namespace
                DW_AT_name      ("bindings")

0x063be67a:     DW_TAG_namespace
                  DW_AT_name    ("bindings_raw")

0x063c0c37:       DW_TAG_structure_type
                    DW_AT_name  ("fs_struct")
                    DW_AT_byte_size     (0x00)
                    DW_AT_alignment     (1)

0x063c0c3e:         DW_TAG_member
                      DW_AT_name        ("_unused")
                      DW_AT_type        (0x063ca2a5 "u8[0]")
                      DW_AT_alignment   (1)
                      DW_AT_data_member_location        (0x00)

Should these bindings, or more broadly all Rust content, be processed separately from the regular structures? I don't think they should be discarded, but maybe they could be stored under a different parent key in the ISF?

Abyss-W4tcher commented 1 month ago

Hello, has anyone had a chance to look into a solution?

Unfortunately, all ISFs generated after Linux kernel 6.5 are currently invalid. :/

ikelos commented 2 weeks ago

Has anyone made any progress on this? If changes need to be made to the main symbol table format, that's possible, but I don't yet fully understand what these new structures are or how they relate, so hopefully someone can give me a rundown so we can figure out a way to sort them appropriately...

Abyss-W4tcher commented 2 weeks ago

The Ubuntu (Linux) kernel includes Rust bindings for existing C APIs. They can be inspected in a sample source package: https://bugs.launchpad.net/~canonical-kernel-team/+archive/ubuntu/ppa/+build/26995753/+files/linux-lib-rust-6.5.0-14-generic_6.5.0-14.14_amd64.deb.

For this issue, the relevant one is the fs_struct binding (usr/src/linux-lib-rust-6.5.0-14-generic/rust/bindings/bindings_generated.rs):

#[repr(C)]
#[derive(Copy, Clone)]
pub struct fs_struct {
    _unused: [u8; 0],
}

The problem is that there are now two fs_struct structs in the vmlinux DWARF information: one is the "classical" C struct and the other is a Rust binding. My guess is that dwarf2json processes every entry in a single pass, instead of iterating per DW_TAG_compile_unit and taking each unit's language into account (see the first comment of this issue).
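As a rough illustration of the collision described above, here is a minimal Python sketch. It is not how dwarf2json works: the compile units are modelled as plain (language, struct names) pairs, and the function name and sample data are hypothetical stand-ins for what a real DWARF parser would produce.

```python
from collections import defaultdict

# Illustrative language labels only; a real tool would read DW_AT_language
# from each DW_TAG_compile_unit.
DW_LANG_C11 = "DW_LANG_C11"
DW_LANG_RUST = "DW_LANG_Rust"


def find_cross_language_collisions(compile_units):
    """Return {struct_name: sorted languages} for names defined in more than one language."""
    languages_by_name = defaultdict(set)
    for language, struct_names in compile_units:
        for name in struct_names:
            languages_by_name[name].add(language)
    return {
        name: sorted(languages)
        for name, languages in languages_by_name.items()
        if len(languages) > 1
    }


# Hypothetical sample input: one C unit and one Rust unit both define fs_struct.
compile_units = [
    (DW_LANG_C11, ["fs_struct", "task_struct"]),
    (DW_LANG_RUST, ["fs_struct"]),
]
collisions = find_cross_language_collisions(compile_units)
print(collisions)
```

Tracking the language per compile unit is what makes the two fs_struct definitions distinguishable at all; a flat pool keyed only by name cannot tell them apart.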

To avoid completely breaking the existing ISF format, we could prefix every extracted Rust binding/data with something like rust., resulting in:

edit: This could cause confusion with cross-references, so it may not be a workable idea (unless they are handled correctly?). Storing all Rust content under additional keys (rust_symbols, rust_types...) might be required instead, but this also breaks compatibility with Volatility.

ikelos commented 2 weeks ago

That seems reasonable if it becomes a unique namespace (which it sounds like rust. or <language>. would). Does anyone have an idea how much effort that would be to add to dwarf2json?
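To make the prefix idea and the cross-reference concern concrete, here is a small Python sketch. It assumes a simplified, hypothetical type table (language -> type name -> field name -> field type name) rather than the real ISF layout, and the helper names are made up for illustration. The point is that a field's type reference has to receive the same prefix as the type it lives in, otherwise a Rust field would resolve to a C type.

```python
def qualify(name, language):
    """Prefix a type name with its language namespace; C names stay unprefixed."""
    return name if language == "c" else f"{language}.{name}"


def namespace_types(types_by_language):
    """Merge per-language type tables into one table with prefixed names.

    Field type references are rewritten with the same prefix, so they keep
    pointing at the type from their own language.
    """
    merged = {}
    for language, types in types_by_language.items():
        for type_name, fields in types.items():
            merged[qualify(type_name, language)] = {
                field: qualify(field_type, language)
                for field, field_type in fields.items()
            }
    return merged


# Hypothetical, heavily trimmed sample data.
types_by_language = {
    "c": {"fs_struct": {"users": "int"}, "int": {}},
    "rust": {"fs_struct": {"_unused": "u8[0]"}, "u8[0]": {}},
}
merged = namespace_types(types_by_language)
for name in sorted(merged):
    print(name, merged[name])
```

Leaving C names unprefixed is one possible choice for keeping existing consumers working; prefixing every language uniformly would be an equally valid variant.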

mkonshie commented 1 week ago

Sorry for the delay on this, I was able to discuss this with the dwarf2json maintainers.

This issue and discussion have been about the conflict between Rust and C types. However, we believe that a conflict between Rust and C symbols is also possible. We think modifying the current schema, rather than adding a prefix to the type names, is probably the best way to avoid these collisions. For example, the new top-level schema could look something like this:

{
    metadata: {},
    base_types: {},
    base_types_rust: {},
    user_types: {},
    user_types_rust: {},
}

Can this new schema work with volatility3, or will changes need to be made there as well?

Separating the user types and the base types should be straightforward, but separating C symbols and Rust symbols will be more complex. This is because symbols can come from different sources (System.map, DWARF, and the symbol table), and whether Rust and C symbols collide depends on the input source.

I'm currently looking into addressing this, but it will take some time. In the meantime, a workaround could be to skip Rust compilation units altogether to avoid the collision, and then add them back once a solution has been decided.
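For the type side, the split could be sketched roughly as follows in Python. This is only an illustration of the proposed top-level keys: the input tuples, the function name, and the placeholder definitions are hypothetical, and nothing here reflects what volatility3 currently accepts.

```python
import json


def build_isf_skeleton(types):
    """Bucket types into per-language top-level keys.

    types: iterable of (name, language, kind, definition),
    where kind is "base" or "user".
    """
    isf = {
        "metadata": {},
        "base_types": {},
        "base_types_rust": {},
        "user_types": {},
        "user_types_rust": {},
    }
    for name, language, kind, definition in types:
        key = f"{kind}_types" + ("_rust" if language == "rust" else "")
        isf[key][name] = definition
    return isf


# Hypothetical sample input; the definitions are placeholders, not real ISF entries.
sample_types = [
    ("int", "c", "base", {"kind": "int"}),
    ("u8", "rust", "base", {"kind": "int"}),
    ("fs_struct", "c", "user", {"fields": ["users"]}),
    ("fs_struct", "rust", "user", {"fields": ["_unused"]}),
]
isf = build_isf_skeleton(sample_types)
print(json.dumps(isf, indent=4))
```

With separate keys, both fs_struct definitions survive without renaming, at the cost of every consumer having to know about the new keys.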

Abyss-W4tcher commented 1 week ago

Hello, looking at a sample System.map, there is no way to tell precisely which compile unit a symbol originates from, even though some symbols are conveniently prefixed with rust_:

ffffffff818095c0 T rust_fmt_argument

Many others cannot be attributed precisely:

ffffffff81809570 T _RNvXs0_NvNtNtCsbwHtcUjRN57_6kernel4sync7condvar1__NtB7_7CondVarNtNtNtBb_4init10___internal10HasPinData10___pin_data

Those are exported explicitly in the Ubuntu Rust bindings:

EXPORT_SYMBOL_RUST_GPL(rust_fmt_argument);
EXPORT_SYMBOL_RUST_GPL(_RNvXs0_NvNtNtCsbwHtcUjRN57_6kernel4sync7condvar1__NtB7_7CondVarNtNtNtBb_4init10___internal10HasPinData10___pin_data);

However, I noticed that symbols exported through EXPORT_SYMBOL_RUST_GPL are labeled under the "GNU C11" compile unit in the vmlinux, so they sit in the same pool as regular C symbols. In other words, the symbols in System.map are not designed to carry a "language" label.


FYI, PR https://github.com/volatilityfoundation/dwarf2json/pull/65 uses namespace prefixes, which makes it possible to keep the existing schema while resolving conflicts and separating types and symbols. Of course, it is open for review :).

edit: Even without Rust support, some symbols appear multiple times in the same System.map/symbols list (see https://patchwork.kernel.org/project/linux-kbuild/patch/20230714150326.1152359-1-alessandro.carminati@gmail.com/). This can be checked with awk '{print $3}' System.map | sort | uniq -d.
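For anyone who prefers to run the same duplicate check from Python, here is a rough equivalent of the awk pipeline above. The function name is made up, and apart from the rust_fmt_argument line quoted earlier, the sample lines are hypothetical rather than taken from a real System.map.

```python
from collections import Counter


def duplicate_symbol_names(system_map_text):
    """Return the sorted symbol names that occur more than once.

    Each System.map line is expected to look like "<address> <type> <name>";
    lines with fewer than three fields are ignored.
    """
    names = []
    for line in system_map_text.splitlines():
        parts = line.split()
        if len(parts) >= 3:
            names.append(parts[2])
    counts = Counter(names)
    return sorted(name for name, count in counts.items() if count > 1)


# The first line is quoted from this thread; the other two are hypothetical.
sample_map = """\
ffffffff818095c0 T rust_fmt_argument
0000000000001000 t hypothetical_helper
0000000000002000 t hypothetical_helper
"""
duplicates = duplicate_symbol_names(sample_map)
print(duplicates)
```

On a real kernel, the input would be the contents of the System.map file, read with something like open(path).read().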