m4b / goblin

An impish, cross-platform binary parsing crate, written in Rust
MIT License
1.17k stars 156 forks source link

COFF stringtable created with 'strings' function truncates the first 4 characters when used with symbol offset #373

Open DasStone opened 1 year ago

DasStone commented 1 year ago

Detailed Problem Description with Example

A symbol with a name longer than 8 characters, will be stored in the string table, as per the PE File Format Documentation given by Microsoft. The offset to the string table can then be determined by the last 4 bytes of the 8 byte long name field if the first 4 bytes are 0 (see SymbolNameRepresentation).

This does not seem to work correctly in goblin. Take a look at this example:

// read file to u8 buffer
let buffer = fs::read(path_pe).unwrap();

let pe = pe::PE::parse(&buffer).unwrap_or_else(|e| {
    println!("err: {:?}", e);
    process::exit(2);
});

let symbol_count = pe.header.coff_header.number_of_symbol_table as usize;
let symbols = pe.header.coff_header.symbols(&buffer).unwrap();
let strtab = pe.header.coff_header.strings(&buffer).unwrap();

// search symbol fibonacci
for i in 0..symbol_count {
    let named_sym = symbols.get(i).unwrap();

    let name = match named_sym {
        (Some(name), _) => name,
        (None, sym) => {
            let strtab_offset = u32::from_le_bytes(sym.name[4..8].try_into().unwrap());
            strtab.get_at(strtab_offset as usize).unwrap()
        },
    };

    if name == "fibonacci" {
        println!("Correct Name");
        println!("{:#?}", named_sym.1);
        println!("Calculated strtab offset: {}", u32::from_le_bytes(named_sym.1.name[4..8].try_into().unwrap()));
    } else if name == "nacci" {
        println!("Name offset by 4 characters");
        println!("{:#?}", named_sym.1);
        println!("Calculated strtab offset: {}", u32::from_le_bytes(named_sym.1.name[4..8].try_into().unwrap()));
    }
}

The provided PE file contains a function called "fibonacci". This can easily be confirmed with objdump -t <file> | grep fibonacci.

[ 88](sec  1)(fl 0x00)(ty   20)(scl   2) (nx 0) 0x0000000000001670 fibonacci

However when executing the provided Rust code on the same file, the output will stem from the second case (name == "nacci"). Here is the exact output:

Name offset by 4 characters
Symbol {
    name: [
        0,
        0,
        0,
        0,
        78,
        4,
        0,
        0,
    ],
    value: 5744,
    section_number: 1,
    typ: 32,
    storage_class: 2,
    number_of_aux_symbols: 0,
}
Calculated strtab offset: 1102

This is also the same entry that was previously found by objdump.

Short Problem Description

It seems, that the strtab parsed by the strings function on a CoffHeader accidentally offsets every string by 4 characters.

Expected Output

I would have expected to receive the full name with the provided Rust code. Not a truncated Version.

I hope that I didn't do anything wrong when parsing the necessary information...

Other Information:

goblin-version: 0.7.1 rustc-version and cargo-version: 1.70.0 OS: Windows 11 Pro Version 22H2

m4b commented 1 year ago

Hmmm, thank you for the detailed writeup! Can you paste the binary by any chance with your fibonnaci function? I'm a little surprised this wouldn't have come up earlier, but it is possibly a regression since I know a change went in recently with pe and string table

DasStone commented 1 year ago

Yes of course!

Here is the source code if you want to compile it yourself (but the binary is also attached, see below):

fn main() {
    println!("Here are some fibonacci numbers");
    println!("Fib 50: {}", fibonacci(50));
}

#[no_mangle]
fn fibonacci(n: u64) -> u64 {
    println!("addr of fib: {:#?}", fibonacci as *const u8);
    let mut a = 0;
    let mut b = 1;

    if n == 0 {
        return 0;
    }

    for _ in 1..n {
        let tmp = a + b;
        a = b;
        b = tmp;
    }

    b
}

(The program was called hash-test, but I stripped it of any meaningful computation. All that is left is a main function that calls a fibonacci calculation) hash-test.zip

m4b commented 1 year ago

if you git revert eaba4ed4ae6a8054fd489dd011aeaf8609378b48 can you test if this repro's for you?

DasStone commented 1 year ago

Sadly not, undoing the changes from the specified commit does not resolve the issue. I ran the code I provided again (ofc. with the changed version of goblin), and the results are still the same. The name is still offset by 4 characters.

m4b commented 1 year ago

hmmm, if i run bingrep on your exe, all the libraries functions it imports seem ok, so i don't think this is a general problem in our strtab parsing itself.

In your example code could you try using the name_offset() api on Symbol? e.g. change this line fromlet strtab_offset = u32::from_le_bytes(sym.name[4..8].try_into().unwrap()); to use let stratb_offset = sym.name_offset().unwrap()

m4b commented 1 year ago

I suspect you needed this portion: https://github.com//m4b/goblin/blob/17a5c7cc992220cd72e349eddea99148f31c5e65/src/pe/symbol.rs#L240

DasStone commented 1 year ago

In your example code could you try using the name_offset() api on Symbol? e.g. change this line fromlet strtab_offset = u32::from_le_bytes(sym.name[4..8].try_into().unwrap()); to use let stratb_offset = sym.name_offset().unwrap()

I needed to change your suggested fix a bit (since the unwrap() would not work in the current context. But this is no meaningful change):

let strtab_offset = match sym.name_offset() {
                    Some(val) => val,
                    None => continue,
                };

The Problem still persists when testing the "fixed" code with my local version of goblin (the one with git revert eaba4ed4ae6a8054fd489dd011aeaf8609378b48 ).


I also checked how the code would behave with the current release of goblin 0.7.1. It crashes with attempt to subtract with overflow on the exact line you already suggested (src\pe\symbol.rs:240:36).


Running in release seems to "fix" the issue for version 0.7.1

Matching the returned Option from get_at() and just continuing on a None value "resolves" the issue.

let name = match named_sym {
            (Some(name), _) => name,
            (None, sym) => {
                let strtab_offset = match sym.name_offset() {
                    Some(val) => val,
                    None => continue,
                };
                match strtab.get_at(strtab_offset as usize) {
                    Some(name) => name,
                    None => continue,
                }
            }
        };

But this "fix" seems highly suspicious to me. First of all, there should not be a difference when running debug and release (regarding the code in question). Second, I just printed all symbols found by this code and there are some symbols that can't be found with objdump -t <.exe> | grep <sym_name>. I assume this might be due to the overflowing subtraction.

Based on this, I still think that there might be some deeply rooted problem within goblins PE module...

philipc commented 1 year ago

for i in 0..symbol_count {

This is wrong because it will try to parse auxiliary symbols as regular symbols. Use symbols.iter() instead.

First of all, there should not be a difference when running debug and release

Agreed. name_offset should use checked_sub instead so that we gracefully handle malformed files, but once you fix the iteration it won't matter for the file you are testing on.

DasStone commented 1 year ago

This is wrong because it will try to parse auxiliary symbols as regular symbols. Use symbols.iter() instead.

True. It seems to be working now.

Thank you for your help @m4b and @philipc !

(I will leave closing the issue up to you, due to the checked_sub code change)