m4b / goblin

An impish, cross-platform binary parsing crate, written in Rust
MIT License
1.17k stars 156 forks source link

Should strtab parsing jump over contigous blocks of `\0` #384

Open wolfv opened 7 months ago

wolfv commented 7 months ago

I am wondering if adding additional \0 in the strtab is valid (we want to override the rpath of a shared library by replacing the string with a shorter, \0 padded one).

I looked at the code for strtab parsing I noticed that it creates empty strings if it finds multiple consecutive \0, e.g. this test fails:

#[cfg(test)]
mod test {
    use goblin::strtab::Strtab;

    #[test]
    fn test_strtab() {
        let data = "hello\0\0\0\0world\0\0".as_bytes();
        let strtab = Strtab::parse(&data, 0, data.len(), 0).unwrap();
        assert_eq!(strtab.to_vec().unwrap(), vec!["hello", "world"]);
    }
}

with

  left: ["hello", "", "", "", "world", ""]
 right: ["hello", "world"]

Wouldn't it be more expected that the parser jumps over blocks of \0? I am expecting the parsing with offsets to work as expected.

philipc commented 7 months ago

I am wondering if adding additional \0 in the strtab is valid

Yes. Values in the strtab only have meaning if something references them.

we want to override the rpath of a shared library by replacing the string with a shorter, \0 padded one

Be aware that a string and its suffixes can be referenced from multiple places, so by modifying the string for rpath it is possible to be changing another string at the same time. This may be unlikely for the complete rpath string, but it's hard to tell when the suffix will have meaning somewhere else too.

Wouldn't it be more expected that the parser jumps over blocks of \0?

If something actually references the \0, then the parser should return an empty string. If the parser jumped over it, then I think Strtab::get_at would return None, which would be wrong. Compilers won't generate this sort of thing in practice, but it would be best to handle it correctly if it does happen.

As an aside, the way that goblin parses string tables into a list of strings is unusual. Most parsers don't do that.

wolfv commented 7 months ago

Hi @philipc thanks for your insights! Am I correct in the assumption that the strings for rpath are in the .dynamic part, and in the dynstrtab table? So if I would check the entire section for suffixes I could rule out that any other string is corrupted by the edits?

philipc commented 7 months ago

Yes the rpath reference is in .dynamic. The strings in the .dynstr section can be referenced by .dynamic, .dynsym, and the GNU version information sections.