m4b / goblin

An impish, cross-platform binary parsing crate, written in Rust
MIT License
1.19k stars 158 forks source link

elf: The strtab in the section header is wrong. #300

Closed n01e0 closed 2 years ago

n01e0 commented 2 years ago

example

$ cat src/main.rs
use std::fs::File;
use std::io::Read;
use std::env;
use goblin::elf::*;

fn main() -> Result<(), Box<dyn std::error::Error>> {
  let args = env::args().collect::<Vec<String>>();
  if args.len() < 2 {
      eprintln!("Usage: {} binary", args[0])
  } else {
      let mut f = File::open(&args[1])?;
      let mut buf = Vec::new();
      f.read_to_end(&mut buf)?;
      let elf = Elf::parse(&buf)?;
      let strtab = elf.shdr_strtab.to_vec().unwrap();
      for s in strtab {
          println!("{}", s)
      }
  }

  Ok(())
}
$ cargo r `which ls`
~ snip ~
    Finished dev [unoptimized + debuginfo] target(s) in 0.16s
     Running `target/debug/readstrtab /usr/bin/ls`

.shstrtab
.interp
.note.gnu.property
.note.gnu.build-id
.note.ABI-tag
.gnu.hash
.dynsym
.dynstr
.gnu.version
.gnu.version_r
.rela.dyn
.rela.plt
.init
.plt.got
.plt.sec
.text
.fini
.rodata
.eh_frame_hdr
.eh_frame
.init_array
.fini_array
.data.rel.ro
.dynamic
.data
.bss
.gnu_debugaltlink
.gnu_debuglink
$ readelf -S `which ls`|grep "\["
  [Nr] Name              Type             Address           Offset
  [ 0]                   NULL             0000000000000000  00000000
  [ 1] .interp           PROGBITS         0000000000000318  00000318
  [ 2] .note.gnu.pr[...] NOTE             0000000000000338  00000338
  [ 3] .note.gnu.bu[...] NOTE             0000000000000358  00000358
  [ 4] .note.ABI-tag     NOTE             000000000000037c  0000037c
  [ 5] .gnu.hash         GNU_HASH         00000000000003a0  000003a0
  [ 6] .dynsym           DYNSYM           0000000000000450  00000450
  [ 7] .dynstr           STRTAB           0000000000001050  00001050
  [ 8] .gnu.version      VERSYM           0000000000001616  00001616
  [ 9] .gnu.version_r    VERNEED          0000000000001718  00001718
  [10] .rela.dyn         RELA             00000000000017b8  000017b8
  [11] .rela.plt         RELA             0000000000002bf8  00002bf8
  [12] .init             PROGBITS         0000000000004000  00004000
  [13] .plt              PROGBITS         0000000000004020  00004020
  [14] .plt.got          PROGBITS         00000000000046c0  000046c0
  [15] .plt.sec          PROGBITS         00000000000046f0  000046f0
  [16] .text             PROGBITS         0000000000004d80  00004d80
  [17] .fini             PROGBITS         0000000000018bc4  00018bc4
  [18] .rodata           PROGBITS         0000000000019000  00019000
  [19] .eh_frame_hdr     PROGBITS         000000000001e324  0001e324
  [20] .eh_frame         PROGBITS         000000000001ec70  0001ec70
  [21] .init_array       INIT_ARRAY       0000000000022fd0  00021fd0
  [22] .fini_array       FINI_ARRAY       0000000000022fd8  00021fd8
  [23] .data.rel.ro      PROGBITS         0000000000022fe0  00021fe0
  [24] .dynamic          DYNAMIC          0000000000023a58  00022a58
  [25] .got              PROGBITS         0000000000023c58  00022c58
  [26] .data             PROGBITS         0000000000024000  00023000
  [27] .bss              NOBITS           0000000000024280  00023268
  [28] .gnu_debugaltlink PROGBITS         0000000000000000  00023268
  [29] .gnu_debuglink    PROGBITS         0000000000000000  000232b4
  [30] .shstrtab         STRTAB           0000000000000000  000232e8

.pltdoesn't appear in shdr_strtab.

philipc commented 2 years ago

This is expected because strings can be any suffix. That is, .plt is a suffix of .rela.plt. Why do you want to use to_vec? I don't think it is a useful function.

n01e0 commented 2 years ago

This is expected because strings can be any suffix. That is, .plt is a suffix of .rela.plt.

Yes, sir. That's right. However, I don't think StrTab::parse takes this into account.

Why do you want to use to_vec? I don't think it is a useful function.

For example, there is no way in current goblin to correctly parse the StrTab in the section header when you want to do the readelf -S equivalent as shown here. So, in #299 , I plan to return the bytes of the StrTab as a Vec and provide a way to parse it by myself.

Also, I would like to fix the parsing of Elf.shdr_strtab in the future, is this a wrong idea?

philipc commented 2 years ago

StrTab::parse records the offset of the string so that it can be taken into account in get_at using a binary search.

The equivalent of readelf -S is to iterate over the section headers, and then use strtab.get_at(section.sh_name). There is never a good reason to iterate over individual strings in a StrTab in my opinion.

Edit: like this

      let elf = Elf::parse(&buf)?;
      for section in elf.section_headers {
          println!("{}", elf.shdr_strtab.get_at(section.sh_name).unwrap());
      }

Also, I would like to fix the parsing of Elf.shdr_strtab in the future, is this a wrong idea?

What needs fixing?

n01e0 commented 2 years ago

Wow I was not aware of this method. This may indeed be sufficient.

n01e0 commented 2 years ago

It's a small thing, but how about specify in the documentation that "StrTab::to_vec should not be used for shdr_strtab"? It's natural that a user who wants shdr_strtab in Vec will try to use to_vec, but it's not easy that he will find that .plt is not displayed there. I think an explicit warning would reduce the number of people repeating mistakes like mine in the future.

m4b commented 2 years ago

When I first saw your issue I was puzzled at first too; perhaps a documentation blurb on to_vec is a good idea? or perhaps showing iteration as a doc comment code example?

n01e0 commented 2 years ago

How about this?

    #[cfg(feature = "alloc")]
    /// Converts the string table to a vector of parsed strings.
    ///
    /// Note: If you want to get the correct `Vec` of shdr_strtab, don't use this method but use `get_at` to get it sequentially.
    ///
    /// Requires `feature = "alloc"`
    pub fn to_vec(&self) -> error::Result<Vec<&'a str>> {
m4b commented 2 years ago

Yea something like that sounds good to me; or even pasting philipc's code snippet as a ``rust block? (of course the snippet will need some massaging to get working incargo test`, so maybe just a simple example with manufactured strtable + symbol addresses as an array that one iterates over?)

m4b commented 2 years ago

anyway, PR's welcome :)

philipc commented 2 years ago

This problem isn't specific to shdr_strtab; it applies to every strtab. I don't know why to_vec even exists. As I've said, I don't think it is a useful function. Maybe deprecate it instead.

m4b commented 2 years ago

I don't know why to_vec even exists. As I've said, I don't think it is a useful function. Maybe deprecate it instead.

There is no way to access the contents of the parsed string table without to_vec or something that returns an iterator over the parsed values. Someone might want this, for e.g, displaying what's in the string table without a reference index.

E.g., I use it right here: https://github.com//m4b/bingrep/blob/7a2e2a157ada79afe51b46396347ac706d78a029/src/format_elf.rs#L380

:shrug:

m4b commented 2 years ago

It might be better to remove/deprecate it and just return an impl Iterator<Item = &'a str> since this wouldn't require the alloc flag

m4b commented 2 years ago

haha, funny enough, in the link i gave, that's actually the to_vec method of Symtab ;) so i guess i don't use it in bingrep, at least, but the point still remains, there's no way to access the parsed strings inside the string table without such a method.

n01e0 commented 2 years ago

This problem isn't specific to shdr_strtab; it applies to every strtab. I don't know why to_vec even exists. As I've said, I don't think it is a useful function. Maybe deprecate it instead.

How about creating an appropriate to_vec method for each StrTab? For example

    impl<'a> Elf<'a> {
        pub fn shdr_strtab_to_vec(&self) -> Vec<&str> {
            let mut ret = Vec::new();
            for shdr in &self.section_headers {
                ret.push(self.shdr_strtab.get_at(shdr.sh_name).unwrap_or(""));
            }
            ret
        }
philipc commented 2 years ago

there's no way to access the parsed strings inside the string table without such a method.

Why would you want to do that instead of using get_at? Ok, maybe for a niche case like bingrep, but most people using this crate aren't writing their own bingrep.

How about creating an appropriate to_vec method for each StrTab?

You very rarely want a list of section header names that are disassociated from the section headers. Typically you only want to process the name in conjunction with other information from the section header. Additionally, nothing says that shdr_strtab has to be separate from the string table for symbols, and some tools do combine them.

m4b commented 2 years ago

Closing this as I think this is resolved, please re-open if not the case