m4b / goblin

An impish, cross-platform binary parsing crate, written in Rust
MIT License
1.17k stars 156 forks source link

Fix issue 370 #414

Open h33p opened 3 weeks ago

h33p commented 3 weeks ago

Closes #370

The issue gets triggered, because qemu/virsh generated elf dump contains strtab at the end of the dump file. Meanwhile, dumps are large, and thus, it is infeasible to provide the entire dump file to the parser. This PR relaxes the requirements of the parser and returns default strtab, if the initial size check fails.

h33p commented 1 week ago

@m4b thank you for your reply!

Secondly, unless I've misunderstood, if the issue is that the strtab is out of bounds, this means that the file is malformed, unless only a chunk of the file is being loaded and passed to goblin;

Exactly. just as in #370, only the head chunk is passed over to goblin. The file itself is not malformed.

In that case, probably the user should use lazy_parse , since strtab will get defaulted there.

This probably is the right way to go about this. My usecase is only reading the program headers, nothing more.

This means, I can just use ProgramHeader::parse, however, it requires a scroll Ctx. Is there a neat way to access/build that without copying the logic in parse_misc?

m4b commented 5 days ago

Ok, I see your issue. So, this is primarily because lazy_parse isn't really fleshed out in a consistent way. Iiuc, you would do:

let mut elf = Object::lazy_parse(bytes)?;
// how to populate program_headers?

So I think the best path forward here will be:

  1. exposing the ctx by making the field pub or a pub fn ctx() (I don't want to do this)
  2. adding some helper methods, especially for e.g., program_headers and section_headers, where it takes care of passing the various fiddly bits (i think this is ideal, but i'm not sure whether a builder-esque pattern, or mutating methods on the elf struct?

to flesh out 2. a bit, i mean something like:

pub fn parse_program_headers(&mut self, bytes: &[u8]) {
            let program_headers = ProgramHeader::parse(bytes, self.header.e_phoff as usize, self.header.e_phnum as usize, self.ctx)?;

            let mut interpreter = None;
            for ph in &program_headers {
                if ph.p_type == program_header::PT_INTERP && ph.p_filesz != 0 {
                    let count = (ph.p_filesz - 1) as usize;
                    let offset = ph.p_offset as usize;
                    interpreter = bytes.pread_with::<&str>(offset, ::scroll::ctx::StrCtx::Length(count)).ok();
                }
            }
      self.program_headers = program_headers;
      self.interpreter = interpreter;
}

Obviously this would forward to either private static methods that return the appropriate types, and so code duplication is reduced, or we use &mut self methods to construct/parse whichever components we want.

so there's a number of things I would change at this point, which is somewhat unrelated to your request to just parse the program headers:

  1. We should probably introduce a builder or a &mut self api
  2. lazy_parse should have taken some bytes and parsed the header, and defaulted the rest
  3. parse calls lazy_parse, and then uses the above builder api to construct the remaining structs

I know this isn't probably what you wanted to hear, and while I understand your patch may "fix" the issue for you, it isn't unfortunately the right solution to the problem, which is specifically, you want to lazy_parse, and then supply just e.g., the program_header bytes and parse from those, to get the information you need.

Fortunately, I don't think this is a particularly hard problem to fix in a better manner, as I sort of outlined above it's just slightly more work :) let me know if you have any questions, and thanks for your patience and explaining your usecase better!