google / bloaty

Bloaty: a size profiler for binaries
Apache License 2.0

Direction for PE file support #220

Open learn-more opened 3 years ago

learn-more commented 3 years ago

To add support for PE files there are a few different approaches that can be used:

What would be the preferred way of moving forward?

haberman commented 3 years ago

Thanks for your interest in adding PE support! This is something I've wished for for a while.

Generally with Bloaty I have found that custom parsers are necessary. Bloaty cares about not only the data in the file, but the precise location of each bit of data in the file. For example, for the file headers and symbol table entries, we need to not only read them, but report their byte range within the file.

Generally I've found that existing libraries do not offer this information, because almost no program besides Bloaty needs it. For this reason, all of the existing parsers in Bloaty take the final approach you mentioned (grab the headers and write a complete custom parser). I expect PE will probably require the same.
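To make the "byte range" requirement concrete, here is a minimal Python sketch (not Bloaty's actual C++ code) of a hand-written PE header walk that records where each structure sits in the file rather than just what it contains. The function names `pe_header_ranges` and `build_tiny_pe` are hypothetical, and the test image is a synthetic stub with a zero-filled optional header, not a loadable executable. The structure sizes (64-byte DOS header, 20-byte COFF header, 40-byte section header) follow the PE/COFF layout.

```python
import struct

DOS_HEADER_SIZE = 64
COFF_HEADER = struct.Struct("<HHIIIHH")         # 20 bytes
SECTION_HEADER = struct.Struct("<8sIIIIIIHHI")  # 40 bytes


def pe_header_ranges(data):
    """Return (label, file_offset, size) for each header structure found."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if data[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")

    ranges = [("DOS header", 0, DOS_HEADER_SIZE),
              ("PE signature", e_lfanew, 4)]

    coff_off = e_lfanew + 4
    fields = COFF_HEADER.unpack_from(data, coff_off)
    num_sections, opt_size = fields[1], fields[5]
    ranges.append(("COFF header", coff_off, COFF_HEADER.size))

    opt_off = coff_off + COFF_HEADER.size
    if opt_size:
        ranges.append(("optional header", opt_off, opt_size))

    # The section table starts right after the optional header.
    table_off = opt_off + opt_size
    for i in range(num_sections):
        off = table_off + i * SECTION_HEADER.size
        sec = SECTION_HEADER.unpack_from(data, off)
        name = sec[0].rstrip(b"\0").decode("ascii", "replace")
        ranges.append((f"section header {name}", off, SECTION_HEADER.size))
        raw_size, raw_ptr = sec[3], sec[4]  # SizeOfRawData, PointerToRawData
        if raw_size:
            ranges.append((f"section data {name}", raw_ptr, raw_size))
    return ranges


def build_tiny_pe():
    """Build a synthetic image: headers, one .text section, 4 bytes of data."""
    dos = bytearray(DOS_HEADER_SIZE)
    dos[:2] = b"MZ"
    struct.pack_into("<I", dos, 0x3C, DOS_HEADER_SIZE)  # e_lfanew
    opt = bytes(240)  # zero-filled placeholder, never parsed here
    coff = COFF_HEADER.pack(0x8664, 1, 0, 0, 0, len(opt), 0)
    text = SECTION_HEADER.pack(b".text", 4, 0x1000, 4, 0x200,
                               0, 0, 0, 0, 0x60000020)
    headers = bytes(dos) + b"PE\0\0" + coff + opt + text
    return headers.ljust(0x200, b"\0") + b"\xc3\x90\x90\x90"


if __name__ == "__main__":
    for label, offset, size in pe_header_ranges(build_tiny_pe()):
        print(f"{label:24} [{offset:#06x}, {offset + size:#06x})")
```

Each tuple is the kind of fact a size profiler needs to attribute bytes to a label; a general-purpose library would typically hand back the parsed field values and discard the offsets.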

learn-more commented 3 years ago

@haberman now that the initial PR is merged, how do you want to proceed with PE support?

haberman commented 3 years ago

Now that we have the lit testing in place, I'm a lot more comfortable moving forward with expanding PE support.

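I'd love to see support for:

  • segments: this would be the regions of the file that the loader will load. The segments name is somewhat ELF-specific, but I think PE has something similar, like in the optional header?
  • symbols: using the symbol table hopefully we could get some good symbol support here.
  • compileunits: I assume PE files have this information available for debugging?

What do you think?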

learn-more commented 3 years ago

> Now that we have the lit testing in place, I'm a lot more comfortable moving forward with expanding PE support.
>
> I'd love to see support for:
>
>   • segments: this would be the regions of the file that the loader will load. The segments name is somewhat ELF-specific, but I think PE has something similar, like in the optional header?
>   • symbols: using the symbol table hopefully we could get some good symbol support here.
>   • compileunits: I assume PE files have this information available for debugging?
>
> What do you think?

segments seems very doable; the PE header can be split into:

As for symbols: these usually live in a PDB file, which at least yaml2obj does not support, and which would require another (extra) parser. PE files with DWARF debug info should be doable, but these are only gcc-built binaries, and those are not 'common' outside of a few hobby projects.
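One cheap way to tell the two cases apart is to look at the `PointerToSymbolTable` and `NumberOfSymbols` fields of the COFF header. As far as I know these are normally zero in images whose symbols live in a PDB, while gcc-built binaries often keep an embedded COFF symbol table unless stripped; treat that as an assumption to verify against real files. A minimal Python sketch, with hypothetical names `coff_symbol_ranges` and `make_image`, that reports the byte ranges of the symbol table and the string table that follows it:

```python
import struct

COFF_SYMBOL_SIZE = 18  # size of one COFF symbol record


def coff_symbol_ranges(data):
    """Return ((sym_off, sym_size), (str_off, str_size)), or None if absent."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if data[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")

    # PointerToSymbolTable and NumberOfSymbols sit 8 bytes into the
    # COFF header, which itself starts right after the 4-byte signature.
    sym_off, sym_count = struct.unpack_from("<II", data, e_lfanew + 4 + 8)
    if sym_off == 0 or sym_count == 0:
        return None  # no embedded symbols; they may live in a PDB instead

    sym_size = sym_count * COFF_SYMBOL_SIZE
    # The string table follows the symbols; its first 4 bytes hold its
    # total size, including the size field itself.
    str_off = sym_off + sym_size
    (str_size,) = struct.unpack_from("<I", data, str_off)
    return (sym_off, sym_size), (str_off, str_size)


def make_image(num_symbols):
    """Synthetic stub: DOS header, PE signature, COFF header, symbol data."""
    dos = bytearray(64)
    dos[:2] = b"MZ"
    struct.pack_into("<I", dos, 0x3C, 64)  # e_lfanew
    sym_off = 64 + 4 + 20 if num_symbols else 0
    coff = struct.pack("<HHIIIHH", 0x8664, 0, 0, sym_off, num_symbols, 0, 0)
    image = bytes(dos) + b"PE\0\0" + coff
    if num_symbols:
        image += bytes(num_symbols * COFF_SYMBOL_SIZE)  # zeroed records
        image += struct.pack("<I", 4)                   # empty string table
    return image


if __name__ == "__main__":
    print(coff_symbol_ranges(make_image(2)))
    print(coff_symbol_ranges(make_image(0)))
```

A `None` result here only says the image carries no COFF symbol table; it says nothing about whether a matching PDB exists.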

compileunits: I have no clue, to be honest, but if this is present somewhere, it would probably also be in the PDB file.