m4b / goblin

An impish, cross-platform binary parsing crate, written in Rust
MIT License
1.17k stars 156 forks source link

mach: Support archive entries in fat binaries #322

Closed nick96 closed 1 year ago

nick96 commented 2 years ago

Adds support for archives entries in a fat binary. I ended up going down the breaking change route because I wasn't sure of a good way of doing it without one. Super happy to do another pass on this if you think there is a way!

Basically, there's just an enum MultiArchEntry with variants for a MachO or an Archive. All the functions for retrieving entries in MultiArch now return this enum, rather than MachO. This does mean you could technically represent a fat binary with a mixture of binaries and archives even though I'm not sure if that is actually allowed.

Testing-wise, I've added tests for parsing fat binaries (both the binary and archive variants). They require binaries to be checked in which I've done. I thought it was easiest to check them in as they're pretty small and I saw there was already a DLL file checked in. Alternatively, we could build them as part of the test if you'd prefer, though that would mean that the tests could only run on MacOS because I don't think lipo exists on Linux.

m4b commented 2 years ago

this looks great, other than some minor comments i had

m4b commented 2 years ago

re this CI failure: https://github.com/m4b/goblin/runs/7601981804?check_suite_focus=true#step:10:52

I think you'll have to make macho32 + macho64 depend on archive feature?

nick96 commented 2 years ago

Thanks for the quick review! I don’t think I’ll have a chance to come back to this today. Hopefully tomorrow though

m4b commented 2 years ago

No rush :)

nick96 commented 2 years ago

However, could you strip the binaries to make sure they're minimally sized? 110kb seems like a lot to me for single function.

I've tried stripping the binaries but I get this error:

strip: error: symbols referenced by indirect symbol table entries that can't be stripped in: /Users/nspain/devel/personal/goblin/assets/hello_world_fat_binaries (for architecture x86_64)
_printf
nick96 commented 2 years ago

I think I've addressed all you comments, except for the naming of MultiArchEntry. I'll update that once we reach a decision

m4b commented 2 years ago

Ok I think I like SingleArch the best so far, let's proceed with that; in case anyone else is monitoring this PR, please add your opinion too :)

nick96 commented 2 years ago

I've updated it to be SingleArch. Ready for re-review when you're free!

nick96 commented 2 years ago

Actually, just playing around with this branch on my linker project and I've broken something in the code that looks at the hint bytes to determine the file type.

nick96 commented 2 years ago

Okay. The issue was that I'd missed the check for fat binaries in macho::peek_bytes, fixed now. What I thought was an archive was actually a fat binary 🤦

m4b commented 1 year ago

@nick96 thanks so much for this PR and your patience! Are you ok to wait a bit for a release (since this is a breaking change, as I mentioned, I like to roll them up if possible, though ideally they become less and less)

nick96 commented 1 year ago

@nick96 thanks so much for this PR and your patience! Are you ok to wait a bit for a release (since this is a breaking change, as I mentioned, I like to roll them up if possible, though ideally they become less and less)

No worries on both counts 😊. Makes sense to batch them up.

m4b commented 1 year ago

@nick96 thank you so much for your patience, this is now released in 0.6.0 :)