m4b / goblin

An impish, cross-platform binary parsing crate, written in Rust
MIT License

Mach-o multi-arch containers can contain archives #320

Closed by nick96 1 year ago

nick96 commented 2 years ago

I've been trying to write a little toy linker for macOS using goblin. So far it has been a great experience and thanks for all the hard work you've put into the library! It has helped me learn a lot about the different formats I'm trying to work with. I am running into a bit of an issue with fat Mach-O binaries though. When I try to get the arch-specific binary out of the container, it fails with Invalid magic number: 0x213c6172. From having a look around, 0x213c6172 is the magic number for archive files, and when I run file it says that each entry is an archive file:

$ file /Library/Developer/CommandLineTools/usr/lib/clang/13.1.6/lib/darwin/libclang_rt.osx.a
/Library/Developer/CommandLineTools/usr/lib/clang/13.1.6/lib/darwin/libclang_rt.osx.a: Mach-O universal binary with 5 architectures: [i386:current ar archive random library] [x86_64:current ar archive random library] [x86_64h:current ar archive random library] [arm64:current ar archive random library] [arm64e:current ar archive random library]
/Library/Developer/CommandLineTools/usr/lib/clang/13.1.6/lib/darwin/libclang_rt.osx.a (for architecture i386):  current ar archive random library
/Library/Developer/CommandLineTools/usr/lib/clang/13.1.6/lib/darwin/libclang_rt.osx.a (for architecture x86_64):    current ar archive random library
/Library/Developer/CommandLineTools/usr/lib/clang/13.1.6/lib/darwin/libclang_rt.osx.a (for architecture x86_64h):   current ar archive random library
/Library/Developer/CommandLineTools/usr/lib/clang/13.1.6/lib/darwin/libclang_rt.osx.a (for architecture arm64): current ar archive random library
/Library/Developer/CommandLineTools/usr/lib/clang/13.1.6/lib/darwin/libclang_rt.osx.a (for architecture arm64e):    current ar archive random library

I can't find any documentation that corroborates this, but looking at the lld and mold linkers, they both handle archives being present in fat Mach-O binaries, so I guess it's a thing.

I put together a little example of failing to extract entries:

use std::{env::args, fs::read};

use goblin::{mach::cputype::get_arch_name_from_types, Object};

fn main() {
    let path = args().nth(1).unwrap();
    let bytes = read(path).unwrap();
    let object = Object::parse(&bytes).unwrap();
    if let Object::Mach(goblin::mach::Mach::Fat(fat)) = object {
        for (i, arch) in fat.iter_arches().enumerate() {
            let arch = arch.unwrap();
            let arch_name = get_arch_name_from_types(arch.cputype(), arch.cpusubtype()).unwrap();
            println!("Parsing entry {i} for arch {arch_name}");
            if let Err(e) = fat.get(i) {
                println!("Failed to get archive entry {i}: {}", e);
            }
        }
    } else {
        panic!("Expected multi-arch mach-o binary");
    }
}

When I pass /Library/Developer/CommandLineTools/usr/lib/clang/13.1.6/lib/darwin/libclang_rt.osx.a to the resulting binary I get:

Parsing entry 0 for arch i386
Failed to get archive entry 0: Invalid magic number: 0x213c6172
Parsing entry 1 for arch x86_64
Failed to get archive entry 1: Invalid magic number: 0x213c6172
Parsing entry 2 for arch x86_64h
Failed to get archive entry 2: Invalid magic number: 0x213c6172
Parsing entry 3 for arch arm64
Failed to get archive entry 3: Invalid magic number: 0x213c6172
Parsing entry 4 for arch arm64e
Failed to get archive entry 4: Invalid magic number: 0x213c6172
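
For readers wondering why 0x213c6172 points at an archive: read most significant byte first, the four bytes are the ASCII characters "!<ar", which is the start of the "!<arch>\n" string that opens every ar archive. A minimal sketch to check this, using only the standard library (the helper name magic_to_ascii is made up for illustration and is not part of goblin):

```rust
/// Renders a 32-bit magic number as text, most significant byte first.
/// Bytes that are not printable ASCII are shown as '.'.
/// Hypothetical helper for illustration; not a goblin API.
fn magic_to_ascii(magic: u32) -> String {
    magic
        .to_be_bytes()
        .iter()
        .map(|&b| if b.is_ascii_graphic() { b as char } else { '.' })
        .collect()
}

fn main() {
    // The value from the error message above.
    println!("{}", magic_to_ascii(0x213c6172)); // prints "!<ar"
    // The fat (universal) magic has no printable bytes.
    println!("{}", magic_to_ascii(0xcafebabe)); // prints "...."
}
```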
nick96 commented 2 years ago

Would you be open to a PR changing MultiArch::get to something to the effect of:

enum FatEntry<'a> {
    MachO(MachO<'a>),
    Archive(Archive<'a>),
}

pub fn get(&self, index: usize) -> error::Result<FatEntry<'a>>
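
One way such a get could decide which variant to return is to peek at the leading bytes of the architecture slice before parsing it. The sketch below only illustrates that dispatch step with the standard library. EntryKind and classify are hypothetical names, the variants carry no payload, and none of this is goblin's actual implementation:

```rust
/// Hypothetical classification of one architecture slice of a fat binary.
#[derive(Debug, PartialEq)]
enum EntryKind {
    MachO,
    Archive,
    /// Anything else, carrying the first four bytes read as big-endian.
    Unknown(u32),
}

/// Magic string at the start of an `ar` archive.
const ARCHIVE_MAGIC: &[u8] = b"!<arch>\n";

/// Thin Mach-O magics: 32-bit and 64-bit, in both byte orders.
const MACHO_MAGICS: [u32; 4] = [0xfeed_face, 0xcefa_edfe, 0xfeed_facf, 0xcffa_edfe];

/// Classifies a slice by its leading bytes.
/// Returns None if the slice is too short to hold a magic number.
fn classify(bytes: &[u8]) -> Option<EntryKind> {
    if bytes.starts_with(ARCHIVE_MAGIC) {
        return Some(EntryKind::Archive);
    }
    if bytes.len() < 4 {
        return None;
    }
    let magic = u32::from_be_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]);
    if MACHO_MAGICS.contains(&magic) {
        Some(EntryKind::MachO)
    } else {
        Some(EntryKind::Unknown(magic))
    }
}

fn main() {
    println!("{:?}", classify(b"!<arch>\n__.SYMDEF"));
    println!("{:?}", classify(&[0xcf, 0xfa, 0xed, 0xfe, 0x07, 0x00]));
    println!("{:?}", classify(&[0x7f, b'E', b'L', b'F']));
}
```

In a real implementation the Archive and MachO arms would presumably hand the slice to the corresponding parser and wrap the result in the proposed FatEntry variants.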
m4b commented 2 years ago

Hello!

I've been trying to write a little toy linker for macOS using goblin. So far it has been a great experience and thanks for all the hard work you've put into the library!

Nice! I actually began writing a linker many moons ago, but never finished (I wanted to write something like lld or mold, a cross-platform linker, because it's kind of weird they didn't exist before). In any event, it's why all the Pwrite impls got added to all the structs (that and the now-dead https://github.com/m4b/faerie, an object file writer), so I'm glad it ended up being useful for someone!

If you ever get a chance to open source it, I'd love to see what you ended up with! Please let me know if you ever end up doing that :)

they both handle archives being present in fat mach-o binaries so I guess it's a thing

Yes, invoking lipo with archives of different arches is supported afaik; I just believe it's not implemented in goblin :)

So yes, I would definitely love to see something like this added! The tricky part, if any, is whether we can keep backwards compat (doubtful, since if there are enums involved we didn't mark them non-exhaustive). If we can't, the only guiding principle is to make the API changes as painless as possible.

This would end up in the 0.6 release, which I usually like to roll up other breaking changes with, if you don't mind waiting a bit.
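
As background on the non-exhaustive remark: Rust's #[non_exhaustive] attribute forces code in other crates to include a wildcard arm when matching on an enum, so variants can be added later without breaking downstream builds. A minimal sketch, assuming a payload-free stand-in for the proposed FatEntry type (the names are illustrative, not goblin's API):

```rust
/// Hypothetical entry type; the unit variants stand in for real payloads.
#[non_exhaustive]
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum FatEntry {
    MachO,
    Archive,
}

/// Written the way a downstream crate would have to write it. Outside the
/// defining crate the wildcard arm is mandatory, so a variant added later
/// does not break the build. Inside the defining crate the arm is
/// unreachable, hence the `allow`.
#[allow(unreachable_patterns)]
pub fn describe(entry: FatEntry) -> &'static str {
    match entry {
        FatEntry::MachO => "Mach-O object",
        FatEntry::Archive => "ar archive",
        _ => "unknown entry kind",
    }
}

fn main() {
    println!("{}", describe(FatEntry::MachO));
    println!("{}", describe(FatEntry::Archive));
}
```

The attribute only helps with future additions. It does not make the change discussed here backwards compatible, because the return type of get would still change.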

nick96 commented 2 years ago

Okay cool! I'll take a look at adding it. Yeah, I don't think it can be a backwards compatible change because MachO is a struct, not an enum. I guess it'd be possible to create another method and leave MultiArch::get alone, but that would create quite a confusing API.