metalbear-co / mirrord

Connect your local process and your cloud environment, and run local code in cloud conditions.
https://mirrord.dev
MIT License

use proper arm64e #2846

Open aviramha opened 4 days ago

aviramha commented 4 days ago

Until now, we had a patch that generates a "dummy" arm64e by just patching the architecture of our arm64 binary so that it is recognized as arm64e. It worked in most cases, because arm64 binaries can load arm64e libraries, and arm64e binaries are usually protected so they won't load us anyway. When we upgraded to a newer Rust in #2826, it caused a regression that crashed binaries such as dlv, so we downgraded in https://github.com/metalbear-co/mirrord/pull/2845

That led me to investigate further, finding that the issue is with our fake arm64e:

➜  report DYLD_INSERT_LIBRARIES=/Users/aviramhassan/Code/mirrord/target/aarch64-apple-darwin/debug/libmirrord_layer.dylib /Users/aviramhassan/go/bin/dlv 
thread '<unnamed>' panicked at mirrord/layer/src/lib.rs:229:10:
missing internal proxy address
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
mirrord layer setup panicked
➜  report DYLD_INSERT_LIBRARIES=/Users/aviramhassan/Code/mirrord/target/aarch64-apple-darwin/debug/libmirrord_layer_arm64e.dylib /Users/aviramhassan/go/bin/dlv 
dyld[3826]: bad bind opcode 0x00
[1]    3826 abort      DYLD_INSERT_LIBRARIES= /Users/aviramhassan/go/bin/dlv

In the above example, if we use plain arm64, we manage to load (the panic comes from the layer itself, so it did load). If we use arm64e, we crash. macOS prefers arm64e libraries, so with our fat binary it always picks arm64e. The reason we need arm64e is debugserver, which is the exception: it's an arm64e binary that we can load into, and if the fat binary lacks arm64e it crashes.

My proposed solution is to create a C shim (since Rust can't yet compile arm64e properly, there's no CI for it, etc.) that always loads our arm64 library. Perhaps it should first check whether it's really running in arm64e mode, and load only if it isn't.

The CLI will actually include 2 dylibs:

  1. A fat dylib: x64 (real layer) + arm64e (fake layer, i.e. the shim)
  2. The arm64 layer

When executing, it'll extract the arm64 layer on ARM builds and set an environment variable pointing to it. The arm64e fake layer will load, find that variable, and load the arm64 layer if it matches its conditions.
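The launcher side of that flow could look roughly like the sketch below. It is written in C only to match the shim sample in this thread (the real CLI is Rust), and `prepare_layer_env` and `launch_with_layer` are hypothetical names, not mirrord functions. It assumes the two dylibs have already been extracted to the given paths.

```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: tell the shim where the arm64 layer lives and
 * make dyld insert the shim. Returns 0 on success, -1 on failure. */
int prepare_layer_env(const char *shim_path, const char *arm64_layer_path) {
    if (!shim_path || !*shim_path || !arm64_layer_path || !*arm64_layer_path) {
        return -1;
    }
    /* Variable name taken from the shim sample in this thread. */
    if (setenv("MIRRORD_MACOS_ARM64_LIBRARY", arm64_layer_path, 1) != 0) {
        return -1;
    }
    if (setenv("DYLD_INSERT_LIBRARIES", shim_path, 1) != 0) {
        return -1;
    }
    return 0;
}

/* Hypothetical launcher: set up the environment, then replace the
 * current process with the target. Only returns on failure. */
int launch_with_layer(const char *shim_path, const char *arm64_layer_path,
                      char *const argv[]) {
    if (prepare_layer_env(shim_path, arm64_layer_path) != 0) {
        return -1;
    }
    execvp(argv[0], argv);
    perror("execvp");
    return -1;
}
```

Keeping the environment setup separate from the exec makes the first half easy to check on its own.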

linear[bot] commented 4 days ago

MBE-474 use proper arm64e

aviramha commented 4 days ago

Sample code:

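#include <stdio.h>
#include <stdlib.h>
#include <dlfcn.h>

// Runs when dyld loads the shim: load the real arm64 layer named by the env var.
__attribute__((constructor))
static void on_library_load(void) {
    const char *lib_env = getenv("MIRRORD_MACOS_ARM64_LIBRARY");

    if (lib_env && *lib_env) {
        // Report a failed load instead of silently continuing without the layer.
        if (dlopen(lib_env, RTLD_LAZY) == NULL) {
            fprintf(stderr, "mirrord shim: dlopen failed: %s\n", dlerror());
        }
    }
}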

To compile:

clang -arch arm64e -dynamiclib -o shim.dylib shim.c

aviramha commented 4 days ago

Once this is done we can re-upgrade Rust.

aviramha commented 3 days ago

After further investigation, we realized we don't even need to ship an arm64e binary. The reason we did it is https://github.com/metalbear-co/mirrord/pull/434: we got loaded into debugserver, which is an Apple-shipped signed binary that we shouldn't be loaded into in any case! Newer versions (macOS/debugserver?) don't let us load into it anyway, so shipping arm64e isn't needed. If we do find situations where we get loaded into arm64e binaries, we can use the intercept_env hook to remove the env var before exec'ing those!

https://github.com/go-delve/delve/blob/05dc760877f32b63e843e88d1ce179baa6f519eb/pkg/proc/gdbserial/gdbserver.go#L558
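As an illustration of removing the variable before exec, here is a minimal C sketch. It is not the actual intercept_env hook; `env_without` is a hypothetical helper that builds a filtered copy of an environment array, which could then be passed to `execve`.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a newly allocated, NULL-terminated copy of
 * envp without any "name=..." entry. The strings themselves are shared
 * with envp, so the caller frees only the returned array.
 * Returns NULL if allocation fails. */
char **env_without(char *const envp[], const char *name) {
    size_t name_len = strlen(name);
    size_t count = 0;
    while (envp[count] != NULL) {
        count++;
    }

    char **out = malloc((count + 1) * sizeof *out);
    if (out == NULL) {
        return NULL;
    }

    size_t kept = 0;
    for (size_t i = 0; i < count; i++) {
        /* Match the exact variable name followed by '=', so a longer
         * name that shares the prefix is kept. */
        int matches = strncmp(envp[i], name, name_len) == 0
                      && envp[i][name_len] == '=';
        if (!matches) {
            out[kept++] = envp[i];
        }
    }
    out[kept] = NULL;
    return out;
}
```

A caller would do something like `execve(path, argv, env_without(envp, "DYLD_INSERT_LIBRARIES"))` for targets we should not be injected into.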

aviramha commented 3 days ago

The plot thickens: the issue with debugserver et al. actually affects users with SIP disabled on their machines. We're back to the shim solution, plus checking whether SIP is enabled so that we patch only when it isn't. We also found out that any standard SIP binary crashes if we don't have arm64e and SIP is disabled, but this probably works because the SIP patch converts it back to x64. But we found out that if we DYLD_INSERT_LIBRARIES the shim and it loads an arm64 library, that works! Woohoo!
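One possible way to do the SIP check is to read the output of `csrutil status`. This is only a sketch, not necessarily what mirrord will do. It assumes the first output line has the usual form "System Integrity Protection status: enabled." (or "disabled."), and `parse_sip_status` and `sip_enabled` are hypothetical names.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical parser for one line of `csrutil status` output.
 * Returns 1 if the status starts with "enabled", 0 for any other
 * status (e.g. "disabled"), and -1 if the line is not recognized. */
int parse_sip_status(const char *line) {
    static const char marker[] = "status: ";
    const char *p = line ? strstr(line, marker) : NULL;
    if (p == NULL) {
        return -1;
    }
    p += sizeof marker - 1;
    return strncmp(p, "enabled", 7) == 0 ? 1 : 0;
}

/* Hypothetical wrapper: run `csrutil status` and parse its first line.
 * Returns -1 if the command cannot be run or prints nothing. */
int sip_enabled(void) {
    FILE *pipe = popen("csrutil status 2>/dev/null", "r");
    if (pipe == NULL) {
        return -1;
    }
    char line[256];
    int result = -1;
    if (fgets(line, sizeof line, pipe) != NULL) {
        result = parse_sip_status(line);
    }
    pclose(pipe);
    return result;
}
```

With this shape, the arm64e patch would be applied only when `sip_enabled()` returns 0, and a result of -1 would be treated as "don't patch".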