ziglang / zig

General-purpose programming language and toolchain for maintaining robust, optimal, and reusable software.
https://ziglang.org
MIT License

brainstorming a way to make executables that work on linuxes with differing dynamic linker paths #6350

Open marler8997 opened 4 years ago

marler8997 commented 4 years ago

On my 64-bit NixOS machine, when I cross-build to x86_64-linux-gnu, the ELF interpreter is set to /lib64/ld-linux-x86-64.so.2. However, this loader does not exist on my NixOS distribution.

Note that this only occurs when my executable is compiled dynamically (i.e. when I link libc). Otherwise no interpreter is required, so the issue does not manifest.
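To make the "ELF interpreter" concrete: it is the path stored in the executable's PT_INTERP program header, which the kernel reads at exec time. The following is a minimal sketch in C, not taken from Zig or patchelf, that extracts that path much like `patchelf --print-interpreter` reports it. The function name `read_elf_interp` is hypothetical, and the sketch assumes a 64-bit ELF file in the host's byte order on a Linux system that provides `<elf.h>`.

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copies the PT_INTERP path of a 64-bit ELF file into buf.
 * Returns 1 if an interpreter was found, 0 if the file has no PT_INTERP
 * (e.g. a static executable), and -1 on any error. */
int read_elf_interp(const char *path, char *buf, size_t buflen)
{
    FILE *f = fopen(path, "rb");
    if (!f) return -1;

    Elf64_Ehdr eh;
    int result = -1;
    if (fread(&eh, sizeof eh, 1, f) != 1) goto done;
    if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0) goto done;   /* not an ELF file */
    if (eh.e_ident[EI_CLASS] != ELFCLASS64) goto done;         /* 64-bit only */

    result = 0;  /* valid ELF; assume no interpreter until one is found */
    for (unsigned i = 0; i < eh.e_phnum; i++) {
        Elf64_Phdr ph;
        long off = (long)(eh.e_phoff + (Elf64_Off)i * eh.e_phentsize);
        if (fseek(f, off, SEEK_SET) != 0 || fread(&ph, sizeof ph, 1, f) != 1) {
            result = -1;
            break;
        }
        if (ph.p_type != PT_INTERP) continue;

        /* The segment holds a NUL-terminated path of p_filesz bytes. */
        if (ph.p_filesz == 0 || ph.p_filesz > buflen) { result = -1; break; }
        if (fseek(f, (long)ph.p_offset, SEEK_SET) != 0 ||
            fread(buf, 1, ph.p_filesz, f) != ph.p_filesz) { result = -1; break; }
        buf[ph.p_filesz - 1] = '\0';
        result = 1;
        break;
    }
done:
    fclose(f);
    return result;
}
```

Calling `read_elf_interp("zig-cache/bin/hello", buf, sizeof buf)` on the dynamically linked build would be expected to fill `buf` with the interpreter path, while a static build would return 0.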

I understand that a fix for this issue might prove difficult. There's no way to set an absolute path to a loader that will work on all distributions. The only fix I can think of is to never use an ELF interpreter. When we need dynamic libraries, we could compile the loader into the final executable. Or maybe a better solution would be to include startup code that looks for the system loader at runtime, in a way that works on as many distributions as we can support.

Here is the code/build file to reproduce the issue:

// hello.zig
const std = @import("std");
pub fn main() void {
    std.debug.warn("hello\n", .{});
}

// build.zig
const std = @import("std");
const Builder = std.build.Builder;

pub fn build(b: *Builder) void {
    const mode = b.standardReleaseOptions();
    const target = b.standardTargetOptions(.{});

    const exe = b.addExecutable("hello", "hello.zig");
    exe.setBuildMode(mode);
    exe.setTarget(target);
    exe.linkSystemLibrary("c");
    b.default_step.dependOn(&exe.step);
    exe.install();

    const run = b.step("run", "Run the demo");
    const run_cmd = exe.run();
    run.dependOn(&run_cmd.step);
}

# build
$ zig build -Dtarget=x86_64-linux-gnu

# show issue, note the error is caused by the ELF interpreter not existing
$ ./zig-cache/bin/hello
bash: ./zig-cache/bin/hello: No such file or directory

# print interpreter
$ patchelf --print-interpreter zig-cache/bin/hello
/lib64/ld-linux-x86-64.so.2

Aransentin commented 4 years ago

Is there even a way to make a functional "embedded" loader that solves this problem properly on Linux, in theory?

Even if the binary embedded such a loader statically, the directories that the system loader searches for libraries are platform-dependent. It's even possible to run two environments (glibc+musl) in parallel with some hacking, so that musl-built binaries load their own entirely separate set of libs located in e.g. /lib/x86_64-linux-musl/ or whatever path you want.

marler8997 commented 4 years ago

It looks like the Zig compiler already has logic to find the system's dynamic linker (see std/zig/system.zig).

Given this, if we simply move that logic from link time to runtime, then we have a solution. At the cost of extra startup code, we can create exes that support as many distributions as we care to support.

I think it would be reasonable for the default behavior to depend on whether we are compiling for a native or a cross target: find the system's dynamic linker at "link time" if we compile natively, and find it at runtime if we are cross-compiling.
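As a rough illustration of "finding the loader at runtime", the sketch below probes a list of candidate paths and returns the first one that exists and is executable. It is written in C rather than Zig, it is not the logic in std/zig/system.zig, and the names `glibc_x86_64_candidates` and `find_loader` as well as the specific candidate paths are assumptions chosen for illustration. A fixed list like this would not cover NixOS, where the loader lives under a hashed /nix/store path, so real startup code would need a smarter strategy for such systems.

```c
#include <stddef.h>
#include <unistd.h>

/* Illustrative (assumed) locations of the glibc loader on x86_64 systems.
 * The list is NULL-terminated and deliberately incomplete. */
const char *const glibc_x86_64_candidates[] = {
    "/lib64/ld-linux-x86-64.so.2",
    "/lib/ld-linux-x86-64.so.2",
    "/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2",
    NULL,
};

/* Hypothetical helper: returns the first candidate that exists and is
 * executable by the current user, or NULL if none of them is usable. */
const char *find_loader(const char *const *candidates)
{
    for (size_t i = 0; candidates[i] != NULL; i++) {
        if (access(candidates[i], X_OK) == 0)
            return candidates[i];
    }
    return NULL;
}
```

A caller would use `find_loader(glibc_x86_64_candidates)` and treat a NULL result as "no known loader on this system". Only glibc paths are listed because, as noted above, a musl environment uses its own loader and library set.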

marler8997 commented 4 years ago

Update: I've gotten a proof of concept to work, written in C and compiled with zig cc. I'm able to compile a shared library and an executable that uses it, and then run the executable without an ELF interpreter. See https://github.com/marler8997/reloader

$ git clone https://github.com/marler8997/reloader
$ cd reloader/c
$ make
...
$ ./out/app-nolibc
RELOADER: reloading with this loader: /nix/store/xg6ilb9g9zhi2zg1dpi4zcp288rhnvns-glibc-2.30/lib/ld-linux-x86-64.so.2
RELOADER: already reloaded
example message to print integer 1234: 1234
foopassthru(123) = 123
Success
$ patchelf --print-interpreter out/app-nolibc
cannot find .interp section
$ patchelf --print-needed out/app-nolibc
libfoo.so.0

Note that the purpose of this code right now is just to see if this can work in theory. There are many details to be worked out, but I believe I've been able to prove that this solution is possible.
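The "reloading" and "already reloaded" messages in the output above suggest a re-exec approach. The sketch below shows one way such a step could look in C. It is not the code from the reloader repository: the function names and the `RELOADER_DONE` environment marker are hypothetical. It relies on the fact that the glibc loader can be invoked directly as `ld.so program args...`. It also uses libc calls for readability, whereas real startup code in an interpreter-less executable would have to run before any shared library is loaded.

```c
#define _POSIX_C_SOURCE 200112L
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: builds { loader, self, argv[1], ..., argv[argc-1], NULL }.
 * The caller frees the returned array (not the strings it points to).
 * Returns NULL if allocation fails. */
char **build_loader_argv(const char *loader, const char *self, int argc, char **argv)
{
    int extra = argc > 1 ? argc - 1 : 0;
    char **out = malloc(((size_t)extra + 3) * sizeof *out);
    if (!out) return NULL;
    out[0] = (char *)loader;
    out[1] = (char *)self;
    for (int i = 0; i < extra; i++)
        out[2 + i] = argv[1 + i];
    out[2 + extra] = NULL;
    return out;
}

/* Hypothetical helper: re-executes the program through `loader`.
 * Returns 0 without doing anything if the marker variable says we were
 * already reloaded; returns -1 if the re-exec fails. On success execv()
 * replaces the process image, so the function does not return. */
int reexec_with_loader(const char *loader, const char *self, int argc, char **argv)
{
    if (getenv("RELOADER_DONE") != NULL)
        return 0;                       /* second pass: already reloaded */
    if (setenv("RELOADER_DONE", "1", 1) != 0)
        return -1;

    char **new_argv = build_loader_argv(loader, self, argc, argv);
    if (!new_argv) return -1;
    execv(loader, new_argv);            /* only returns on failure */
    free(new_argv);
    return -1;
}
```

An environment variable is only one possible way to detect the second pass. It leaks into child processes, so a real implementation might prefer a different signal.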

Details to look into

As of now gcc/ld will compile a working app-nolibc, but it adds the ELF interpreter, and I haven't figured out how to prevent that (setting -Wl,--dynamic-linker= didn't work). Removing the .interp section after the fact with objcopy --remove-section .interp causes the kernel to fail to load the executable with an "exec format error", so that will need to be figured out too. But we know these issues are solvable because zig cc is able to generate an exe that does work. Maybe we could add a new operation to patchelf for this, e.g. patchelf --remove-interpreter? Also note that the interpreter is added whenever I link to libc, whether I'm using zig cc or gcc, so a new patchelf operation or additional toolchain options could be in order.

fzakaria commented 2 years ago

Take a look at https://github.com/Mic92/nix-ld