ziglang / zig

General-purpose programming language and toolchain for maintaining robust, optimal, and reusable software.
https://ziglang.org
MIT License

ReleaseSmall macOS hello world binary 51K vs. only 33K with C #20468

Open mk12 opened 4 months ago

mk12 commented 4 months ago

Zig Version

0.14.0-dev.130+cb308ba3a

Steps to Reproduce and Observed Behavior

Environment

I'm on macOS Sonoma 14.5 on a MacBook Pro with the Apple M1 Pro chip.

Reproduce

Hello world in C compiled with clang is 33K:

printf '#include <stdio.h>\n int main(void) { puts("Hello"); return 0; }' > hello.c
clang -Oz -o hello-c hello.c

wc -c hello-c
# 33432 hello-c

Hello world in Zig is 51K:

printf 'pub fn main() void { @import("std").debug.print("Hello\n", .{}); }' > hello.zig
zig build-exe -O ReleaseSmall -fsingle-threaded -femit-bin=hello-zig hello.zig

wc -c hello-zig
# 51320 hello-zig

Investigation

Here's my theory. The hello-c binary has no __DATA segment:

bloaty --domain=file hello-c -v
# FILE MAP:
# [0, 20] [Mach-O Headers], [Mach-O Headers]
# [20, 68] [Mach-O Headers], [Mach-O Headers]
# [68, 1f0] __TEXT, [Mach-O Headers]
# [1f0, 288] __TEXT, [Mach-O Headers]
# [288, 2d0] __TEXT, [Mach-O Headers]
# [2d0, 2e0] __TEXT, [Mach-O Headers]
# [2e0, 2f0] __TEXT, [Mach-O Headers]
# [2f0, 308] __TEXT, [Mach-O Headers]
# [308, 358] __TEXT, [Mach-O Headers]
# [358, 378] __TEXT, [Mach-O Headers]
# [378, 390] __TEXT, [Mach-O Headers]
# [390, 3b0] __TEXT, [Mach-O Headers]
# [3b0, 3c0] __TEXT, [Mach-O Headers]
# [3c0, 3d8] __TEXT, [Mach-O Headers]
# [3d8, 410] __TEXT, [Mach-O Headers]
# [410, 420] __TEXT, [Mach-O Headers]
# [420, 430] __TEXT, [Mach-O Headers]
# [430, 440] __TEXT, [Mach-O Headers]
# [440, 3f74] __TEXT, [__TEXT]
# [3f74, 3f94] __TEXT, __TEXT,__text
# [3f94, 3fa0] __TEXT, __TEXT,__stubs
# [3fa0, 3fa6] __TEXT, __TEXT,__cstring
# [3fa6, 3fa8] __TEXT, [__TEXT]
# [3fa8, 4000] __TEXT, __TEXT,__unwind_info
# [4000, 4008] __DATA_CONST, __DATA_CONST,__got
# [4008, 8000] __DATA_CONST, [__DATA_CONST]
# [8000, 8090] __LINKEDIT, [__LINKEDIT]
# [8090, 8098] __LINKEDIT, Function Start Addresses
# [8098, 80c8] __LINKEDIT, Symbol Table
# [80c8, 80d0] __LINKEDIT, Indirect Symbol Table
# [80d0, 80f8] __LINKEDIT, String Table
# [80f8, 8100] __LINKEDIT, [__LINKEDIT]
# [8100, 8298] __LINKEDIT, Code Signature
# ...

On the other hand, hello-zig has an 80-byte __DATA segment, which pushes the final __LINKEDIT segment out to offset 0xc000 because segments are aligned to 16K pages:

bloaty --domain=file hello-zig -v
# FILE MAP:
# [0, 20] [Mach-O Headers], [Mach-O Headers]
# [20, 68] [Mach-O Headers], [Mach-O Headers]
# [68, 2e0] __TEXT, [Mach-O Headers]
# [2e0, 378] __TEXT, [Mach-O Headers]
# [378, 500] __TEXT, [Mach-O Headers]
# [500, 548] __TEXT, [Mach-O Headers]
# [548, 578] __TEXT, [Mach-O Headers]
# [578, 588] __TEXT, [Mach-O Headers]
# [588, 598] __TEXT, [Mach-O Headers]
# [598, 5b0] __TEXT, [Mach-O Headers]
# [5b0, 600] __TEXT, [Mach-O Headers]
# [600, 620] __TEXT, [Mach-O Headers]
# [620, 638] __TEXT, [Mach-O Headers]
# [638, 648] __TEXT, [Mach-O Headers]
# [648, 668] __TEXT, [Mach-O Headers]
# [668, 680] __TEXT, [Mach-O Headers]
# [680, 6b8] __TEXT, [Mach-O Headers]
# [6b8, 6c8] __TEXT, [Mach-O Headers]
# [6c8, 12f4] __TEXT, __TEXT,__text
# [12f4, 133c] __TEXT, __TEXT,__stubs
# [133c, 139c] __TEXT, __TEXT,__stub_helper
# [139c, 13a0] __TEXT, [__TEXT]
# [13a0, 140c] __TEXT, __TEXT,__const
# [140c, 14c3] __TEXT, __TEXT,__cstring
# [14c3, 14c4] __TEXT, [__TEXT]
# [14c4, 24fc] __TEXT, __TEXT,__unwind_info
# [24fc, 2500] __TEXT, [__TEXT]
# [2500, 2718] __TEXT, __TEXT,__eh_frame
# [2718, 4000] [Unmapped], [Unmapped]
# [4000, 4008] __DATA_CONST, __DATA_CONST,__got
# [4008, 8000] [Unmapped], [Unmapped]
# [8000, 8030] __DATA, __DATA,__la_symbol_ptr
# [8030, 8040] __DATA, __DATA,__const
# [8040, 8050] __DATA, __DATA,__data
# [8050, c000] [Unmapped], [Unmapped]
# [c000, c008] __LINKEDIT, Rebase Info
# [c008, c020] __LINKEDIT, Binding Info
# [c020, c088] __LINKEDIT, Lazy Binding Info
# [c088, c0b8] __LINKEDIT, Export Info
# [c0b8, c338] __LINKEDIT, Symbol Table
# [c338, c36c] __LINKEDIT, Indirect Symbol Table
# [c36c, c370] __LINKEDIT, [__LINKEDIT]
# [c370, c776] __LINKEDIT, String Table
# [c776, c780] __LINKEDIT, [__LINKEDIT]
# [c780, c878] __LINKEDIT, Code Signature
# ...
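The page-alignment effect can be checked with a little arithmetic. This is only a sketch built from the offsets in the two file maps above; the `align_up` helper and the variable names are made up for illustration, and the 16K page size is taken from the alignment observed in those maps.

```shell
# Page size implied by the 16K segment alignment in the file maps.
page=$((0x4000))

# Hypothetical helper: round a byte offset up to the next page boundary.
align_up() { echo $(( ($1 + page - 1) / page * page )); }

# hello-c: the last segment before __LINKEDIT has content ending at 0x4008,
# and __LINKEDIT itself spans 0x8000..0x8298 (0x298 bytes).
c_linkedit_start=$(align_up $((0x4008)))
c_size=$((c_linkedit_start + 0x298))

# hello-zig: the extra __DATA segment has content ending at 0x8050,
# and __LINKEDIT spans 0xc000..0xc878 (0x878 bytes).
zig_linkedit_start=$(align_up $((0x8050)))
zig_size=$((zig_linkedit_start + 0x878))

extra_page=$((zig_linkedit_start - c_linkedit_start))
total_diff=$((zig_size - c_size))

echo "hello-c:   $c_size bytes"
echo "hello-zig: $zig_size bytes"
echo "extra page accounts for $extra_page of $total_diff bytes"
```

The computed sizes match the `wc -c` results (33432 and 51320). Of the 17888-byte difference, 16384 bytes are the single extra page that __DATA occupies; the remaining 1504 bytes come from the larger __LINKEDIT segment.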

I'm not sure about the other sections, but let's look at __data for hello-c:

otool -d hello-c
# hello-c:

nm -s __DATA __data hello-c
# (no output)

And for hello-zig:

otool -d hello-zig
# hello-zig:
# (__DATA,__data) section
# 0000000100008040  00000000 00000000 ffffffff ffffffff

nm -s __DATA __data hello-zig
# 0000000100008048 d _Progress.stderr_mutex.0
# 0000000100008040 d ___dso_handle
# 0000000100008040 d dyld_private

That _Progress.stderr_mutex.0 looks suspicious to me. I believe it comes from here: https://github.com/ziglang/zig/blob/cb308ba3ac2d7e3735d1cb42ef085edb1e6db723/lib/std/Progress.zig#L1388

But I'm not using std.Progress directly anywhere, so I don't know how it makes it into the final binary.

Expected Behavior

Zig should be able to match the binary size of C/clang.

alexrp commented 4 months ago

Note that for "real" code, you should use std.log.

Edit: That said, it seems like std.log.defaultLog also uses std.Progress. So I suppose you'd have to supply your own logFn to avoid pulling that in entirely.

mk12 commented 4 months ago

@alexrp Ah, good point. I had hoped -fsingle-threaded would eliminate anything to do with locking, but I guess it doesn't get rid of the global variable. However, even when using std.io.getStdErr directly, I'm still seeing the mutex:

pub fn main() void {
    @import("std").io.getStdErr().writer().writeAll("Hello\n") catch unreachable;
}

Same binary size, same otool and nm output.

alexrp commented 4 months ago

Ok, that I can't explain. That seems odd.

I guess this is why the mutex global variable is not completely eliminated, though:

https://github.com/ziglang/zig/blob/cb308ba3ac2d7e3735d1cb42ef085edb1e6db723/lib/std/Thread/Mutex.zig#L85-L104