ziglang / zig

General-purpose programming language and toolchain for maintaining robust, optimal, and reusable software.
https://ziglang.org
MIT License
33.67k stars 2.47k forks source link

Zig fails to compile `zig init-exe` program on linux-aarch64 #16331

Open robbielyman opened 1 year ago

robbielyman commented 1 year ago

Zig Version

0.11.0-dev.3937+78eb3c561

Steps to Reproduce and Observed Behavior

downloaded the latest zig, created a new project with zig init-exe and ran zig build run. I get the following error:

thread 26466 panic: reached unreachable code
/home/rylee/src/zig/lib/std/os.zig:4444:19: 0x33b857 in msync (build)
        .INVAL => unreachable, // Invalid parameters.
                  ^
/home/rylee/src/zig/lib/std/debug.zig:468:25: 0x320267 in isValidMemory (build)
                os.msync(aligned_memory, os.MSF.ASYNC) catch |err| {
                        ^
/home/rylee/src/zig/lib/std/debug.zig:511:77: 0x30896f in next_internal (build)
        if (fp == 0 or !mem.isAligned(fp, @alignOf(usize)) or !isValidMemory(fp))
                                                                            ^
/home/rylee/src/zig/lib/std/debug.zig:451:45: 0x2e8687 in next (build)
                address = self.next_internal() orelse return null;
                                            ^
/home/rylee/src/zig/lib/std/debug.zig:211:29: 0x2b4373 in captureStackTrace (build)
            addr.* = it.next() orelse {
                            ^
/home/rylee/src/zig/lib/std/Build/Step.zig:149:32: 0x29b82f in init (build)
    std.debug.captureStackTrace(first_ret_addr, &stack_trace);
                               ^
Panicked during a panic. Aborting.
error: the following build command crashed:
/home/rylee/src/ziggity/zig-cache/o/222e27c74b2c6353e1106ce58173b797/build /home/rylee/src/zig/zig /home/rylee/src/ziggity /home/rylee/src/ziggity/zig-cache /home/rylee/.cache/zig run

Expected Behavior

zig build run should work.

matu3ba commented 1 year ago

I can not reproduce with

Linux localhost 4.4.226-lineageos-gc02c91e #1 SMP PREEMPT Mon Jun 29 02:33:48 UTC 2020 aarch64 Android

What Kernel are you on (uname -a)? Does the issue occur sporadically or always?

robbielyman commented 1 year ago

some form of crash always occurs, but the specifics appear to vary. here's uname -a

Linux ryleebtw 6.3.0-asahi-10-1-ARCH #1 SMP PREEMPT_DYNAMIC Thu, 29 Jun 2023 03:43:33 +0000 aarch64 GNU/Linux
matu3ba commented 1 year ago

@ryleelyman

  1. Did you check in a debugger, most simple would be rr, what arguments msync has on linking? From man msync:

    ERRORS
       EBUSY  MS_INVALIDATE was specified in flags, and a memory lock exists for the specified address range.
    
       EINVAL addr is not a multiple of PAGESIZE; or any bit other than MS_ASYNC | MS_INVALIDATE | MS_SYNC is set  in
              flags; or both MS_SYNC and MS_ASYNC are set in flags.
    
       ENOMEM The indicated memory (or part of it) was not mapped.

    Context.

  2. https://stackoverflow.com/questions/20133103/does-msync-write-to-file-only-changed-pages-or-wholly-cached-buffer mentions "One other observable effect is that it causes the file modification time to be updated." Not sure, if this is always the case.

  3. https://github.com/ziglang/zig/pull/13042#issuecomment-1288162632 shows various strace commands to get an idea of how to minimize the problem. Unfortunately I dont have aarch64 hardware aside of my phone for which I cant easily modify the Kernel, so I'm unable to do this. I dont think this one is related to LLVM miscompilation, but I could be again wrong and the problem is the same and it does not manifest on my phone. You could try this standalone test: #13639

leecannon commented 1 year ago

This would be fixed by https://github.com/ziglang/zig/issues/11308

This error with msync is even called out in the first comment.

robbielyman commented 1 year ago

Just to close the loop a little: the following allows zig to compile things (including itself when building from source) on M1/M2 mac hardware running linux. I doubt this is a change that ought to be made, given that it might break working instances of zig on aarch64 that are not Apple silicon.

// mem.zig
pub const page_size = switch (builtin.cpu.arch) {
    .wasm32, .wasm64 => 64 * 1024,
    .aarch64 => switch (builtin.os.tag) {
+        .macos, .ios, .watchos, .tvos, .linux => 16 * 1024,
-        .macos, .ios, .watchos, .tvos, => 16 * 1024,
        else => 4 * 1024,
    },
    .sparc64 => 8 * 1024,
    else => 4 * 1024,
};
andrewrk commented 1 year ago

@ryleelyman ~that patch would be welcome.~

Related: #4082

Edit: oops, that's not right since we plan to rename page_size to min_page_size in which case it should remain at 4KB.

There needs to be a runtime check on the page size somewhere instead of relying on this hard coded value.

BitShiftNow commented 1 year ago

I was a bit surprised to find that these values were hardcoded when I started looking into this topic today (I am completely new to Zig). To me that should be just a fallback if for some reason there is no access to the page size during runtime.

When using C on UNIX style systems there is the sysconf call to get the page size: https://www.man7.org/linux/man-pages/man3/sysconf.3.html Apple Silicon mentions a vm_page_size global variable (presumably when using C++ or ObjectiveC though no clue to be honest... And also not quite sure how that one is set then...): https://developer.apple.com/documentation/apple-silicon/addressing-architectural-differences-in-your-macos-code On Windows there seems to be GetNativeSystemInfo: https://learn.microsoft.com/en-us/windows/win32/api/sysinfoapi/nf-sysinfoapi-getnativesysteminfo Also found this: https://devblogs.microsoft.com/oldnewthing/20210510-00/?p=105200

I am fairly new to all this low level stuff though so apologies if that is not relevant in any shape or form.

EDIT: I just saw the whole discussion here whooops: https://github.com/ziglang/zig/issues/11308 Ignore me. +1 on the vote for a runtime way of getting the page size :)

RossComputerGuy commented 11 months ago

I'm looking to fix this myself for Apple M1 Pro and what I've kind of thought of is the current way we handle page sizing isn't optimal. The issue I see is there's quite a bit of std relying on the comptime aspect of std.mem.page_size. Imo I think there should be 2 methods to get the page size. The first one is a runtime method (std.mem.getPageSize()), this would use sysconf on Linux for example and would fall back to the comptime if none of the other ways of getting the page size worked. The actual comptime one (std.mem.page_size) could use the target information if it has been set and defaults to the "platform standard default" (how we currently have it atm). This gives the advantage of being able to optimize the binary for a specific target.