ziglang / zig

General-purpose programming language and toolchain for maintaining robust, optimal, and reusable software.
https://ziglang.org
MIT License

Package hash differs on multiple machines #17064

Status: Open. Beyley opened this issue 11 months ago.

Beyley commented 11 months ago

Zig Version

0.12.0-dev.262+3cf71580c

Steps to Reproduce and Observed Behavior

```sh
git clone https://github.com/LittleBigRefresh/FreshPresence
cd FreshPresence
zig build
```

On my Windows arm64 devkit: [image]

On my main Linux desktop: [image]

My desktop thinks the hash is ...3952, while the Windows arm64 machine thinks it is ...b01e.

Expected Behavior

Both machines should successfully download and verify the dependencies.

Notes

Original Discord conversation.

jedisct1 commented 11 months ago

> I'm not sure if this is the right place to post this, but I will do it anyway.

Since these issues may be unrelated, it would be better to open a separate issue.

Getting different results at different optimization levels is usually due to code relying on undefined behaviors, so maybe double check that your code doesn't return pointers to stack-allocated buffers, freed buffers, etc.

kimherala commented 11 months ago

> Getting different results at different optimization levels is usually due to code relying on undefined behaviors, so maybe double check that your code doesn't return pointers to stack-allocated buffers, freed buffers, etc.

This was driving me crazy, but it was a skill issue in the end. My function was returning a slice of a stack-allocated array.
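For readers hitting the same class of bug, here is a minimal illustration of returning a slice of a stack-allocated array, plus two common fixes. The function names (formatIdBad, formatIdInto, formatIdAlloc) are hypothetical and not taken from anyone's project. The buggy version is left commented out because calling it and using the result is undefined behavior.

```zig
const std = @import("std");

// BUG (commented out on purpose): `buf` lives in this function's stack frame,
// so the returned slice dangles as soon as the function returns. It may
// "work" in Debug and print garbage in ReleaseFast, or the other way around.
//
// fn formatIdBad(id: u32) []const u8 {
//     var buf: [16]u8 = undefined;
//     return std.fmt.bufPrint(&buf, "id-{d}", .{id}) catch unreachable;
// }

/// Fix 1: the caller owns the buffer, so it outlives the returned slice.
fn formatIdInto(buf: []u8, id: u32) ![]u8 {
    return std.fmt.bufPrint(buf, "id-{d}", .{id});
}

/// Fix 2: heap-allocate the result; the caller must free it.
fn formatIdAlloc(allocator: std.mem.Allocator, id: u32) ![]u8 {
    return std.fmt.allocPrint(allocator, "id-{d}", .{id});
}

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();
    const allocator = gpa.allocator();

    var buf: [16]u8 = undefined;
    const a = try formatIdInto(&buf, 42);

    const b = try formatIdAlloc(allocator, 7);
    defer allocator.free(b);

    std.debug.print("{s} {s}\n", .{ a, b });
}
```

Which fix to prefer is a judgment call: a caller-provided buffer avoids allocation when the maximum size is known, while allocPrint is simpler when it is not.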

ianprime0509 commented 11 months ago

I noticed hash inconsistencies on my Linux machine (using Zig 0.12.0-dev.286+b0d9bb0bb) compared to the hash I had previously gotten for the same project on a different distro (previously Arch, now Fedora Silverblue). When I dug a little further by adding some debug logs to computePackageHash, I found that on my current setup, only the first file of any given package was being included in the hash calculation.

It seems that on my system, the directory handle returned by openIterableDir and similar functions does not necessarily iterate over all files added to the directory after the handle was opened, as illustrated by the following sample program:

```zig
const std = @import("std");

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();
    const allocator = gpa.allocator();

    // Create (or open) ./test and get an iterable handle to it.
    var dir = try std.fs.cwd().makeOpenPathIterable("test", .{});
    defer dir.close();

    // Add two files *after* the handle was opened.
    try dir.dir.writeFile("1", "1");
    try dir.dir.writeFile("2", "2");

    // Walk the directory through the same, pre-existing handle.
    var walker = try dir.walk(allocator);
    defer walker.deinit();

    while (try walker.next()) |entry| {
        std.debug.print("{s} {}\n", .{ entry.path, entry.kind });
    }
}
```

On my system, this prints only one entry (1) when run with an empty or nonexistent test directory. Calling openIterableDir again after writing both files and iterating using the freshly opened handle yields both entries (1 and 2).

The same pattern applies to the logic in computePackageHash, since it is passed the same directory handle that was created at the beginning of fetchAndUnpack and to which unpackTarball later added files. The same analysis applies to #17076.

If this is valid behavior for openIterableDir, then the easiest fix for this seems to be reopening the directory after fetching the tarball contents.
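A minimal sketch of that reopen workaround, under stated assumptions: it is written against the std.fs API of Zig 0.12 and later, where (as far as I know) openIterableDir was replaced by openDir with `.iterate = true`, so it will not compile on the 0.12.0-dev build discussed in this thread. The helper names countFilesFresh and writeThenCount are hypothetical. The point is only that iteration happens through a handle opened after all files were written.

```zig
const std = @import("std");

/// Hypothetical helper: open a fresh iterable handle to `sub_path`
/// (after all files have been written) and count the regular files in it.
fn countFilesFresh(allocator: std.mem.Allocator, parent: std.fs.Dir, sub_path: []const u8) !usize {
    var dir = try parent.openDir(sub_path, .{ .iterate = true });
    defer dir.close();

    var walker = try dir.walk(allocator);
    defer walker.deinit();

    var count: usize = 0;
    while (try walker.next()) |entry| {
        if (entry.kind == .file) count += 1;
    }
    return count;
}

/// Write two files through one handle, then count them through a new one.
fn writeThenCount(allocator: std.mem.Allocator, parent: std.fs.Dir) !usize {
    var dir = try parent.makeOpenPath("test", .{});
    defer dir.close();

    for ([_][]const u8{ "1", "2" }) |name| {
        const file = try dir.createFile(name, .{});
        defer file.close();
        try file.writeAll(name);
    }

    // Do not reuse `dir` for iteration; reopen the directory instead.
    return countFilesFresh(allocator, parent, "test");
}

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();

    const n = try writeThenCount(gpa.allocator(), std.fs.cwd());
    std.debug.print("{d} files\n", .{n});
}
```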

squeek502 commented 11 months ago

> If this is valid behavior for openIterableDir

Seems like very undesirable/unintended behavior at the very least. I'd suggest opening a new issue for this.

iacore commented 11 months ago

Are you sure this is not GitHub's fault? The archive link is not stable.

https://github.com/keybase/client/issues/10800

andrewrk commented 11 months ago

The package hash is computed from the unpacked files on disk. The hash of the tarball itself does not matter. If GitHub is not mutating the actual file content, or deleting files, then the hash according to zig will be unchanged.
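To make the "computed from the unpacked files" point concrete, here is a simplified sketch of a content hash over extracted files. It is not Zig's actual computePackageHash, whose details differ; the PackageFile type and packageHash function are hypothetical. Each file's path and contents are hashed with SHA-256, the per-file digests are sorted by path so directory iteration order does not matter, and the sorted digests are hashed together. The archive bytes never enter the calculation.

```zig
const std = @import("std");
const Sha256 = std.crypto.hash.sha2.Sha256;

/// Hypothetical in-memory stand-in for a file extracted from a package.
const PackageFile = struct {
    path: []const u8,
    contents: []const u8,
};

const FileHash = struct {
    path: []const u8,
    digest: [Sha256.digest_length]u8,

    fn lessThan(_: void, a: FileHash, b: FileHash) bool {
        return std.mem.lessThan(u8, a.path, b.path);
    }
};

/// Simplified package hash: per-file digests, sorted by path, hashed together.
fn packageHash(allocator: std.mem.Allocator, files: []const PackageFile) ![Sha256.digest_length]u8 {
    const hashes = try allocator.alloc(FileHash, files.len);
    defer allocator.free(hashes);

    for (files, hashes) |file, *fh| {
        var h = Sha256.init(.{});
        h.update(file.path);
        h.update(&[_]u8{0}); // separator between path and contents
        h.update(file.contents);
        fh.* = .{ .path = file.path, .digest = undefined };
        h.final(&fh.digest);
    }

    // Sorting makes the result independent of directory iteration order.
    std.mem.sort(FileHash, hashes, {}, FileHash.lessThan);

    var outer = Sha256.init(.{});
    for (hashes) |fh| outer.update(&fh.digest);
    var out: [Sha256.digest_length]u8 = undefined;
    outer.final(&out);
    return out;
}
```

Under this scheme, feeding the same files in a different order gives the same digest, but hashing only the first file gives a different one, which is the kind of mismatch a directory handle that misses files would produce.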

iacore commented 11 months ago

@andrewrk, there is another interesting behavior: if you change the URL but not the hash, Zig keeps using the cached file.

For example:

  1. Originally: .{ .url = A, .hash = H }
  2. You run zig build. The file is cached.
  3. You change it to .{ .url = B, .hash = H }
  4. You run zig build. The cached file is used, and Zig never uses the network to visit the new URL.

This behavior is surprising.

andrewrk commented 11 months ago

Working as designed, please see #16972 and #16679. The hash is the source of truth, not the URL. The contents at the URL could change at any time.
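A toy model of the behavior iacore describes and andrewrk confirms: the cache lookup is keyed only by the hash, and the URL is consulted only on a cache miss. The Plan type and planFetch function are hypothetical and are not meant to mirror the package manager's internals.

```zig
const std = @import("std");

/// What to do for one dependency entry of the form .{ .url = ..., .hash = ... }.
const Plan = union(enum) {
    /// Package already present: reuse it, identified only by its hash.
    use_cache: []const u8,
    /// Cache miss: fetch from this URL, then verify against the hash.
    fetch: []const u8,
};

/// Hypothetical hash-keyed cache lookup. `cached` holds hashes of packages
/// already on disk; `url` only matters when the hash is not found.
fn planFetch(cached: []const []const u8, url: []const u8, hash: []const u8) Plan {
    for (cached) |h| {
        if (std.mem.eql(u8, h, hash)) return .{ .use_cache = hash };
    }
    return .{ .fetch = url };
}

pub fn main() void {
    // Steps 1-2: URL A with hash H was fetched earlier, so H is cached.
    const cached = [_][]const u8{"H"};
    // Steps 3-4: switching to URL B with the same hash still hits the cache.
    switch (planFetch(&cached, "B", "H")) {
        .use_cache => |h| std.debug.print("using cached package {s}\n", .{h}),
        .fetch => |u| std.debug.print("fetching {s}\n", .{u}),
    }
}
```

Keying on the hash means a changed URL that serves identical content needs no refetch, while changing the hash forces a fetch from whatever URL is listed.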