ziglang / zig

General-purpose programming language and toolchain for maintaining robust, optimal, and reusable software.
https://ziglang.org
MIT License
34.38k stars 2.51k forks source link

make build.zig.zon contain all urls, including mirrors, and hashes of entire dependency tree #14309

Open andrewrk opened 1 year ago

andrewrk commented 1 year ago

Extracted from #14265.

The motivation for this is faster dependency fetching, avoiding round-trips from the network.

The idea here is that, after fetching all dependencies, recursively, zig would edit the user's build.zig.zon file and automatically add a section that contains all urls & hashes (including mirrors) for the entire dependency tree. It might look something like this:

.{
    // ...
    .mirrors = .{
        .@"sha256=c9b30cffc40999d2c078ff350cbcee642970a224fe123c756d0892f876cf1aae" = .{
            "https://example.com/1.tar.gz",
            "https://example.com/2.tar.gz",
        },
    },
}

In this case it matches exactly with #14292 which could be nice, or, on the other hand, it might be important to make a distinction between automatically added ones (which should also be removed automatically) or manually added ones (which should always be preserved).

Depending on how #14288 is resolved with regards to conflict resolution, this section would perhaps be combined with more tooling-managed data about each dependency, such as dependency overrides:

.{
    // ...
    .config = .{
        .@"sha256=c9b30cffc40999d2c078ff350cbcee642970a224fe123c756d0892f876cf1aae" = .{
            .url = .{
                "https://example.com/1.tar.gz",
                "https://example.com/2.tar.gz",
            },
            .overrides = .{
                .libz = "sha256=9ba4f49895b174a3f918d489238acbc146bd393575062b2e3be33488b688e36f",
            },
        },
    },
}

When combined with #14294, this may additionally need to float the fetch plugins to the top along with mirrors.

nektro commented 1 year ago

rather than mixing manual and automatic ones and since its going to contain info on the entire tree, imo it should be its own .zon file

deflock commented 1 year ago

I doubt it's a good idea to modify user's build.zig.zon, in some crazy setups it may be mounted/bound as readonly for some reason, when using containers or something like this

andrewrk commented 1 year ago

It's already planned to modify build.zig.zon as part of #14288, for example when a package dependency is added or removed.

deflock commented 1 year ago

Looks like we need a QA stream with describing complete package management workflow by writing an RFC there😅

loic-sharma commented 1 year ago

Copying my comment: https://github.com/ziglang/zig/issues/14290#issuecomment-1382647357

I used to work on NuGet, .NET's package manager. NuGet originally put the entire dependency tree in its packages.config file. This was a huge mistake. We later migrated to a new format, PackageReference, where you only list your project's direct dependencies. This migration made a significant difference in developer satisfaction (measured through HaTS).

more complicated logic for fetching

Very true, but, you're offloading this complexity to the developer. .NET developers complained that packages.config was unmanageable. It was difficult to tell if a dependency could be safely removed. Also, you'd need to understand your dependency tree if you wanted to update a single dependency. Excellent IDE tooling was not enough to offset this pain.

not only from a performance perspective

Performance was a top concern for NuGet's migration: NuGet runs in every .NET build, including Visual Studio's design time builds that power Intellisense. We were able to make this fast by:

  1. Use minimum version selection. This makes your dependency graph stable for a given set of direct dependencies. In other words, you only need to download packages if the developer adds/updates their direct dependencies.
  2. Optimize for the "no-op" case. NuGet is blazing fast if your direct dependencies haven't changed.

EDIT: NuGet does store its full dependency tree in its lock file. However, that's a separate file that's not meant to be edited by the developer.

mlugg commented 1 year ago

While caching this information for performance does make sense, I do think it would be better for the full dependency tree to be cached in a different file, maybe build-cache.zon or similar for a few reasons. This is significantly more manageable for the user (particularly if they want to edit the file directly rather than using tooling, which I definitely think should be a supported use case), gives cleaner repository commit history (you can just ignore changes to build-cache.zon, you only need to actually worry about build.zon! also yeah i want to bikeshed that name i'm not a fan of build.zig.zon), and makes it easier to identify old dependencies and other cruft in your build.zon.

acristoffers commented 1 year ago

This sounds a lot like a lockfile to me, which I'm missing in zig build now.

I've fixed nixe's scripts in ZLS after ZIG introduced the package manager. My solution was basically to hardcode the url's of all dependencies and to use some ugly string manipulations to turn the zon file into an importable json. I never really liked this solution and now decided to give it another shot and create a more reliable way of building zig projects in nix, only to realize that there is no lockfile and the information in the zon file is insufficient.

The information you list in this issue (the entire dependency tree) is important when you want to publish to Flathub too. Just like nix, Flathub's build system does not allow you to download files, so you have to create a JSON file containing all of the dependencies (those you cannot find in Flathub itself) so the system can download them and put them in the right place. And just like nix, it also requires a hash of the downloaded file so it can make sure it got the right thing before even attempting to build.

I was first bitten by the lack of tarball hashes, and then by the realization that I have no information about the dependencies of the dependencies. The only way for me to circumvent that is to do what cargo2nix does (for other reasons) and create an intermediate file that have all that, basically creating a lockfile myself. Projects like nix-community/naersk (also for rust) just parses the lockfile and gets all the information it needs from there during the build, which is much faster and more reliable.

Also, while here, I would like to note that a lockfile should have information about all packages in the dependency tree, no matter if it is used or not during the build. That is, it may be possible to disable packages depending on the platform, but a lockfile should have information about those nonetheless, since the build platform might be different from the one creating the lockfile. I kinda got bit by it when building a rust project with cargo2nix (but not because of a lack of package information, but lack of "feature" information leading to windows- and macos-only features being enabled in linux; a bit different, but same vibe).

booniepepper commented 1 year ago

I could talk anyone's ear off on lockfiles right now. I work in Amazon and work on maintaining their large-scale codebase. (And of course, any opinions here or ever are mine and not my employers.)

There are going to be use-cases for and against lockfiles, where I'm defining a lockfile as the collection of hashes of an entire transitive dependency tree. I really really appreciate Zig's opinionated approach to the language, and the "no really, there is only one way to do this backbone. Lockfiles are most useful for projects that are distributed across many developers, projects that are dependency-heavy, or "production" applications that occasionally need to make updates to some software but not all. Lockfiles are just noise and hassle when the project is used by a small team, the project has few dependencies, and is just some library (where the lockfile might be of occasional interest, but in general just an opportunity for conflicts).

I think Zig benefits most if libraries are checking in a build.zig.zon (even if they do not contain hashes). Putting the ever-changing lockfile hashes directly into the build.zig.zon will discourage checking in the whole file for some developer populations.

booniepepper commented 1 year ago

Of course, it's fine to cache the artifacts (IDK in ~/.cache/zig/<hash>/<etc> or something) and skip the unnecessary network calls if something's already been fetched. They could be used directly from their actual path, or symlinked or copied to ./zig-cache if it's preferred

Above that... With how principled I've seen Zig be with things like "unused variable is a compilation error" I'd be a little wary of anything that gives incentives to make enormous dependency trees, or makes enormous dependency trees too pleasant to work with (say, like the uglier sides of the NPM ecosystem). On the same note you also don't want to make code sharing too difficult -- that makes an ecosystem where there ends up being only a handful of shared do-all-the-things libraries (say, Boost) and the language will get pressure to add things that could come from a library ecosystem.

I'd need to think more on whether this is an argument for or against my previous recommendations, though.

Cloudef commented 9 months ago

Lockfiles are pretty much required on nix, otherwise you have to use separate program to generate lockfile and commit it to your repo for nix build to be aware of any network calls it has to made (or lookups from binary cache). Without them offline builds are pretty much no-go unless we require people to commit their "$ZIG_GLOBAL_CACHE/p" folder.

Cloudef commented 9 months ago

https://github.com/Cloudef/zig2nix can now build build.zig.zon projects with nix

Cloudef commented 9 months ago

@andrewrk I think if the zig build --fetch proposed here (or similar flag) https://github.com/ziglang/zig/issues/14597 had option to output whole dependency graph with the original artifact urls and artifact hashes without actually making any network calls then that would work for nix as well quite well. It would make lock file optional, but the information still has to be commited to repo to have the information available without network.

EDIT: nevermind, the lock file still should have all deps to prevent problems as described by https://github.com/ziglang/zig/issues/14309#issuecomment-1528859615

andrewrk commented 9 months ago

Note that zig build --fetch was implemented in a605bd9887a8a4220e177e42259809e0680d4221 (3 months ago)

Cloudef commented 9 months ago

For reference, how I currently handle zig.zon deps: https://github.com/Cloudef/zig2nix/tree/master/tools Particularly the zon2json-lock.nix and zon2nix.nix are the most relevant files.