Associate lazy dependencies with steps and only fetch them if those steps are built

The Problem

When invoking zig build, the user specifies a set of "steps" to execute. Each step has set of lazy dependencies it requires, but there's no mechanism to create this association in the build graph. Instead, build.zig must use build options to enable/disable lazy dependencies and decide how to deal with the corresponding "steps" that require them when they are missing.

Imagine you have an application that allows you to build either an x11 or wayland variant. Let's say each has their own set of lazy dependencies, and we've added the build options -Dx11 and -Dwayland that control whether we fetch those lazy dependencies. One way to implement this would be to have one set of common steps for both x11 and wayland, and then reuse the same build option that enables lazy dependencies to also select a variant to build, i.e.

$ zig build app -Dwayland
$ zig build app -Dx11

This solution sacrifices the ability to work with multiple variants within the same instance. For example, maybe you'd like to have a step called ci that builds multiple variants. To accommodate this, we can add multiple step variants to the same build instance, i.e.

$ zig build app-x11 app-wayland

But here we've run into an issue, we forgot we need to specify the -Dx11 and -Dwayland options to enable the required lazy dependencies for these "steps". Because there's no way to associate lazy dependencies with steps, we have to rely on the user to provide the necessary options for the steps they want to execute. When the user fails to do this, this adds an additional set of states that our build has to make a decision about. Do we remove the app-x11 step if -Dx11 is missing? Doing so makes build steps less discoverable. We could make app-x11 a fail step instead, i.e.

$ zig build app-x11
error: app-x11 requires the -Dx11 option to fetch its lazy dependencies

In addition to the problem this introduces to the zig build interface, it's also easy to misuse the b.lazyDependency API. It requires build.zig to correctly codify the association between build configuration, lazy dependencies, and the corresponding build steps. Here's how that might look today using our example project:

const x11 = b.option(bool, "x11", "Fetch x11 lazy dependencies") orelse false;

const app = b.addExecutable(.{
    .name = "app-x11",
    // ...
});
if (x11) {
    if (b.lazyDependency("x11", .{})) |x11_dep| {
        exe.linkLibrary(x11_dep.artifact("x11"));
    }
} else {
    app.dependOn(&b.addFail("app-x11 requires -Dx11 to fetch its lazy dependencies").step);
}

The Idea

The idea to solve this is instead of requiring build.zig to add additional options to control lazy dependencies and ensure it only calls b.lazyDependency at the appropriate time, instead we have build.zig always call b.lazyDependency and return a dependency object that yields artifacts and lazy paths associated that that dependency. This takes the association that already exists between steps and lazy dependencies and codifies it in the build graph. Once configuration is done, we can then use the set of steps we are building to gather the exact set of lazy dependencies we need.

The usage above becomes the following:

const app = b.addExecutable(.{
    .name = "app-x11",
    // ...
});
const x11_dep = b.lazyDependency("x11", .{});
exe.linkLibrary(x11_dep.artifact("x11"));

Compared to the current version, we've removed a build option (cut our build runner variations in half) and reduced three codepaths into one. A possible API for this is for lazyDependency to return a new std.Build.LazyDependency type which mimics the existing std.Build.Dependency API. It should be able to return a LazyPath for path since that already has the ability to hold a dependency association. Maybe adding an additional lazy_dependency field to std.Build.Step would be enough to handle most other cases.

As an example, this affects dvui - we have multiple backends and would like a way to only fetch the dependencies needed for the build steps selected. To be concrete:

zig build sdl-standalone -> fetches only sdl dependency
zig build raylib-standalone -> fetches only raylib depdendency
zig build -> fetches both sdl and raylib dependencies because by default both examples are built

Ok here's the master plan since I've started experimenting with this. First, I believe I've proven out the idea works and the implementation is pretty reasonable. I've split it up into 3 changes. The first is https://github.com/ziglang/zig/pull/21541/files, which, enhances the build system to detect when code erroneously resolves lazy paths outside of the "make phase". This seemed like a nice enhancement independent of the master plan that could be submitted immediately to get the ball rolling.

The second change will be to add a bit more book-keeping around lazy paths, namely, track when lazy_path.addStepDependencies is called. If any step attempts to resolve a lazy path within its make function without having added it as a dependency, the build will detect this misconfiguration. I have this working on a branch but it will need to be updated since I split the first change to its own PR so I'll wait for the first PR to be merged before doing that in case it needs to be reworked. (this branch also caught a few bugs in the build system where lazy path dependencies were missing)

With that second change, we now have enough information in the build system to implement this feature. I've discovered with this change the std.Build.lazyDependency function becomes unnecessary and can be removed. Build code would instead call b.dependency even if the dependency is lazy, and also treat it like any other dependency. The dependency can return 3 kinds of objects: 1) artifacts 2) modules and 3) lazy paths. The problem then becomes, after configuration is done, find all the objects that come from a "missing lazy dependency" in the build graph based on the steps we are building. We can already do this easily with modules and artifacts, but we won't be able to reliably do this for lazy paths until we have the "second change" I described above. I have this mostly working on this branch but needs to be reworked since I've split out the other two changes.

Some questions:

const app = b.addExecutable(.{
    .name = "app-x11",
    // ...
});
const x11_dep = b.lazyDependency("x11", .{});
exe.linkLibrary(x11_dep.artifact("x11"));

Assuming x11_dep is an instance of the new std.Build.LazyDependency interface and the dependency is not immediately fetched/resolved when we reach the exe.linkLibrary line, what does x11_dep.artifact("x11") return? Is it a std.Build.Step.Compile instance, and if so, where does it get its values from?

If my build script needs to access the fields of that compile step or otherwise patch/modify it (which is possible today) how would I accomplish that? Or should this no longer be a supported use case and resolved artifacts/modules/etc. be considered immutable?

I have been bitten by the underlying issue in the past and have needed to swap out steps for -D options in order to avoid fetching unneeded lazy dependencies, so I agree that the current design of b.lazyDependency is flawed and this is an area of the build system in need of some love. It would be more useful if dependencies were lazily fetched and resolved by default.

But I suspect that in order to solve this in a satisfying and intuitive way that doesn't feel like a hack, many of the build system APIs would need significant overhauls. In addition to lazy dependencies, we would need some sort of lazy artifact and lazy module abstraction, and APIs that currently take std.Build.Step.Compile or std.Build.Module would need to be migrated to those new lazy abstractions, similar to how []const u8 path-based APIs had to migrate to std.Build.LazyPath.

@castholm see my latest comment on the "master plan". On my branch where I prototyped this I realized that lazyDependency is no longer necessary to exist at all, you can now just call dependency. x11_dep.artifact("x11") would return a Compile step even if x11_dep is missing, then after configuration completes, the build graph is traversed and filtered based on the steps, and if that library is included, it can simply check if the artifact's Build object comes from a dependency that was unavailble. This check happens at the same time it happens today, after configuration and before the make phase.

I tried a few different mechanisms to implement this and I was very happy with how simple and unintrusive the final changes were to support this, especially when the implementation revealed that lazyDependency was now unnecessary...I'll have a PR soon once the first 2 changes are merged :)

In the branch you linked it looks like you're creating a compile step and populating it with dummy values. This means that code like this that accesses fields of that compile step

const some_dep = b.dependency("some_dep", .{}); // lazy
const some_lib = some_dep.artifact("some_lib");

const config_h = b.addConfigHeader(.{
    .style = .{ .cmake = b.path("config.h.in") },
    .include_path = "config.h",
}, .{
    .SOME_LIB_VERSION_MINOR = some_lib.version.?.minor,
    .SOME_LIB_VERSION_MAJOR = some_lib.version.?.major,
})

will produce inconsistent/incorrect results depending on if the artifact has been fetched or not. Having a dummy compile step that you're not allowed to touch in the wrong way and which is indistinguishable from a real compile step seems a bit footgunny.

A less error-prone abstraction might be something like this

pub const LazyArtifact = union(enum) {
    .compile_step: *std.Build.Step.Compile,
    .dependency: struct {
        .dependency: *std.Build.Dependency,
        .name: []const u8,
    },
};

which would be very similar to how lazy paths function today. Such an abstraction makes it impossible to read/modify the compile step and explicitly makes the config header example above unsupported.

Yeah you've identified some potential issues that may arise. I think whether we need to add additional types around "dummy objects" that are simply placeholders for missing dependencies will become clear when the implementation is finished. In your example I don't see any issues, you should be able to add that config header with no issue. However, I'm inclined to think that any issues that might arise might have simple fixes since this is similar to how I started the change with a LazyDependency but realized this type just ended up forwarding everything to an underlying Dependency object until I realized it became completely unnecessary. In summary, I don't have the answer to whether or not we'll want a LazyArtifact yet, we might, we might not, we'll see when I finish the implementation. It'll be much more productive to discuss how "footgunny" the implementation is once we have it and we see what existing code does/doesn't work.

ziglang / zig

Associate lazy dependencies with steps and only fetch them if those steps are built #21525

The Problem

The Idea