ninja-build / ninja

a small build system with a focus on speed
https://ninja-build.org/
Apache License 2.0
11.17k stars 1.59k forks source link

Built targets are still dirty after a successful build #2473

Closed nre-ableton closed 2 months ago

nre-ableton commented 2 months ago

When we use ninja to build our software, we run it twice: first to build the actual software, and then a second time afterwards to verify that it no-ops. We do this is because we have a very complex build with several no-code targets and such, and we want to make sure that no targets are still considered dirty even after a successful result. This second no-op pass is achieved by running ninja -n -d explain and then failing the build if we don't grep no work to do in ninja's output.

As of ninja 1.12.0, our builds now fail this step because ninja outputs the following:

ninja explain: recorded mtime of <redacted>.app older than most recent input <redacted>.app/Contents/MacOS/<redacted> (1722997299569634707 vs 1722997299973748395) ninja explain: <redacted>.app is dirty

We have about a dozen such targets that fail this test (none of which are no-code targets). The difference in these timestamps varies from a few milliseconds to a few seconds. Looking at the release notes and changelog of 1.12.0, this PR jumped out at me:

If an input is changed, the subsequent run will detect the output as dirty since its recorded mtime reflects when the build command began, not when the output was actually written to disk.

For a no-op build, it seems that this could mistakenly mark a target as dirty, since the mtime would be when the command started and not when the output (ie, none) was written to disk. I'm not terribly familiar with ninja internals, so please forgive me if I'm way off here. :sweat_smile:

We are running ninja 1.12.0 (we'll update to 1.12.1 later, but not until we figure out what's going on with 1.12.0) on macOS Ventura using Xcode 15.2, in case that matters.

nre-ableton commented 2 months ago

Ping @jdrouhard, is there any chance that the changes in https://github.com/ninja-build/ninja/pull/1943 could be responsible for this behavior?

digit-google commented 2 months ago

Can you post the full build <redacted>.app: ... statement from your build plan? Moreover, having a directory output path that depends on a file inside it, as in your example, seems wrong. Can you explain why this dependency exist exactly? Maybe there is a better way to get the same results that doesn't require this?

By design, Ninja uses timestamps to determine quickly and cheaply if the content of a target (i.e. a file-system path) has changed.

Unfortunately, for directories, the timestamp value is not a reliable indication that anything changed in it. Its timestamp is only updated when one of its direct entries (top-level files or sub-directories directly under it) change, and ignores anything below. This is what is happening here.

In general, one should not use directories as output / target paths in Ninja to avoid this problem. The change in #1943 does not change that, it just triggers the problematic issue for your build plan.

One way to deal with this is to always produce the directory atomically, i.e. create a new empty temporary directory, populate it, then compare its content with the existing one. If they are the same do not do anything (use the restat feature), otherwise swap the temp and old directory atomically. This ensures there are never lingering/stale artifacts, and that the directory's timestamp correspond to each "fresh" tree of content. That is what we do in our build system (which also runs Ninja twice to verify that the second invocation is a no-op).

NOTE: This can only work if the content of the directory is never modified by other commands though; but if you have a build plan with different actions poking at the directory, you probably have a broken build graph with flaky incremental build breaks that occur randomly.

Another alternative is to create a zip archive instead of a directory, that can be treated as a single file by Ninja, but it also means that anything that depends on it needs to run a command that unzip its content to a temporary directory before being able to use it (e.g. that's what the Chromium build does for Android resources). I would not recommend it though.

nre-ableton commented 2 months ago

It turns out the problem was in our build-system's ninja generator. Sorry for the noise!

digit-google commented 2 months ago

Thanks for telling us about it :-)