Closed motiejus closed 1 year ago
I've observed that some of the ~/.cache/zig/h/*.txt files (in my understanding, cache manifests) contain non-canonicalized paths:
$ grep -hr /home/motiejus/sandbox/a1/lib/std/std.zig ~/.cache/zig
5621 305412 1656217371235171984 66001e1a55f68e4681c1f32a69223647 /home/motiejus/sandbox/a1/lib/std/std.zig
IMO I would expect it to be:
5621 305412 1656217371235171984 66001e1a55f68e4681c1f32a69223647 /code/zig/lib/std/std.zig
Or even better, if at all possible?
5621 305412 1656217371235171984 66001e1a55f68e4681c1f32a69223647 lib/std/std.zig
Though I did not find where this path is actually constructed.
Ideally, manifest files in the global cache would contain only absolute paths, since the global cache is shared among multiple projects, each with a potentially different working directory.
Meanwhile, manifest files in the local cache would ideally contain only paths relative to the local project root, so that the project directory could be moved to a new location and the cache would continue functioning seamlessly.
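The policy described above could be sketched roughly as follows. This is a Python illustration only, not Zig's actual Cache code; the function name manifest_path and its parameters are hypothetical, and POSIX-style paths are assumed.

```python
import os


def manifest_path(file_path, project_root, global_cache):
    """Return the path string to record in a cache manifest.

    Hypothetical helper: a global manifest gets an absolute path, a local
    manifest gets a path relative to the project root.
    """
    root = os.path.abspath(project_root)
    # Lexical resolution only: symlinks are deliberately not followed.
    absolute = os.path.abspath(os.path.join(root, file_path))
    if global_cache:
        return absolute
    return os.path.relpath(absolute, start=root)
```

With project_root set to /code/zig, the input lib/std/std.zig would be recorded as /code/zig/lib/std/std.zig in a global manifest and as lib/std/std.zig in a local one, so a local cache would survive moving the project directory.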
I had a look around and any time we call Cache.addFile, it calls fs.path.resolve on the input path to convert from relative path to absolute path. The one exception is Cache.addFilePostContents, which accepts a pre-resolved path. One of the callsites does this correctly, but the other does not, and I think that is the cause of the bug.
I don't think resolving symlinks is necessary or desirable.
I'll propose a different PR shortly to solve this problem.
I opened #13071 after which no relative paths show up in the global cache manifest.
However, the system will still consider two paths to be different even if they are symlinks resolving to the same directory, leaving the original problem detailed in this issue unresolved. However, might I suggest that the link is resolved prior to setting the environment variable? For example, this solves the problem:
[nix-shell:~/Downloads/zig/build-release]$ cd a1
[nix-shell:~/Downloads/zig/build-release/a1]$ ZIG_LIB_DIR=$(readlink lib) ../stage4/bin/zig build-exe ../../test/standalone/hello_world/hello.zig
[nix-shell:~/Downloads/zig/build-release/a1]$ cd ../a2
[nix-shell:~/Downloads/zig/build-release/a2]$ ZIG_LIB_DIR=$(readlink lib) ../stage4/bin/zig build-exe ../../test/standalone/hello_world/hello.zig
With this strategy, only one copy of the global static libraries such as compiler_rt.a and libc++.a is generated.
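Why resolving the link first makes a1/lib and a2/lib look identical can be illustrated with a small Python sketch. It is not part of Zig or bazel-zig-cc; the directory names mirror the a1 and a2 example, and os.path.abspath stands in for a purely lexical resolve while os.path.realpath follows symlinks.

```python
import os
import tempfile


def lexical_and_real(path):
    """Return (lexically resolved path, symlink-resolved path)."""
    return os.path.abspath(path), os.path.realpath(path)


with tempfile.TemporaryDirectory() as tmp:
    real_lib = os.path.join(tmp, "lib")
    os.mkdir(real_lib)
    # Two working directories, each with its own "lib" symlink to the same target.
    for name in ("a1", "a2"):
        os.mkdir(os.path.join(tmp, name))
        os.symlink(real_lib, os.path.join(tmp, name, "lib"))
    lex1, real1 = lexical_and_real(os.path.join(tmp, "a1", "lib"))
    lex2, real2 = lexical_and_real(os.path.join(tmp, "a2", "lib"))
    same_lexical = lex1 == lex2  # False: the two link paths differ as strings
    same_real = real1 == real2   # True: both links resolve to one directory
```

A cache keyed on the lexical path sees two different lib directories, while one keyed on the resolved path sees a single directory.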
Unfortunately, Bazel's symlink to zig's lib is an absolute path, which brings back #12980. If we readlink it, ZIG_LIB_DIR becomes absolute, which causes the result of zig cc -M to be absolute, which defeats Bazel's global cache.
As a result, two machines/people with different sandbox paths (e.g. /home/motiejus and /home/john is enough to make a difference) cannot share the compiled artifacts.
In other words, I am looking for the right ZIG_LIB_DIR, so that the result of zig cc -M file.c contains only relative paths.
Hmm I'm actually not able to reproduce the problem. Here is an attempt where I am getting what I understand to be the desired behavior, based on our discussion:
[nix-shell:~/Downloads/zig/build-release]$ cd a1
[nix-shell:~/Downloads/zig/build-release/a1]$ ls -l lib
lrwxrwxrwx 1 andy users 28 Oct 4 23:02 lib -> /home/andy/Downloads/zig/lib
[nix-shell:~/Downloads/zig/build-release/a1]$ ZIG_LIB_DIR=lib ../stage4/bin/zig cc -o hello -c hello.c -MD -MV -MF hello.d -target x86_64-linux-gnu.2.33
[nix-shell:~/Downloads/zig/build-release/a1]$ cat hello.d
hello: hello.c lib/libc/include/generic-glibc/time.h \
lib/libc/include/generic-glibc/features.h \
lib/libc/include/generic-glibc/features-time64.h \
lib/libc/include/x86_64-linux-gnu/bits/wordsize.h \
lib/libc/include/x86_64-linux-gnu/bits/timesize.h \
lib/libc/include/generic-glibc/stdc-predef.h \
lib/libc/include/generic-glibc/sys/cdefs.h \
lib/libc/include/x86_64-linux-gnu/bits/long-double.h \
lib/libc/include/x86_64-linux-gnu/gnu/stubs.h \
lib/libc/include/x86_64-linux-gnu/gnu/stubs-64.h lib/include/stddef.h \
lib/libc/include/generic-glibc/bits/time.h \
lib/libc/include/generic-glibc/bits/types.h \
lib/libc/include/x86_64-linux-gnu/bits/typesizes.h \
lib/libc/include/generic-glibc/bits/time64.h \
lib/libc/include/generic-glibc/bits/types/clock_t.h \
lib/libc/include/generic-glibc/bits/types/time_t.h \
lib/libc/include/generic-glibc/bits/types/struct_tm.h \
lib/libc/include/generic-glibc/bits/types/struct_timespec.h \
lib/libc/include/generic-glibc/bits/endian.h \
lib/libc/include/x86_64-linux-gnu/bits/endianness.h \
lib/libc/include/generic-glibc/bits/types/clockid_t.h \
lib/libc/include/generic-glibc/bits/types/timer_t.h \
lib/libc/include/generic-glibc/bits/types/struct_itimerspec.h \
lib/libc/include/generic-glibc/bits/types/locale_t.h \
lib/libc/include/generic-glibc/bits/types/__locale_t.h
Here you can see that all the files are relative paths.
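A check like this can be automated. The following Python sketch parses a single-rule dependency file such as hello.d and tests whether every prerequisite is a relative path. The helper names are made up for illustration, and the parser is simplified: it assumes one rule, no escaped spaces in file names, and POSIX paths.

```python
import os


def parse_depfile(text):
    """Parse a single-rule Makefile dependency file into (target, deps)."""
    # Join lines that end in a backslash continuation.
    joined = text.replace("\\\n", " ")
    target, _, rest = joined.partition(":")
    return target.strip(), rest.split()


def all_relative(deps):
    """True if no dependency is an absolute path."""
    return all(not os.path.isabs(d) for d in deps)


sample = (
    "hello: hello.c lib/libc/include/generic-glibc/time.h \\\n"
    "  lib/include/stddef.h\n"
)
target, deps = parse_depfile(sample)
```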
They are relative when ZIG_LIB_DIR is relative; which is the correct behavior.
Our discussion was making ZIG_LIB_DIR absolute.
A reminder of what we agreed in person:
- ZIG_LIB_DIR is absolute, and we resolve the symlinks before invoking zig cc (so zig's caching system always knows they are the same files).
- zig cc -target <...> -M returns relative paths iff the path to the source file is relative.
Does that explain it?
Here is a more detailed explanation of the context we are dealing with: how Bazel manages dependencies and cache and why it matters.
bazel-zig-cc downloads Zig to a path somewhere in $HOME/.cache:
$ pwd
/home/motiejus/.go-code
$ ls -d $(bazel info output_base)/external/zig_sdk/{zig,lib/libc/musl/libc.S}
/home/motiejus/.cache/bazel/_bazel_motiejus/80f026c00534678eecd7f80fa20fddc4/external/zig_sdk/lib/libc/musl/libc.S
/home/motiejus/.cache/bazel/_bazel_motiejus/80f026c00534678eecd7f80fa20fddc4/external/zig_sdk/zig
$
The hash in the path (80f026...) is derived from the full path where the git repository is hosted. That is, if I run bazel in ~/.go-code2, Bazel's output_base will be in /home/motiejus/.cache/bazel/<different_hash>/.
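The effect can be sketched in Python as below. As far as I understand Bazel's documentation, the directory name is an MD5 hash of the workspace path, but the exact bytes being hashed are an assumption here, so the digest this produces need not match a real installation. The function name output_base and its parameters are hypothetical.

```python
import hashlib
import os


def output_base(workspace, user, cache_root):
    """Sketch of a per-workspace output base path.

    Assumes the directory name is the MD5 hex digest of the workspace path.
    """
    digest = hashlib.md5(workspace.encode("utf-8")).hexdigest()
    return os.path.join(cache_root, "_bazel_" + user, digest)


base_1 = output_base("/home/motiejus/.go-code", "motiejus", "/home/motiejus/.cache/bazel")
base_2 = output_base("/home/motiejus/.go-code2", "motiejus", "/home/motiejus/.cache/bazel")
```

The point is only that two checkouts of the same repository at different paths end up with different output bases, and therefore with different absolute paths to the Zig SDK.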
Bazel is designed to control the tools that it uses. Invoking anything outside of Bazel's output_base is not OK. For example, if Bazel needs to build something that requires, say, GNU make, it will build make first and then use it to build other targets.
It is possible, but generally not an option, to have anything nontrivial outside of Bazel's control.
Bazel has a few caching layers:
- Build cache in $HOME/.cache/bazel/... This cache is per-workspace (technically, per output_path). If you run bazel clean, that will get wiped. It is not shared across workspaces.
- Remote cache in a local directory, a network share, or a remote service.
- ... and a couple more in between which we will not discuss here.
bazel-zig-cc has another cache directory: /tmp/bazel-zig-cc. Before it invokes zig c++, it sets zig's cache directory to that:
export ZIG_LOCAL_CACHE_DIR="{cache_prefix}/bazel-zig-cc"
export ZIG_GLOBAL_CACHE_DIR="{cache_prefix}/bazel-zig-cc"
({cache_prefix} is set per environment, which is either ~/.cache/bazel-zig-cc or /tmp/bazel-zig-cc.)
Note: ZIG_(LOCAL|GLOBAL)_CACHE_DIR is always the same across different invocations of zig c++.
This is a simplified model of how Bazel compiles a C file and how it interacts with the remote cache. Before compiling main.c, Bazel does:
$CC -M -MF main.d main.c
Then it constructs a $hash from:
- main.d: the file paths and their hashes.
Then Bazel queries the remote cache for an entry $hash. If it is a match, it will download main.o and skip invoking an expensive compiler. If the entry is not present, it will compile the file:
$CC -o main.o
It then uploads the resulting main.o under the hash key computed in the previous step. If other users compute the same hash, they will be able to download the file instead of compiling.
This is done not only for individual object files -- Bazel can cache and download full static libraries composed of thousands of individual object files which take minutes to compile.
Bazel creates as many sandboxes as there are cores. Each sandbox contains a symlink to every file in the zig sdk. Here is an example of sandbox 153 symlinking to the zig binary:
$ ls -l $(bazel info output_base)/sandbox/linux-sandbox/153/execroot/__main__/external/zig_sdk/zig
/home/motiejus/.cache/bazel/_bazel_motiejus/80f026c00534678eecd7f80fa20fddc4/sandbox/linux-sandbox/153/execroot/__main__/external/zig_sdk/zig -> /home/motiejus/.cache/bazel/_bazel_motiejus/80f026c00534678eecd7f80fa20fddc4/execroot/__main__/external/zig_sdk/zig
When Bazel is compiling a C file in sandbox 153, the zig c++ process is executed in:
/home/motiejus/.cache/bazel/_bazel_motiejus/80f026c00534678eecd7f80fa20fddc4/sandbox/linux-sandbox/153/execroot/__main__
The layout of every sandbox is always the same, so all of them have external/zig_sdk pointing to the same files using symlinks. Bazel then invokes the binary using the relative path.
bazel-zig-cc also sets ZIG_LIB_DIR=external/zig_sdk/lib. As a result, zig c++ -M main.c returns relative paths to the dependent files, in this case, libc headers. Since the file contents are the same (zig sdk is always the same for a particular hash of go-code) and the paths are the same (all relative), the remote cache hash keys are also the same across different users who have bazel's cache in different directories. Thus they can use the same remote cache.
The remote cache is shared by:
Since all dependency paths are used to construct the hash key for the remote cache, all paths have to be the same across different environments.
Since the Zig SDK is placed wherever Bazel feels like it, the only option that comes to my mind is keeping the paths returned by $CC -M relative. If the paths are relative, they are considered the same, thus forming the same hash key.
At the same time, zig thinks that paths to zig lib dir are different from every sandbox; so without #13051 it is not reusing global libc artifacts.
You nailed it: the first half works correctly (emitting relative paths). Now for the second half, please execute:
mkdir ../a2; cd ../a2
ln -s /home/andy/Downloads/zig/lib
ZIG_LIB_DIR=lib ZIG_VERBOSE_CC=1 ../stage4/bin/zig cc -o hello -c hello.c -MD -MV -MF hello.d -target x86_64-linux-gnu.2.33
... and observe:
- The paths in hello.d are relative (good).
I spent a couple of days investigating the intersection of bazel and zig-cc with regards to build performance issues. There is one more aspect to it:
ZIG_LIB_DIR=external/zig_sdk/lib. This fixes the performance issue with a symlink farm (16k symlinks are replaced with a single mount(2)). However, as far as Zig is concerned, it's a completely different directory, which messes up global caching. With sandboxfs, even #13051 does not help.
To sum up: for Zig to be friendly with Bazel, we need to find a way for zig to understand that different ZIG_LIB_DIRs (read: different sandboxes) may actually refer to identical directory contents.
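One conceivable way to recognize identical directory contents, shown purely as an illustration, is to key on a digest of the tree instead of its path. The Python sketch below is not what Zig does and is not a proposal from this thread; dir_digest is a hypothetical helper, and hashing every file would be expensive for a lib directory with thousands of entries.

```python
import hashlib
import os
import tempfile


def dir_digest(root):
    """Digest a directory tree by relative file paths and file contents.

    The location of root itself does not influence the result.
    """
    h = hashlib.sha256()
    for dirpath, dirnames, filenames in os.walk(root, followlinks=True):
        dirnames.sort()  # make the traversal order deterministic
        for name in sorted(filenames):
            full = os.path.join(dirpath, name)
            h.update(os.path.relpath(full, root).encode("utf-8") + b"\0")
            with open(full, "rb") as f:
                h.update(hashlib.sha256(f.read()).digest())
    return h.hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    real_lib = os.path.join(tmp, "lib")
    os.makedirs(os.path.join(real_lib, "include"))
    with open(os.path.join(real_lib, "include", "stddef.h"), "w") as f:
        f.write("/* placeholder header */\n")
    # Two "sandboxes", each reaching the same lib through its own symlink.
    sandboxes = []
    for name in ("a1", "a2"):
        os.mkdir(os.path.join(tmp, name))
        link = os.path.join(tmp, name, "lib")
        os.symlink(real_lib, link)
        sandboxes.append(link)
    digests = [dir_digest(p) for p in sandboxes]
```

Both sandboxes yield the same digest even though their lib paths differ, which is the property a shared global cache would need.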
Summary of a voice chat that @motiejus and I had: It looks like this problem can be solved by a combination of two things:
This enhancement will benefit the portability of zig because "absolute file paths" are problematic for some systems, such as WASI, and operating systems that do not have realpath.
I will look into this over the next week.
Zig Version
0.10.0-dev.4176+6d7b0690a
Summary & Impact
When using a relative zig lib directory (ZIG_LIB_DIR=lib) and when building from different directories, libc shims get rebuilt for every working directory. This is a particularly nasty problem for Bazel, which uses a different directory for each "sandbox" (i.e. execution unit), but a relative path for ZIG_LIB_DIR for the sake of reproducibility. In our case, it causes tens of gigabytes of libc++.a in the zig cache directory, besides the CPU usage to generate those.
Steps to reproduce
Setup:
Directory a1:
Observe the zig c++ command takes ~16 seconds, which means it built the glibc shim. Now switch to directory a2:
If we run this again in a2, we see the latency is significantly decreased:
Expected Behavior
It takes <1 second to run in a2 from the first attempt.