Open blaztinn opened 7 months ago
TL;DR: Add -arch x86_64 to your CFLAGS instead of running the entire build as x86_64. There are too many things that can go wrong with the way you're doing it (plus, it's slower to build the way you do).
Full version: sccache can only really work with what it can know. What it can know is:
arch -x86_64 foo will run foo as an x86_64 binary... if it is one. But as you're saying, sccache is arm64-only, so in practice what sccache clang <flags> is doing is running as arm64. But what happens next also depends on what happened before, and on what arch -x86_64 does, which I'm not too sure about:
If arch -x86_64 still applies to subprocesses when the intermediate process is arm64 (which your final note seems to indicate would be the case), then what clang runs as depends on whether arch -x86_64 was in effect when the server was spawned. So if you start with sccache clang and then do arch -x86_64 sccache clang, the second time it's still going to be running as arm64.
If arch -x86_64 doesn't propagate to subprocesses, then clang will always run as arm64.
We're probably in the first situation, so what you get with your arch -x86_64 make command depends on whether the sccache server is already running or not. The first thing you can try is to run sccache --stop-server in between builds. Another thing you can try, to validate which case you're actually in, is to set SCCACHE_RECACHE to 1 in your environment, which forces sccache to recompile every file and overwrite any cached result, validating whether the build would ever produce x86_64 objects.
But then, that's still not the full story. One of the sources of the cache key is the preprocessed source. In many cases, the preprocessed source will likely be identical whether you're building for x86_64 or for arm64. If you add something like:
#ifdef __x86_64__
const int foo = 1;
#else
const int foo = 2;
#endif
to your source files, I'd expect things to somehow work, if we're indeed in the case where clang may actually run either as x86_64 or as arm64, but you'd still be dependent on whether sccache itself was started first from arch -x86_64 or not.
However, the more usual way to build something for x86_64 mac on an arm64 mac is not to run the entire build as x86_64, but to pass arguments to the compiler: either --target=x86_64-apple-darwin or -arch x86_64. And in this case, that would mean different command line arguments, and different cache keys. It would also be faster, because your compiler itself wouldn't be emulated.
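To make the cache-key point concrete, here is a minimal sketch. It is not sccache's actual key computation (the real one uses more inputs and a different hash); cache_key is a hypothetical helper that only hashes the compiler arguments and the preprocessed source, which is enough to show why -arch x86_64 separates the two builds while arch -x86_64 make does not.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Toy cache key (hypothetical, for illustration only): hashes the compiler
/// arguments followed by the preprocessed source text.
fn cache_key(args: &[&str], preprocessed: &str) -> u64 {
    // DefaultHasher::new() uses fixed keys, so equal inputs give equal keys.
    let mut hasher = DefaultHasher::new();
    args.hash(&mut hasher);
    preprocessed.hash(&mut hasher);
    hasher.finish()
}
```

With this model, two invocations of clang -c main.c get the same key no matter which architecture the process runs as, because nothing about the environment is hashed. Adding -arch x86_64 to the arguments changes the hashed input and therefore the key.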
Thank you for the detailed answer, I really appreciate it!
I will use the native compiler and set the target arch or triple, thank you for pointing that out. This solves my issue and will hopefully also make the compiling faster (because of no emulation as you said).
But I still think this is a bug in sccache, because sccache clang <args> should always produce the same output as running clang <args> directly.
I think the problem is that sccache uses the whole fat binary of clang to identify the compiler, while only one specific arch slice of that binary is ever actually run. This is why I think that sccache should hash both the binary and the target arch that is being used.
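A rough sketch of that idea, under the assumption that the target arch is available as a triple string. CompilerIdentity and identity_digest are made-up names for illustration, not sccache types, and the std hasher stands in for whatever digest sccache really uses.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical compiler identity: the bytes of the (possibly fat) compiler
/// binary plus the target triple it reports in the current environment.
#[derive(Hash)]
struct CompilerIdentity<'a> {
    binary_contents: &'a [u8],
    target_triple: &'a str,
}

/// Digest over both fields, so the same fat binary used under two different
/// architectures no longer maps to a single identity.
fn identity_digest(id: &CompilerIdentity<'_>) -> u64 {
    let mut hasher = DefaultHasher::new();
    id.hash(&mut hasher);
    hasher.finish()
}
```

Hashing only binary_contents would give the arm64 and x86_64 runs the same identity, which matches the behaviour described in this issue.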
Luckily the bug only shows when mixing the usage of sccache from both arm64 and x86_64 environments, so it is hard to trigger in practice.
Side note: I also wonder whether clang has a hardcoded default target triple or whether it depends on the environment it is run in (ignoring the flags passed to the compiler).
To show that arch -x86_64 propagates to subprocesses, I've added the following lines to main() in sccache:
// Needs `use std::process::Command;` at the top of the file.
{
    // Machine hardware name as seen by a child process.
    let output = Command::new("uname")
        .arg("-m")
        .output()
        .expect("failed to execute process")
        .stdout;
    println!("uname -m: {0}", String::from_utf8(output).unwrap());
}
{
    // Architecture reported by the `arch` tool when run as a child process.
    let output = Command::new("arch")
        .output()
        .expect("failed to execute process")
        .stdout;
    println!("arch: {0}", String::from_utf8(output).unwrap());
}
{
    // Default target triple of the clang that gets spawned.
    let output = Command::new("clang")
        .arg("-dumpmachine")
        .output()
        .expect("failed to execute process")
        .stdout;
    println!("clang -dumpmachine: {0}", String::from_utf8(output).unwrap());
}
Running ./sccache -V through make I get:
blaz.tomazic@Blazs-MacBook-Pro sccache-example % file sccache
sccache: Mach-O 64-bit executable arm64
blaz.tomazic@Blazs-MacBook-Pro sccache-example % make version-sccache
./sccache -V
uname -m: arm64
arch: arm64
clang -dumpmachine: arm64-apple-darwin23.3.0
sccache 0.7.7
blaz.tomazic@Blazs-MacBook-Pro sccache-example % arch -x86_64 make version-sccache
./sccache -V
uname -m: x86_64
arch: i386
clang -dumpmachine: x86_64-apple-darwin23.3.0
sccache 0.7.7
I don't know how the OS decides which arch of a binary to spawn in a subprocess. Dumping sysctl values and env vars didn't show any differences. Also, sysctl.proc_translated was 0 inside sccache (https://developer.apple.com/documentation/apple-silicon/about-the-rosetta-translation-environment#Determine-Whether-Your-App-Is-Running-as-a-Translated-Binary). I guess that is expected, since we are running an arm64 sccache anyway.
I did notice the crossarch_trap in dtrace though; this is probably how the OS selects the subprocess arch:
blaz.tomazic@Blazs-MacBook-Pro sccache-example % sudo dtruss -a ./sccache
PID/THRD RELATIVE ELAPSD CPU SYSCALL(args) = return
6034/0x13019: 142: 0: 0 fork() = 0 0
6034/0x13019: 121 29 18446744073709530 munmap(0x109A18000, 0x98000) = 0 0
6034/0x13019: 123 2 1 munmap(0x109AB0000, 0x8000) = 0 0
6034/0x13019: 125 1 1 munmap(0x109AB8000, 0x4000) = 0 0
6034/0x13019: 126 1 1 munmap(0x109ABC000, 0x4000) = 0 0
6034/0x13019: 127 3 0 munmap(0x109AC0000, 0x5C000) = 0 0
6034/0x13019: 141 1 0 crossarch_trap(0x0, 0x0, 0x0) = -1 Err#45
6034/0x13019: 154 18 11 open(".\0", 0x100000, 0x0) = 3 0
6034/0x13019: 157 2 1 fcntl(0x3, 0x32, 0x16B1232A8) = 0 0
6034/0x13019: 158 2 1 close(0x3) = 0 0
6034/0x13019: 163 6 3 fsgetpath(0x16B1232B8, 0x400, 0x16B123298) = 53 0
6034/0x13019: 176 13 12 fsgetpath(0x16B1232C8, 0x400, 0x16B1232A8) = 14 0
6034/0x13019: 180 1 18446744073709550 csrctl(0x0, 0x16B1236CC, 0x4) = -1 Err#1
6034/0x13019: 184 4 2 __mac_syscall(0x188A12948, 0x2, 0x16B123610) = 0 0
6034/0x13019: 185 0 0 csrctl(0x0, 0x16B1236EC, 0x4) = -1 Err#1
6034/0x13019: 188 3 2 __mac_syscall(0x188A0F79E, 0x5A, 0x16B123680) = 0 0
6034/0x13019: 196 6 0 sysctl([unknown, 3, 0, 0, 0, 0] (2), 0x16B122BE8, 0x16B122BE0, 0x188A113EF, 0xD) = 0 0
6034/0x13019: 197 1 0 sysctl([CTL_KERN, 137, 0, 0, 0, 0] (2), 0x16B122C98, 0x16B122C90, 0x0, 0x0) = 0 0
6034/0x13019: 212 10 10 open("/\0", 0x20100000, 0x0) = 3 0
6034/0x13019: 221 20 7 openat(0x3, "System/Cryptexes/OS\0", 0x100000, 0x0) = 4 0
6034/0x13019: 222 1 0 dup(0x4, 0x0, 0x0) = 5 0
6034/0x13019: 227 5 4 fstatat64(0x4, 0x16B122771, 0x16B1226E0) = 0 0
6034/0x13019: 232 9 4 openat(0x4, "System/Library/dyld/\0", 0x100000, 0x0) = 6 0
6034/0x13019: 233 24 0 fcntl(0x6, 0x32, 0x16B122770) = 0 0
6034/0x13019: 234 0 0 dup(0x6, 0x0, 0x0) = 7 0
6034/0x13019: 234 0 0 dup(0x5, 0x0, 0x0) = 8 0
6034/0x13019: 235 1 0 close(0x3) = 0 0
6034/0x13019: 235 0 0 close(0x5) = 0 0
6034/0x13019: 235 0 0 close(0x4) = 0 0
6034/0x13019: 236 0 0 close(0x6) = 0 0
6034/0x13019: 238 2 1 shared_region_check_np(0x16B122D80, 0x0, 0x0) = 0 0
6034/0x13019: 241 2 2 fsgetpath(0x16B1232D0, 0x400, 0x16B123228) = 82 0
6034/0x13019: 242 0 0 fcntl(0x8, 0x32, 0x16B1232D0) = 0 0
6034/0x13019: 242 0 0 close(0x8) = 0 0
6034/0x13019: 241 0 0 close(0x7) = 0 0
6034/0x13019: 248 3 2 getfsstat64(0x0, 0x0, 0x2) = 11 0
6034/0x13019: 255 7 7 getfsstat64(0x109C8EAA0, 0x5D28, 0x2) = 11 0
6034/0x13019: 260 5 4 getattrlist("/\0", 0x16B123200, 0x16B123170) = 0 0
6034/0x13019: 267 7 3 stat64("/System/Volumes/Preboot/Cryptexes/OS/System/Library/dyld/dyld_shared_cache_arm64e\0", 0x16B123560, 0x0) = 0 0
uname -m: arm64
arch: arm64
clang -dumpmachine: arm64-apple-darwin23.3.0
But I have no clue how to detect that aside from running a subprocess (like clang -dumpmachine) and checking the output.
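For reference, the subprocess approach could look roughly like the sketch below. Both function names are hypothetical and nothing like this is claimed to exist in sccache; the only compiler behaviour relied on is that clang -dumpmachine prints a triple such as arm64-apple-darwin23.3.0, as in the output above.

```rust
use std::io;
use std::process::Command;

/// Extracts the architecture component (the text before the first '-')
/// from a target triple such as "arm64-apple-darwin23.3.0".
fn arch_of_triple(triple: &str) -> Option<&str> {
    let arch = triple.trim().split('-').next()?;
    if arch.is_empty() {
        None
    } else {
        Some(arch)
    }
}

/// Runs `<compiler> -dumpmachine` as a child process and returns the
/// trimmed triple it prints, or an error if the compiler cannot be run.
fn default_triple(compiler: &str) -> io::Result<String> {
    let output = Command::new(compiler).arg("-dumpmachine").output()?;
    if !output.status.success() {
        return Err(io::Error::new(
            io::ErrorKind::Other,
            "compiler exited with an error",
        ));
    }
    Ok(String::from_utf8_lossy(&output.stdout).trim().to_string())
}
```

Calling default_triple("clang") and passing the result to arch_of_triple would yield the arch the spawned clang defaults to in the current environment, at the cost of one extra process spawn.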
I've also modified the makefile to show that the first compilation is always reused:
clean-cache:
	./sccache --stop-server || true
	rm -rf /Users/blaz.tomazic/Library/Caches/Mozilla.sccache
version:
	clang --version
	./sccache -- clang --version
build:
	clang -c main.c -o main-clang
	./sccache -- clang -c main.c -o main-sccache
	file main-*
The output from commands is:
blaz.tomazic@Blazs-MacBook-Pro sccache-example % make clean-cache
./sccache --stop-server || true
Stopping sccache server...
sccache: error: couldn't connect to server
sccache: caused by: Connection refused (os error 61)
rm -rf /Users/blaz.tomazic/Library/Caches/Mozilla.sccache
blaz.tomazic@Blazs-MacBook-Pro sccache-example % make build
clang -c main.c -o main-clang
./sccache -- clang -c main.c -o main-sccache
file main-*
main-clang: Mach-O 64-bit object arm64
main-sccache: Mach-O 64-bit object arm64
blaz.tomazic@Blazs-MacBook-Pro sccache-example % arch -x86_64 make build
clang -c main.c -o main-clang
./sccache -- clang -c main.c -o main-sccache
file main-*
main-clang: Mach-O 64-bit object x86_64
main-sccache: Mach-O 64-bit object arm64
blaz.tomazic@Blazs-MacBook-Pro sccache-example % make clean-cache
./sccache --stop-server || true
Stopping sccache server...
Compile requests 2
Compile requests executed 2
Cache hits 1
Cache hits (C/C++) 1
Cache misses 1
Cache misses (C/C++) 1
Cache timeouts 0
Cache read errors 0
Forced recaches 0
Cache write errors 0
Compilation failures 0
Cache errors 0
Non-cacheable compilations 0
Non-cacheable calls 0
Non-compilation calls 0
Unsupported compiler calls 0
Average cache write 0.000 s
Average compiler 0.019 s
Average cache read hit 0.000 s
Failed distributed compilations 0
Cache location Local disk: "/Users/blaz.tomazic/Library/Caches/Mozilla.sccache"
Use direct/preprocessor mode? yes
Version (client) 0.7.7
Cache size 332 bytes
Max cache size 10 GiB
rm -rf /Users/blaz.tomazic/Library/Caches/Mozilla.sccache
blaz.tomazic@Blazs-MacBook-Pro sccache-example % arch -x86_64 make build
clang -c main.c -o main-clang
./sccache -- clang -c main.c -o main-sccache
file main-*
main-clang: Mach-O 64-bit object x86_64
main-sccache: Mach-O 64-bit object x86_64
blaz.tomazic@Blazs-MacBook-Pro sccache-example % make build
clang -c main.c -o main-clang
./sccache -- clang -c main.c -o main-sccache
file main-*
main-clang: Mach-O 64-bit object arm64
main-sccache: Mach-O 64-bit object x86_64
clang and make on macOS are fat binaries, both having arm64 and x86_64 parts. You can select which architecture you want to run with the arch command, and this is where sccache fails to detect the right target architecture when caching compiled artifacts. It always returns the artifacts for the architecture you compiled with first (probably because of not hashing the (right) target architecture?).
A quick way to reproduce this on arm64 Apple HW:
Create main.c (int main(){})
Run make build
Run arch -x86_64 make build
As you can see, sccache cached the arm64 binary on the first compilation and returned it when running the x86_64 compiler targeting the x86_64 binary. On the other hand, if we first compiled for x86_64 with an empty cache, we would always get an x86_64 binary.
This example needs the make command because the sccache I downloaded is an arm64-only binary and directly calling arch -x86_64 sccache fails.
Note: Having sccache -- clang --version inside the makefile and running the makefile with arch -x86_64 make reports that clang is an x86_64 binary, so the arm64-only sccache binary forwards commands to the right compiler arch.