Closed thomcc closed 2 years ago
> in the meantime perhaps it's sufficient for us to ensure that it doesn't yield a CPU older than the one we use by default

People could run macOS on A12Z (transition kit) or in a VM (even though it is not blessed by Apple) where specific CPU feature sets are disabled. `-Ctarget-cpu=native` should always generate for a baseline of features that will work on the host, no matter what.
The exact CPU instruction scheduling picked… is less of a concern – the code will still run regardless of the ordering of the instructions. I'd be okay with some capping/defaulting here, but I'm not sure how rustc would be able to tell that M1 is more recent than Cyclone or some other arbitrary string, at least not without maintaining a table of these names. And I feel like we ideally don't want to maintain any such tables.
Hm, fair enough. Perhaps checking based on `sysctl -a hw.optional` (which will list the instruction set extensions on the current machine) would be the best option then?
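A rough sketch of what that check could look like, assuming the `sysctl` output is a list of `key: value` lines where `1` means the extension is present. The helper name `enabled_hw_optional` and the sample lines are made up for illustration; mapping the resulting names onto rustc's `target_feature` names would still need some lookup that is not shown here.

```rust
use std::collections::BTreeSet;

/// Collect the names of all `hw.optional.*` keys whose value is `1`.
/// Assumes one `key: value` pair per line.
fn enabled_hw_optional(sysctl_output: &str) -> BTreeSet<String> {
    sysctl_output
        .lines()
        .filter_map(|line| {
            let (key, value) = line.split_once(':')?;
            // Skip anything that is not under `hw.optional.`.
            let name = key.trim().strip_prefix("hw.optional.")?;
            (value.trim() == "1").then(|| name.to_string())
        })
        .collect()
}

fn main() {
    // Illustrative sample, not captured from a real machine.
    let sample = "hw.optional.neon: 1\n\
                  hw.optional.armv8_crc32: 1\n\
                  hw.optional.arm.FEAT_BTI: 0\n";
    for name in enabled_hw_optional(sample) {
        println!("{}", name);
    }
}
```

Because this only looks at what the host reports, it sidesteps the question of which CPU name is "newer" entirely.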
> but I'm not sure how rustc would be able to tell that M1 is more recent than Cyclone or some other arbitrary string, at least not without maintaining a table of these names

Given that the M1 parallels the A14 Bionic, it's unfortunate that the string selected was `cyclone`, given that it's the A7. Otherwise the obvious comparisons would fall out.
This works on nightly, closing.
$ rustc --print cfg -Ctarget-cpu=native
debug_assertions
panic="unwind"
target_abi=""
target_arch="aarch64"
target_endian="little"
target_env=""
target_family="unix"
target_feature="aes"
target_feature="crc"
target_feature="dit"
target_feature="dotprod"
target_feature="dpb"
target_feature="dpb2"
target_feature="fcma"
target_feature="fhm"
target_feature="flagm"
target_feature="fp16"
target_feature="frintts"
target_feature="jsconv"
target_feature="llvm14-builtins-abi"
target_feature="lor"
target_feature="lse"
target_feature="neon"
target_feature="paca"
target_feature="pacg"
target_feature="pan"
target_feature="pmuv3"
target_feature="ras"
target_feature="rcpc"
target_feature="rcpc2"
target_feature="rdm"
target_feature="sb"
target_feature="sha2"
target_feature="sha3"
target_feature="ssbs"
target_feature="v8.1a"
target_feature="v8.2a"
target_feature="v8.3a"
target_feature="v8.4a"
target_feature="vh"
target_has_atomic="128"
target_has_atomic="16"
target_has_atomic="32"
target_has_atomic="64"
target_has_atomic="8"
target_has_atomic="ptr"
target_has_atomic_equal_alignment="128"
target_has_atomic_equal_alignment="16"
target_has_atomic_equal_alignment="32"
target_has_atomic_equal_alignment="64"
target_has_atomic_equal_alignment="8"
target_has_atomic_equal_alignment="ptr"
target_has_atomic_load_store="128"
target_has_atomic_load_store="16"
target_has_atomic_load_store="32"
target_has_atomic_load_store="64"
target_has_atomic_load_store="8"
target_has_atomic_load_store="ptr"
target_os="macos"
target_pointer_width="64"
target_thread_local
target_vendor="apple"
unix
There's still either some weird thing going on or I misunderstood the target cpu names, running on M3 Pro:
diff <(rustc --print cfg -Ctarget-cpu=native) <(rustc --print cfg -Ctarget-cpu=apple-m3)
8a9,10
> target_feature="bf16"
> target_feature="bti"
18a21
> target_feature="i8mm"
You can see that `bf16`, `bti` and `i8mm` are available on `apple-m3` but not on `native`.
Interestingly, it seems `apple-m3` is the default for unknown CPUs: https://github.com/llvm/llvm-project/blob/4c51d827e58aaa8c5b3d75b3b61a43627ab53491/llvm/lib/TargetParser/Host.cpp#L1541

And checking on my machine, the `cpufamily` is not even in the list (`0x5f4dea93`).
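For illustration, the lookup being described boils down to something like the sketch below: a small table from `hw.cpufamily` values to CPU names, with a fallback for families that are not listed. This is not LLVM's code. The function names are hypothetical, and the two family constants are quoted from memory of Apple's headers, so treat them as assumptions to verify.

```rust
/// Family id -> CPU name. Both constants are assumptions (recalled, not
/// checked against the headers); a real table would have many more entries.
const KNOWN_FAMILIES: &[(u32, &str)] = &[
    (0x37a0_9642, "cyclone"),  // assumed CPUFAMILY_ARM_CYCLONE
    (0x1b58_8bb3, "apple-m1"), // assumed CPUFAMILY_ARM_FIRESTORM_ICESTORM
];

/// Parse a family id given either as decimal or as `0x`-prefixed hex.
fn parse_cpufamily(text: &str) -> Option<u32> {
    let t = text.trim();
    match t.strip_prefix("0x") {
        Some(hex) => u32::from_str_radix(hex, 16).ok(),
        None => t.parse().ok(),
    }
}

/// Look the family up in the table; unknown families get `fallback`.
fn cpu_name_for_family<'a>(family: u32, fallback: &'a str) -> &'a str {
    KNOWN_FAMILIES
        .iter()
        .find(|(id, _)| *id == family)
        .map(|(_, name)| *name)
        .unwrap_or(fallback)
}

fn main() {
    // The family id reported above is not in the table, so the fallback wins.
    let family = parse_cpufamily("0x5f4dea93").expect("valid family id");
    println!("{}", cpu_name_for_family(family, "apple-m3"));
}
```

With a table like this, any chip released after the table was last updated silently gets whatever the fallback is, which is the behaviour observed here.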
On `aarch64-apple-darwin`, we don't seem to be determining the right target features when `-Ctarget-cpu=native` is enabled -- we do a much better job without the `-Ctarget-cpu` flag than with it.

For example:
The output of `rustc --print cfg -Ctarget-cpu=native`:

```
debug_assertions
panic="unwind"
target_abi=""
target_arch="aarch64"
target_endian="little"
target_env=""
target_family="unix"
target_feature="aes"
target_feature="fp"
target_feature="neon"
target_feature="pmuv3"
target_feature="sha2"
target_has_atomic="128"
target_has_atomic="16"
target_has_atomic="32"
target_has_atomic="64"
target_has_atomic="8"
target_has_atomic="ptr"
target_has_atomic_equal_alignment="128"
target_has_atomic_equal_alignment="16"
target_has_atomic_equal_alignment="32"
target_has_atomic_equal_alignment="64"
target_has_atomic_equal_alignment="8"
target_has_atomic_equal_alignment="ptr"
target_has_atomic_load_store="128"
target_has_atomic_load_store="16"
target_has_atomic_load_store="32"
target_has_atomic_load_store="64"
target_has_atomic_load_store="8"
target_has_atomic_load_store="ptr"
target_os="macos"
target_pointer_width="64"
target_thread_local
target_vendor="apple"
unix
```
The output of `rustc --print cfg`:

```
debug_assertions
panic="unwind"
target_abi=""
target_arch="aarch64"
target_endian="little"
target_env=""
target_family="unix"
target_feature="aes"
target_feature="crc"
target_feature="dit"
target_feature="dotprod"
target_feature="dpb"
target_feature="dpb2"
target_feature="fcma"
target_feature="fhm"
target_feature="flagm"
target_feature="fp"
target_feature="fp16"
target_feature="frintts"
target_feature="jsconv"
target_feature="lor"
target_feature="lse"
target_feature="neon"
target_feature="pan"
target_feature="pauth"
target_feature="pmuv3"
target_feature="ras"
target_feature="rcpc"
target_feature="rcpc2"
target_feature="rdm"
target_feature="sb"
target_feature="sha2"
target_feature="sha3"
target_feature="ssbs"
target_feature="v8.1a"
target_feature="v8.2a"
target_feature="v8.3a"
target_feature="v8.4a"
target_feature="vh"
target_has_atomic="128"
target_has_atomic="16"
target_has_atomic="32"
target_has_atomic="64"
target_has_atomic="8"
target_has_atomic="ptr"
target_has_atomic_equal_alignment="128"
target_has_atomic_equal_alignment="16"
target_has_atomic_equal_alignment="32"
target_has_atomic_equal_alignment="64"
target_has_atomic_equal_alignment="8"
target_has_atomic_equal_alignment="ptr"
target_has_atomic_load_store="128"
target_has_atomic_load_store="16"
target_has_atomic_load_store="32"
target_has_atomic_load_store="64"
target_has_atomic_load_store="8"
target_has_atomic_load_store="ptr"
target_os="macos"
target_pointer_width="64"
target_thread_local
target_vendor="apple"
unix
```
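To make the gap between the two outputs easier to see, here is a small sketch that extracts the `target_feature` entries from two `--print cfg` outputs and lists what the first one is missing. The helper names `target_features` and `missing_from` are invented for this example, and the inputs in `main` are abbreviated excerpts of the outputs above rather than the full lists.

```rust
use std::collections::BTreeSet;

/// Extract the values of all `target_feature="..."` lines.
fn target_features(cfg_output: &str) -> BTreeSet<&str> {
    cfg_output
        .lines()
        .filter_map(|line| {
            line.trim()
                .strip_prefix("target_feature=\"")?
                .strip_suffix('"')
        })
        .collect()
}

/// Features present in `baseline` but absent from `candidate`, sorted.
fn missing_from<'a>(
    candidate: &BTreeSet<&'a str>,
    baseline: &BTreeSet<&'a str>,
) -> Vec<&'a str> {
    baseline.difference(candidate).copied().collect()
}

fn main() {
    // Abbreviated excerpts of the two outputs, for illustration only.
    let native = r#"
        target_feature="aes"
        target_feature="fp"
        target_feature="neon"
    "#;
    let default = r#"
        target_feature="aes"
        target_feature="crc"
        target_feature="fp"
        target_feature="lse"
        target_feature="neon"
    "#;
    let lost = missing_from(&target_features(native), &target_features(default));
    println!("features lost under -Ctarget-cpu=native: {:?}", lost);
}
```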
This is unexpected, and somewhat undesirable -- ideally, specifying `-Ctarget-cpu=native` would never reduce the set of target features compared to the default, and would, in fact, increase it.

Some digging (mostly by @bjorn3 and @ehuss) determined that LLVM is ending up with `cyclone` as the CPU under `-Ctarget-cpu=native` and `apple-a14` as the CPU if nothing is passed. `apple-a14` itself is actually slightly wrong, even for baseline `aarch64-apple-darwin`; it's just that the right value, `apple-m1`, was not available in LLVM until LLVM v13.

Why is LLVM choosing `cyclone`? Dunno (perhaps because it's an early iOS aarch64 chip), but we probably can do a better job there, given that we do so for the default target -- and I think we can improve `aarch64-apple-darwin` (with or without `-Ctarget-cpu=native`) to use `apple-m1` -- perhaps only after an LLVM version check.

Open question: Is there a way for us to implement `-Ctarget-cpu=native` conveniently? That is, to determine which CPU is native? Again, dunno! Running `sysctl hw.cpufamily` tells you... something... it comes from these, I suppose (well, I hope it's also in some more public documentation, I didn't look).

Can that be mapped to a name like `apple-m1`? Probably, although it seems slightly annoying, and there may be a better way... Also, they suggest that you "should not" do this in the comment above these defines, but... they also have the names of the Intel chips in there too, which obviously works just fine... so perhaps the "should not" in the sentence is in the RFC sense (and I can see why they might want to encourage feature checking anyway, even if checking the CPU worked fine).

Regardless, I think that we don't actually need to fully fix `-Ctarget-cpu=native` in order to improve things here -- it'd be nice, but in the meantime perhaps it's sufficient for us to ensure that it doesn't yield a CPU older than the one we use by default (and as I mentioned, it might be worth bumping the default target-cpu for `aarch64-apple-darwin` to `apple-m1` when on LLVM 13 while we're there).

(Zulip discussion leading to filing this issue: https://rust-lang.zulipchat.com/#narrow/stream/242906-t-compiler.2Farm/topic/aarch64-apple-darwin.20target.20feats.20w.2F.20.60-Ctarget-cpu.3Dnative.60 -- I've attempted to cover the important points above)
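A minimal sketch of the stopgap described above, assuming we are willing to keep a tiny ordered list of CPU names. The list, its ordering, and the function names are all hypothetical, and only names mentioned in this issue are included; nothing here reflects how rustc or LLVM actually handle it.

```rust
/// Hypothetical ordering, oldest first. Not an authoritative list.
const GENERATIONS: &[&str] = &["cyclone", "apple-a14", "apple-m1"];

/// Position of `cpu` in the ordering, or `None` if it is not listed.
fn rank(cpu: &str) -> Option<usize> {
    GENERATIONS.iter().position(|name| *name == cpu)
}

/// Return `default` when `detected` is known to be older than it;
/// otherwise (newer, equal, or not in the list) keep `detected`.
fn cap_to_default<'a>(detected: &'a str, default: &'a str) -> &'a str {
    match (rank(detected), rank(default)) {
        (Some(d), Some(b)) if d < b => default,
        _ => detected,
    }
}

fn main() {
    // The case from this issue: native detection yields `cyclone`,
    // while the target default is `apple-a14`.
    println!("{}", cap_to_default("cyclone", "apple-a14"));
}
```

Names missing from the list pass through unchanged, so the cap only ever overrides a detection result that is known to be older than the default.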