rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org

On Apple M1, passing `-Ctarget-cpu=native` results in us choosing an older CPU than we choose by default. #93889

Closed: thomcc closed this issue 2 years ago

thomcc commented 2 years ago

On aarch64-apple-darwin, we don't seem to determine the right target features when `-Ctarget-cpu=native` is passed: we do a much better job without the `-Ctarget-cpu` flag than with it.

For example:

The output of `rustc --print cfg -Ctarget-cpu=native`:

```
debug_assertions
panic="unwind"
target_abi=""
target_arch="aarch64"
target_endian="little"
target_env=""
target_family="unix"
target_feature="aes"
target_feature="fp"
target_feature="neon"
target_feature="pmuv3"
target_feature="sha2"
target_has_atomic="128"
target_has_atomic="16"
target_has_atomic="32"
target_has_atomic="64"
target_has_atomic="8"
target_has_atomic="ptr"
target_has_atomic_equal_alignment="128"
target_has_atomic_equal_alignment="16"
target_has_atomic_equal_alignment="32"
target_has_atomic_equal_alignment="64"
target_has_atomic_equal_alignment="8"
target_has_atomic_equal_alignment="ptr"
target_has_atomic_load_store="128"
target_has_atomic_load_store="16"
target_has_atomic_load_store="32"
target_has_atomic_load_store="64"
target_has_atomic_load_store="8"
target_has_atomic_load_store="ptr"
target_os="macos"
target_pointer_width="64"
target_thread_local
target_vendor="apple"
unix
```

The output of `rustc --print cfg`:

```
debug_assertions
panic="unwind"
target_abi=""
target_arch="aarch64"
target_endian="little"
target_env=""
target_family="unix"
target_feature="aes"
target_feature="crc"
target_feature="dit"
target_feature="dotprod"
target_feature="dpb"
target_feature="dpb2"
target_feature="fcma"
target_feature="fhm"
target_feature="flagm"
target_feature="fp"
target_feature="fp16"
target_feature="frintts"
target_feature="jsconv"
target_feature="lor"
target_feature="lse"
target_feature="neon"
target_feature="pan"
target_feature="pauth"
target_feature="pmuv3"
target_feature="ras"
target_feature="rcpc"
target_feature="rcpc2"
target_feature="rdm"
target_feature="sb"
target_feature="sha2"
target_feature="sha3"
target_feature="ssbs"
target_feature="v8.1a"
target_feature="v8.2a"
target_feature="v8.3a"
target_feature="v8.4a"
target_feature="vh"
target_has_atomic="128"
target_has_atomic="16"
target_has_atomic="32"
target_has_atomic="64"
target_has_atomic="8"
target_has_atomic="ptr"
target_has_atomic_equal_alignment="128"
target_has_atomic_equal_alignment="16"
target_has_atomic_equal_alignment="32"
target_has_atomic_equal_alignment="64"
target_has_atomic_equal_alignment="8"
target_has_atomic_equal_alignment="ptr"
target_has_atomic_load_store="128"
target_has_atomic_load_store="16"
target_has_atomic_load_store="32"
target_has_atomic_load_store="64"
target_has_atomic_load_store="8"
target_has_atomic_load_store="ptr"
target_os="macos"
target_pointer_width="64"
target_thread_local
target_vendor="apple"
unix
```

This is unexpected and somewhat undesirable: ideally, specifying `-Ctarget-cpu=native` would never reduce the set of target features compared to the default, and would in fact increase it.
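One way to check that "never reduce the feature set" expectation mechanically is to diff the `target_feature` entries of the two `--print cfg` outputs. The following is a minimal Rust sketch; the helper names are hypothetical, and the inputs in `main` are abbreviated excerpts of the outputs above, not the full lists.

```rust
use std::collections::BTreeSet;

/// Collects the values of `target_feature="..."` lines from the output
/// of `rustc --print cfg`.
fn target_features(cfg_output: &str) -> BTreeSet<&str> {
    cfg_output
        .lines()
        .filter_map(|line| line.trim().strip_prefix("target_feature=\""))
        .filter_map(|rest| rest.strip_suffix('"'))
        .collect()
}

/// Features enabled by default that are missing under `-Ctarget-cpu=native`.
/// An empty set means `native` did not lose anything relative to the default.
fn lost_with_native<'a>(default_cfg: &'a str, native_cfg: &'a str) -> BTreeSet<&'a str> {
    let native = target_features(native_cfg);
    target_features(default_cfg)
        .into_iter()
        .filter(|f| !native.contains(f))
        .collect()
}

fn main() {
    // Abbreviated excerpts of the two outputs above.
    let native = r#"target_arch="aarch64"
target_feature="aes"
target_feature="fp"
target_feature="neon""#;
    let default = r#"target_arch="aarch64"
target_feature="aes"
target_feature="crc"
target_feature="fp"
target_feature="lse"
target_feature="neon""#;
    println!("lost with native: {:?}", lost_with_native(default, native));
}
```

Fed the real outputs (e.g. captured with `rustc --print cfg > default.txt`), a non-empty result is exactly the regression this issue describes.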

Some digging (mostly by @bjorn3 and @ehuss) determined that LLVM ends up with `cyclone` as the CPU under `-Ctarget-cpu=native`, and with `apple-a14` if nothing is passed. `apple-a14` is itself slightly wrong, even for baseline aarch64-apple-darwin; it's just that the right value, `apple-m1`, was not available in LLVM until LLVM 13.

Why is LLVM choosing `cyclone`? Dunno (perhaps because it's an early iOS aarch64 chip). But we can probably do a better job there, given that we already do for the default target. I also think we can improve aarch64-apple-darwin (with or without `-Ctarget-cpu=native`) to use `apple-m1`, perhaps gated on an LLVM version check.
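A minimal sketch of such a version gate, assuming the LLVM version is read from the `LLVM version: X.Y.Z` line that `rustc --version --verbose` prints. The function names are hypothetical and this is not how rustc itself implements the check; the `apple-a14`/`apple-m1` choice simply restates the proposal above.

```rust
use std::process::Command;

/// Extracts the LLVM major version from `rustc --version --verbose`
/// output, which contains a line like `LLVM version: 13.0.0`.
fn llvm_major(verbose: &str) -> Option<u32> {
    verbose
        .lines()
        .find_map(|line| line.trim().strip_prefix("LLVM version: "))
        .and_then(|version| version.split('.').next())
        .and_then(|major| major.parse().ok())
}

/// Hypothetical policy: `apple-m1` when LLVM knows about it (LLVM 13 and
/// later), otherwise the current `apple-a14`.
fn default_cpu_for_aarch64_apple_darwin(llvm_major: u32) -> &'static str {
    if llvm_major >= 13 {
        "apple-m1"
    } else {
        "apple-a14"
    }
}

fn main() {
    let output = Command::new("rustc").args(["--version", "--verbose"]).output();
    if let Ok(out) = output {
        let text = String::from_utf8_lossy(&out.stdout);
        if let Some(major) = llvm_major(&text) {
            println!(
                "LLVM {major}: default CPU would be {}",
                default_cpu_for_aarch64_apple_darwin(major)
            );
        }
    }
}
```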


Open question: is there a convenient way for us to implement `-Ctarget-cpu=native`, that is, to determine which CPU is native? Again, dunno! Running `sysctl hw.cpufamily` tells you... something. It comes from these, I suppose (I hope it's also in some more public documentation, but I didn't look).

Can that be mapped to a name like `apple-m1`? Probably, although it seems slightly annoying, and there may be a better way. Also, the comment above these defines says you "should not" do this. But the same list also contains the names of the Intel chips, and checking those obviously works fine, so perhaps the "should not" is meant in the RFC sense (and I can see why they might want to encourage feature checking anyway, even if checking the CPU works fine).
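Mapping `hw.cpufamily` to an LLVM CPU name could look roughly like the sketch below. The two table entries are assumed values of XNU's `CPUFAMILY_ARM_*` defines and should be checked against the SDK headers before use; `parse_cpufamily` and `cpu_name_for_family` are hypothetical helpers. Unknown families return `None` rather than a guess, which makes the table-maintenance cost explicit.

```rust
use std::process::Command;

// Assumed values of XNU's CPUFAMILY_ARM_* defines (mach/machine.h).
// Verify against the SDK headers before relying on them.
const KNOWN_FAMILIES: &[(u32, &str)] = &[
    (0x37a0_9642, "cyclone"),  // CPUFAMILY_ARM_CYCLONE (A7)
    (0x1b58_8bb3, "apple-m1"), // CPUFAMILY_ARM_FIRESTORM_ICESTORM (A14/M1)
];

/// Parses the output of `sysctl -n hw.cpufamily`. The value is a 32-bit
/// identifier that may be printed as a signed or unsigned decimal.
fn parse_cpufamily(text: &str) -> Option<u32> {
    let v: i64 = text.trim().parse().ok()?;
    if (0..=i64::from(u32::MAX)).contains(&v) {
        Some(v as u32)
    } else if (i64::from(i32::MIN)..0).contains(&v) {
        // Negative: reinterpret the 32-bit two's-complement bit pattern.
        Some(v as i32 as u32)
    } else {
        None
    }
}

/// Looks up the LLVM CPU name for a family; `None` for anything unlisted.
fn cpu_name_for_family(family: u32) -> Option<&'static str> {
    KNOWN_FAMILIES
        .iter()
        .find(|&&(f, _)| f == family)
        .map(|&(_, name)| name)
}

fn main() {
    if !cfg!(target_os = "macos") {
        return;
    }
    let Ok(out) = Command::new("sysctl").args(["-n", "hw.cpufamily"]).output() else {
        return;
    };
    let text = String::from_utf8_lossy(&out.stdout);
    match parse_cpufamily(&text) {
        Some(f) => match cpu_name_for_family(f) {
            Some(name) => println!("{f:#x} -> {name}"),
            None => println!("{f:#x} is not in the table"),
        },
        None => println!("could not parse {text:?}"),
    }
}
```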

Regardless, I don't think we need to fully fix `-Ctarget-cpu=native` in order to improve things here. It'd be nice, but in the meantime it may be sufficient to ensure that it never yields a CPU older than the one we use by default (and, as mentioned, it might be worth bumping the default target-cpu for aarch64-apple-darwin to `apple-m1` when on LLVM 13 while we're at it).


(Zulip discussion leading to filing this issue: https://rust-lang.zulipchat.com/#narrow/stream/242906-t-compiler.2Farm/topic/aarch64-apple-darwin.20target.20feats.20w.2F.20.60-Ctarget-cpu.3Dnative.60 -- I've attempted to cover the important points above)

nagisa commented 2 years ago

> in the meantime perhaps it's sufficient for us to ensure that it doesn't yield a CPU older than the one we use by default

People could run macOS on an A12Z (the Developer Transition Kit) or in a VM (even though that is not blessed by Apple) where specific CPU features are disabled. `-Ctarget-cpu=native` should always generate code for a baseline of features that will work on the host, no matter what.

The exact CPU picked for instruction scheduling is less of a concern: the code will still run regardless of how the instructions are ordered. I'd be okay with some capping/defaulting here, but I'm not sure how rustc would be able to tell that M1 is more recent than Cyclone or any other arbitrary string, at least not without maintaining a table of these names. And ideally we don't want to maintain any such tables.
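To make the cost of capping concrete, here is a hypothetical sketch that compares CPU names through an explicit generation table, i.e. the kind of table described above as undesirable. It only chooses the CPU name (the scheduling model); the enabled feature set would still have to come from host detection, and names missing from the table are passed through unchanged.

```rust
/// Hypothetical generation table, oldest first. Keeping a list like this
/// up to date is the maintenance burden mentioned above.
const GENERATIONS: &[&str] = &["cyclone", "apple-a14", "apple-m1"];

fn generation(cpu: &str) -> Option<usize> {
    GENERATIONS.iter().position(|&name| name == cpu)
}

/// Picks the CPU name used for scheduling: the newer of `native` and
/// `default` when both are in the table, otherwise `native` unchanged.
/// Target features are not affected and must still come from the host.
fn capped_cpu<'a>(native: &'a str, default: &'a str) -> &'a str {
    match (generation(native), generation(default)) {
        (Some(n), Some(d)) if d > n => default,
        _ => native,
    }
}

fn main() {
    println!("{}", capped_cpu("cyclone", "apple-a14"));
}
```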

thomcc commented 2 years ago

Hm, fair enough. Perhaps checking based on `sysctl -a hw.optional` (which lists the instruction set extensions available on the current machine) would be the best option, then?
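A sketch of the `hw.optional` approach, assuming the `key: value` line format that `sysctl` prints. The sysctl key names in the mapping table are assumptions (check `sysctl hw.optional` on a real machine), the table is deliberately incomplete, and `features_from_sysctl` is a hypothetical helper.

```rust
use std::collections::BTreeSet;
use std::process::Command;

// Illustrative, incomplete mapping from sysctl keys to rustc feature names.
// The key names are assumptions; verify them on a real machine.
const KEY_TO_FEATURE: &[(&str, &str)] = &[
    ("hw.optional.armv8_crc32", "crc"),
    ("hw.optional.armv8_1_atomics", "lse"),
    ("hw.optional.arm.FEAT_DotProd", "dotprod"),
    ("hw.optional.arm.FEAT_BF16", "bf16"),
    ("hw.optional.arm.FEAT_I8MM", "i8mm"),
];

/// Parses `key: value` lines and returns the rustc names of mapped
/// features whose sysctl value is 1. Unmapped keys are ignored.
fn features_from_sysctl(output: &str) -> BTreeSet<&'static str> {
    output
        .lines()
        .filter_map(|line| line.split_once(':'))
        .filter(|(_, value)| value.trim() == "1")
        .filter_map(|(key, _)| {
            KEY_TO_FEATURE
                .iter()
                .find(|(k, _)| *k == key.trim())
                .map(|&(_, feature)| feature)
        })
        .collect()
}

fn main() {
    if !cfg!(target_os = "macos") {
        return;
    }
    if let Ok(out) = Command::new("sysctl").arg("hw.optional").output() {
        let text = String::from_utf8_lossy(&out.stdout);
        println!("{:?}", features_from_sysctl(&text));
    }
}
```

Unlike a CPU-name table, this only ever enables what the host reports, which fits the requirement that `native` must work on the host even in a VM with features disabled.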

nagisa commented 2 years ago

Will get fixed by https://github.com/llvm/llvm-project/commit/fcca10c69aaab539962d10fcc59a5f074b73b0de

workingjubilee commented 2 years ago

> but I'm not sure how rustc would be able to tell that M1 is more recent than Cyclone or some other arbitrary string, at least not without maintaining a table of these names

Given that the M1 parallels the A14 Bionic, it's unfortunate that the string selected was "cyclone", which is the A7. Otherwise the obvious comparisons (A7 vs. A14) would fall out.

thomcc commented 2 years ago

This works on nightly, closing.

```
$ rustc --print cfg -Ctarget-cpu=native
debug_assertions
panic="unwind"
target_abi=""
target_arch="aarch64"
target_endian="little"
target_env=""
target_family="unix"
target_feature="aes"
target_feature="crc"
target_feature="dit"
target_feature="dotprod"
target_feature="dpb"
target_feature="dpb2"
target_feature="fcma"
target_feature="fhm"
target_feature="flagm"
target_feature="fp16"
target_feature="frintts"
target_feature="jsconv"
target_feature="llvm14-builtins-abi"
target_feature="lor"
target_feature="lse"
target_feature="neon"
target_feature="paca"
target_feature="pacg"
target_feature="pan"
target_feature="pmuv3"
target_feature="ras"
target_feature="rcpc"
target_feature="rcpc2"
target_feature="rdm"
target_feature="sb"
target_feature="sha2"
target_feature="sha3"
target_feature="ssbs"
target_feature="v8.1a"
target_feature="v8.2a"
target_feature="v8.3a"
target_feature="v8.4a"
target_feature="vh"
target_has_atomic="128"
target_has_atomic="16"
target_has_atomic="32"
target_has_atomic="64"
target_has_atomic="8"
target_has_atomic="ptr"
target_has_atomic_equal_alignment="128"
target_has_atomic_equal_alignment="16"
target_has_atomic_equal_alignment="32"
target_has_atomic_equal_alignment="64"
target_has_atomic_equal_alignment="8"
target_has_atomic_equal_alignment="ptr"
target_has_atomic_load_store="128"
target_has_atomic_load_store="16"
target_has_atomic_load_store="32"
target_has_atomic_load_store="64"
target_has_atomic_load_store="8"
target_has_atomic_load_store="ptr"
target_os="macos"
target_pointer_width="64"
target_thread_local
target_vendor="apple"
unix
```

elichai commented 1 month ago

There's still something weird going on, or I've misunderstood the target CPU names. Running on an M3 Pro:

```
diff <(rustc --print cfg -Ctarget-cpu=native) <(rustc --print cfg -Ctarget-cpu=apple-m3)
8a9,10
> target_feature="bf16"
> target_feature="bti"
18a21
> target_feature="i8mm"
```

You can see that `bf16`, `bti` and `i8mm` are enabled with `apple-m3` but not with `native`.

Interestingly, it seems `apple-m3` is the default for unknown CPUs: https://github.com/llvm/llvm-project/blob/4c51d827e58aaa8c5b3d75b3b61a43627ab53491/llvm/lib/TargetParser/Host.cpp#L1541 And on my machine, the cpufamily (0x5f4dea93) is not even in that list.
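To tell whether the missing `bf16`, `bti` and `i8mm` are a compile-time selection problem or a hardware-detection problem, one could compare what was enabled at compile time (`cfg!(target_feature = ...)`) with what std's runtime detection (`std::arch::is_aarch64_feature_detected!`) reports on the same machine. A minimal sketch; `compiled_in` and `detected_at_runtime` are hypothetical helpers, and what it prints on an M3 Pro has not been verified here.

```rust
/// The features that differ between `native` and `apple-m3` in the diff above.
const FEATURES: [&str; 3] = ["bf16", "bti", "i8mm"];

/// Whether the feature was enabled when this binary was compiled.
fn compiled_in(feature: &str) -> bool {
    match feature {
        "bf16" => cfg!(target_feature = "bf16"),
        "bti" => cfg!(target_feature = "bti"),
        "i8mm" => cfg!(target_feature = "i8mm"),
        _ => false,
    }
}

/// Whether std's runtime detection reports the feature on this machine.
/// `None` means "not checked" (unknown name or non-aarch64 host).
#[cfg(target_arch = "aarch64")]
fn detected_at_runtime(feature: &str) -> Option<bool> {
    Some(match feature {
        "bf16" => std::arch::is_aarch64_feature_detected!("bf16"),
        "bti" => std::arch::is_aarch64_feature_detected!("bti"),
        "i8mm" => std::arch::is_aarch64_feature_detected!("i8mm"),
        _ => return None,
    })
}

#[cfg(not(target_arch = "aarch64"))]
fn detected_at_runtime(_feature: &str) -> Option<bool> {
    None
}

fn main() {
    for f in FEATURES {
        println!(
            "{f}: compiled in = {}, detected at runtime = {:?}",
            compiled_in(f),
            detected_at_runtime(f)
        );
    }
}
```

Building it once with `-Ctarget-cpu=native` and once with `-Ctarget-cpu=apple-m3` and comparing the two columns would show whether the host actually reports these features while `native` leaves them off.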