llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org

Treat -march=native as -mcpu=native on AArch64 #98443

Open ktkachov opened 1 month ago

ktkachov commented 1 month ago

Generally speaking, -mcpu=native is the recommended option for native AArch64 Linux users who want the best CPU- and architecture-specific performance from their code. However, many makefiles assume an x86 environment by default and use -march=native, which is the recommended option on that target. To help those users port to aarch64 without having to special-case it, we could treat -march=native as -mcpu=native. That is, select not just the architecture features of the host system, but also identify the host CPU through its part number and tune for it in codegen. This keeps the contract of -march=native while giving users better performance and easing porting. GCC has done this since GCC 13: https://gcc.gnu.org/g:dd9e5f4db2debf1429feab7f785962ccef6e0dbd
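
For context, this is roughly the kind of special-casing that build scripts need today. It is only a sketch: the helper name `native_tuning_flag` is hypothetical, and the fallback to -march=native for every non-AArch64 machine is an assumption that holds for x86 but is not checked for other targets.

```python
import platform


def native_tuning_flag(machine=None):
    """Pick the flag usually recommended for host-specific builds.

    `machine` defaults to platform.machine(); passing it explicitly
    makes the helper easy to test on any host.
    """
    machine = (machine or platform.machine()).lower()
    # Linux reports "aarch64"; macOS reports "arm64" for the same architecture.
    if machine in ("aarch64", "arm64"):
        return "-mcpu=native"
    # Assumption: everything else is treated like x86, where
    # -march=native is the conventional spelling.
    return "-march=native"


if __name__ == "__main__":
    print(native_tuning_flag())
```

If -march=native behaved like -mcpu=native on AArch64, a build script could pass -march=native unconditionally and drop a helper like this.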

sjoerdmeijer commented 1 month ago

CC: @pratlucas , @ostannard , @vhscampos

llvmbot commented 1 month ago

@llvm/issue-subscribers-backend-aarch64

Author: Kyrill Tkachov (ktkachov)

davemgreen commented 1 month ago

@tmatheson-arm @DavidSpickett @jthackray FYI

This sounds like a sensible thing to add to me, if it works the same as -mcpu=native. You could argue that it should work the same way as other -march options where it enables the architecture features, but not the tuning for a specific cpu. Having it work the same way as GCC sounds most sensible to me though.

tmatheson-arm commented 1 month ago

It doesn't look like the GCC documentation was ever updated to reflect this: https://gcc.gnu.org/onlinedocs/gcc/AArch64-Options.html

At least, it is more ambiguous than the x86 docs: https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html

ktkachov commented 1 month ago

Yeah, that's fair. It was just an oversight in the documentation. I'll update the GCC documentation separately.

DavidSpickett commented 1 month ago

> You could argue that it should work the same way as other -march options where it enables the architecture features, but not the tuning for a specific cpu.

I also thought this, but the other side is that the CPU's characteristics are part of the "native" architecture, so I think they should be accounted for.

Maybe it'd be awkward for a big/little system where the extensions match but the lower level details don't? Though I think you'd be better off spelling out -march=armX.Y-a+a+b+c to get exactly what you want anyway. It's not really "native" if you're going to share it between different cores, even if they are superficially the same.
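
To make the big/little case concrete, here is a rough sketch that checks whether a host reports more than one core type. It assumes Linux-style /proc/cpuinfo text with `CPU implementer` and `CPU part` fields; the helper names and the sample text are made up for illustration, and this is not how clang or GCC actually detect the host.

```python
# Illustrative /proc/cpuinfo excerpt for a system mixing Cortex-A53 (0xd03)
# and Cortex-A72 (0xd08) cores, both from implementer 0x41 (Arm).
SAMPLE = """\
processor       : 0
CPU implementer : 0x41
CPU part        : 0xd03

processor       : 4
CPU implementer : 0x41
CPU part        : 0xd08
"""


def core_ids(cpuinfo_text):
    """Return the set of (implementer, part) pairs found in the text."""
    ids = set()
    implementer = None
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank separator lines between processor entries
        key, value = key.strip(), value.strip()
        if key == "CPU implementer":
            implementer = int(value, 16)
        elif key == "CPU part":
            ids.add((implementer, int(value, 16)))
    return ids


def is_heterogeneous(cpuinfo_text):
    """True when more than one distinct core type is reported."""
    return len(core_ids(cpuinfo_text)) > 1


if __name__ == "__main__":
    print(sorted(core_ids(SAMPLE)), is_heterogeneous(SAMPLE))
```

On a host where this reports more than one core type, there is no single part number to tune for, which is where spelling out the -march string explicitly looks like the safer choice.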

From the GCC doc for x86:

> This selects the CPU to generate code for at compilation time by determining the processor type of the compiling machine. Using -march=native enables all instruction subsets supported by the local machine (hence the result might not run on different machines). Using -mtune=native produces code optimized for the local machine under the constraints of the selected instruction set.

This says that -march=native enables only extensions, and -mtune=native tunes code generation. Is that actually what was implemented? Seems like AArch64 might have diverged if so.

ktkachov commented 1 month ago

-march and -mcpu have different meanings on x86 than on aarch64 anyway (-march is recommended on x86 and -mcpu is recommended on aarch64). The GCC documentation for aarch64 -march=native has now been updated to be more explicit about treating -march=native as -mcpu=native: https://gcc.gnu.org/onlinedocs/gcc/AArch64-Options.html#index-march