official-stockfish / Stockfish

A free and strong UCI chess engine
https://stockfishchess.org/
GNU General Public License v3.0

Add proper build system which automatically detects architecture (or at least add aarch64 support) #2355

Closed: ThanosApostolou closed this issue 4 years ago.

ThanosApostolou commented 5 years ago

It's a bit messy to build Stockfish for multiple architectures by passing the ARCH variable. Would you consider adding a build system like CMake or Meson?

Also, there is currently no architecture option for aarch64, and it's not clear from the Makefile how I can easily override this manually. So when I included Stockfish as a module for gnome-chess on Flathub, I chose armv7 even for the aarch64 architecture, as you can see here: https://github.com/flathub/org.gnome.Chess/pull/14/commits/01ff82e4afccbca7bdbcda647a3cbdd6e78eedcf Maybe you could add an aarch64 option to the Makefile too, until you reach a decision on the build system?
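As background for the "automatically detects architecture" request: GCC and Clang already know the target they compile for and predefine macros such as __aarch64__, __arm__, __x86_64__ and __i386__. The C++ sketch below only reads those macros at compile time. The names Arch, target_arch and is_64bit are hypothetical, are not part of Stockfish, and say nothing about how Stockfish's Makefile picks ARCH.

```cpp
// Hypothetical enumeration of a few targets relevant to this issue.
enum class Arch { AArch64, Arm32, X86_64, X86_32, Unknown };

// Hypothetical helper: the architecture this translation unit is compiled
// for, derived from macros that GCC and Clang predefine. After
// preprocessing only one return statement remains.
constexpr Arch target_arch() {
#if defined(__aarch64__)
    return Arch::AArch64;  // 64-bit ARM (ARMv8-A in AArch64 state)
#elif defined(__arm__)
    return Arch::Arm32;    // 32-bit ARM, e.g. armv7
#elif defined(__x86_64__)
    return Arch::X86_64;
#elif defined(__i386__)
    return Arch::X86_32;
#else
    return Arch::Unknown;
#endif
}

// Hypothetical helper: true when pointers on the target are 64 bits wide.
constexpr bool is_64bit() { return sizeof(void*) == 8; }
```

Compiling the same file with an aarch64 cross-compiler (for example aarch64-linux-gnu-g++) should make target_arch() evaluate to Arch::AArch64 without any extra flags.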

snicolet commented 4 years ago

Adding an aarch64 option in the makefile: sure, we could consider this. However, the experience of aarch64 in the SF team is quite limited, so maybe you would be the best person to propose a patch as you obviously have access to such a machine?

About CMake or Meson: probably not an option, because each build tool we add needs extra maintenance, and we prefer to keep the SF toolchain as simple as possible.

hgy59 commented 4 years ago

Please add aarch64 (ARM64) support to src/Makefile, as this is the preferred ARM architecture for Synology's DiskStation (NAS) and is supported by the Raspberry Pi 3 and 4.

vondele commented 4 years ago

I assume you have access? In that case, please suggest patches (as a pull request or as a diff), indicating how they have been tested.

bftjoe commented 4 years ago

There is already CMake/Visual Studio support, but it's autogenerated by appveyor.yml.

I extracted it and shortened it here: https://github.com/bftjoe/Stockfish/blob/master/CMakeLists.txt

Not sure how it's more maintenance to include CMakeLists.txt in the repo instead of autogenerating it...

abdulbadii commented 4 years ago

Try passing these options via CXXFLAGS= in place of ARCH= COMP=, guided by the GCC reference section "AArch64 Options". These options are defined for AArch64 implementations:

-mabi=name: Generate code for the specified data model. Permissible values are ‘ilp32’ for SysV-like data model where int, long int and pointers are 32 bits, and ‘lp64’ for SysV-like data model where int is 32 bits, but long int and pointers are 64 bits. The default depends on the specific target configuration. Note that the LP64 and ILP32 ABIs are not link-compatible; you must compile your entire program with the same ABI, and link with a compatible set of libraries.

-mbig-endian: Generate big-endian code. This is the default when GCC is configured for an ‘aarch64_be-*-*’ target.

-mgeneral-regs-only: Generate code which uses only the general-purpose registers. This will prevent the compiler from using floating-point and Advanced SIMD registers but will not impose any restrictions on the assembler.

-mlittle-endian: Generate little-endian code. This is the default when GCC is configured for an ‘aarch64-*-*’ but not an ‘aarch64_be-*-*’ target.

-mcmodel=tiny: Generate code for the tiny code model. The program and its statically defined symbols must be within 1MB of each other. Programs can be statically or dynamically linked.

-mcmodel=small: Generate code for the small code model. The program and its statically defined symbols must be within 4GB of each other. Programs can be statically or dynamically linked. This is the default code model.

-mcmodel=large: Generate code for the large code model. This makes no assumptions about addresses and sizes of sections. Programs can be statically linked only.

-mstrict-align, -mno-strict-align: Avoid or allow generating memory accesses that may not be aligned on a natural object boundary as described in the architecture specification.

-momit-leaf-frame-pointer, -mno-omit-leaf-frame-pointer: Omit or keep the frame pointer in leaf functions. The former behavior is the default.

-mstack-protector-guard=guard, -mstack-protector-guard-reg=reg, -mstack-protector-guard-offset=offset: Generate stack protection code using canary at guard. Supported locations are ‘global’ for a global canary or ‘sysreg’ for a canary in an appropriate system register. With the latter choice the options ‘-mstack-protector-guard-reg=reg’ and ‘-mstack-protector-guard-offset=offset’ furthermore specify which system register to use as base register for reading the canary, and from what offset from that base register. There is no default register or offset as this is entirely for use within the Linux kernel.

-mtls-dialect=desc: Use TLS descriptors as the thread-local storage mechanism for dynamic accesses of TLS variables. This is the default.

-mtls-dialect=traditional: Use traditional TLS as the thread-local storage mechanism for dynamic accesses of TLS variables.

-mtls-size=size: Specify bit size of immediate TLS offsets. Valid values are 12, 24, 32, 48. This option requires binutils 2.26 or newer.

-mfix-cortex-a53-835769, -mno-fix-cortex-a53-835769: Enable or disable the workaround for the ARM Cortex-A53 erratum number 835769. This involves inserting a NOP instruction between memory instructions and 64-bit integer multiply-accumulate instructions.

-mfix-cortex-a53-843419, -mno-fix-cortex-a53-843419: Enable or disable the workaround for the ARM Cortex-A53 erratum number 843419. This erratum workaround is made at link time and this will only pass the corresponding flag to the linker.

-mlow-precision-recip-sqrt, -mno-low-precision-recip-sqrt: Enable or disable the reciprocal square root approximation. This option only has an effect if ‘-ffast-math’ or ‘-funsafe-math-optimizations’ is used as well. Enabling this reduces precision of reciprocal square root results to about 16 bits for single precision and to 32 bits for double precision.

-mlow-precision-sqrt, -mno-low-precision-sqrt: Enable or disable the square root approximation. This option only has an effect if ‘-ffast-math’ or ‘-funsafe-math-optimizations’ is used as well. Enabling this reduces precision of square root results to about 16 bits for single precision and to 32 bits for double precision. If enabled, it implies ‘-mlow-precision-recip-sqrt’.

-mlow-precision-div, -mno-low-precision-div: Enable or disable the division approximation. This option only has an effect if ‘-ffast-math’ or ‘-funsafe-math-optimizations’ is used as well. Enabling this reduces precision of division results to about 16 bits for single precision and to 32 bits for double precision.
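Because the LP64 and ILP32 data models selected by ‘-mabi’ are not link-compatible, it can help to confirm which one a build actually uses. The C++ sketch below compares the sizes of int, long and pointers against the two models described above; DataModel, data_model and the bits_of_* helpers are hypothetical names, not part of GCC or Stockfish.

```cpp
#include <climits>

// Hypothetical classification of the two AArch64 data models that -mabi
// selects: 'lp64' (int 32-bit; long int and pointers 64-bit) and 'ilp32'
// (int, long int and pointers all 32-bit).
enum class DataModel { LP64, ILP32, Other };

constexpr int bits_of_int()     { return static_cast<int>(sizeof(int) * CHAR_BIT); }
constexpr int bits_of_long()    { return static_cast<int>(sizeof(long) * CHAR_BIT); }
constexpr int bits_of_pointer() { return static_cast<int>(sizeof(void*) * CHAR_BIT); }

// Written as a single expression so it is also a valid C++11 constexpr function.
constexpr DataModel data_model() {
    return (bits_of_int() == 32 && bits_of_long() == 64 && bits_of_pointer() == 64)
               ? DataModel::LP64
           : (bits_of_int() == 32 && bits_of_long() == 32 && bits_of_pointer() == 32)
               ? DataModel::ILP32
               : DataModel::Other;  // e.g. LLP64 on 64-bit Windows
}
```

With a default AArch64 GNU/Linux toolchain this should evaluate to DataModel::LP64. Building with ‘-mabi=ilp32’ additionally requires ILP32 libraries, which a given toolchain may not provide.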

-mtrack-speculation, -mno-track-speculation: Enable or disable generation of additional code to track speculative execution through conditional branches. The tracking state can then be used by the compiler when expanding calls to __builtin_speculation_safe_copy to permit a more efficient code sequence to be generated.

-moutline-atomics, -mno-outline-atomics: Enable or disable calls to out-of-line helpers to implement atomic operations. These helpers will, at runtime, determine if the LSE instructions from ARMv8.1-A can be used; if not, they will use the load/store-exclusive instructions that are present in the base ARMv8.0 ISA. This option is only applicable when compiling for the base ARMv8.0 instruction set. If using a later revision, e.g. ‘-march=armv8.1-a’ or ‘-march=armv8-a+lse’, the ARMv8.1-Atomics instructions will be used directly. The same applies when using ‘-mcpu=’ when the selected cpu supports the ‘lse’ feature.
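To make the atomics option concrete: it affects how ordinary C++ atomic read-modify-write operations are lowered, not the source code. The sketch below (count_with_threads is a hypothetical name) increments one std::atomic counter from several threads; the comments describe, as an assumption based on the GCC text above, which instruction forms GCC may choose on AArch64.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical example: several threads increment one shared counter.
// The C++ source is identical whatever flags are used; on AArch64, GCC may
// emit LSE atomics directly (e.g. -march=armv8.1-a or -march=armv8-a+lse),
// load/store-exclusive loops (base ARMv8.0 with -mno-outline-atomics), or,
// with -moutline-atomics, calls to helpers that choose at run time.
long count_with_threads(int n_threads, int increments_per_thread) {
    assert(n_threads > 0 && increments_per_thread >= 0);
    std::atomic<long> counter{0};
    std::vector<std::thread> workers;
    for (int t = 0; t < n_threads; ++t) {
        workers.emplace_back([&counter, increments_per_thread] {
            for (int i = 0; i < increments_per_thread; ++i)
                counter.fetch_add(1, std::memory_order_relaxed);  // atomic read-modify-write
        });
    }
    for (std::thread& w : workers)
        w.join();
    return counter.load();
}
```

Depending on the toolchain, linking std::thread may require the -pthread flag. Comparing the disassembly (for example with objdump -d) of builds made with different -march values is one way to see which form was chosen.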

-march=name: Specify the name of the target architecture and, optionally, one or more feature modifiers. This option has the form ‘-march=arch{+[no]feature}*’. The permissible values for arch are ‘armv8-a’, ‘armv8.1-a’, ‘armv8.2-a’, ‘armv8.3-a’, ‘armv8.4-a’, ‘armv8.5-a’ or ‘native’. The value ‘armv8.5-a’ implies ‘armv8.4-a’ and enables compiler support for the ARMv8.5-A architecture extensions. The value ‘armv8.4-a’ implies ‘armv8.3-a’ and enables compiler support for the ARMv8.4-A architecture extensions. The value ‘armv8.3-a’ implies ‘armv8.2-a’ and enables compiler support for the ARMv8.3-A architecture extensions. The value ‘armv8.2-a’ implies ‘armv8.1-a’ and enables compiler support for the ARMv8.2-A architecture extensions. The value ‘armv8.1-a’ implies ‘armv8-a’ and enables compiler support for the ARMv8.1-A architecture extension. In particular, it enables the ‘+crc’, ‘+lse’, and ‘+rdma’ features. The value ‘native’ is available on native AArch64 GNU/Linux and causes the compiler to pick the architecture of the host system. This option has no effect if the compiler is unable to recognize the architecture of the host system. The permissible values for feature are listed in the sub-section on ‘-march’ and ‘-mcpu’ Feature Modifiers. Where conflicting feature modifiers are specified, the right-most feature is used. GCC uses name to determine what kind of instructions it can emit when generating assembly code. If ‘-march’ is specified without either of ‘-mtune’ or ‘-mcpu’ also being specified, the code is tuned to perform well across a range of target processors implementing the target architecture.
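The effect of ‘-march’ and its feature modifiers can be observed from inside a program through the Arm C Language Extensions (ACLE) feature-test macros, which recent GCC and Clang versions define when a feature is enabled. The sketch below assumes __ARM_FEATURE_CRC32 tracks ‘+crc’ and __ARM_FEATURE_ATOMICS tracks ‘+lse’; ArmFeatures and compiled_arm_features are hypothetical names.

```cpp
// Hypothetical compile-time view of two extensions that -march=armv8.1-a
// enables (per the GCC text: +crc and +lse), read from ACLE feature macros.
struct ArmFeatures {
    bool crc32;  // CRC32 instructions (+crc)
    bool lse;    // ARMv8.1 Large System Extensions atomics (+lse)
};

// Single return statement, so this is also valid as a C++11 constexpr function.
constexpr ArmFeatures compiled_arm_features() {
    return ArmFeatures{
#if defined(__ARM_FEATURE_CRC32)
        true,
#else
        false,
#endif
#if defined(__ARM_FEATURE_ATOMICS)
        true,
#else
        false,
#endif
    };
}
```

Built on AArch64 with -march=armv8.1-a, both fields should be true. With -march=armv8-a they should stay false unless the modifiers are added, as in -march=armv8-a+crc+lse, and because the right-most modifier wins, -march=armv8.1-a+nolse should clear lse again. On non-ARM targets both fields are false.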

-mtune=name: Specify the name of the target processor for which GCC should tune the performance of the code. Permissible values for this option are: ‘generic’, ‘cortex-a35’, ‘cortex-a53’, ‘cortex-a55’, ‘cortex-a57’, ‘cortex-a72’, ‘cortex-a73’, ‘cortex-a75’, ‘cortex-a76’, ‘cortex-a76ae’, ‘cortex-a77’, ‘cortex-a65’, ‘cortex-a65ae’, ‘cortex-a34’, ‘ares’, ‘exynos-m1’, ‘emag’, ‘falkor’, ‘neoverse-e1’, ‘neoverse-n1’, ‘qdf24xx’, ‘saphira’, ‘phecda’, ‘xgene1’, ‘vulcan’, ‘octeontx’, ‘octeontx81’, ‘octeontx83’, ‘thunderx’, ‘thunderxt88’, ‘thunderxt88p1’, ‘thunderxt81’, ‘tsv110’, ‘thunderxt83’, ‘thunderx2t99’, ‘cortex-a57.cortex-a53’, ‘cortex-a72.cortex-a53’, ‘cortex-a73.cortex-a35’, ‘cortex-a73.cortex-a53’, ‘cortex-a75.cortex-a55’, ‘cortex-a76.cortex-a55’ and ‘native’. The values ‘cortex-a57.cortex-a53’, ‘cortex-a72.cortex-a53’, ‘cortex-a73.cortex-a35’, ‘cortex-a73.cortex-a53’, ‘cortex-a75.cortex-a55’ and ‘cortex-a76.cortex-a55’ specify that GCC should tune for a big.LITTLE system. Additionally, on native AArch64 GNU/Linux systems the value ‘native’ tunes performance to the host system. This option has no effect if the compiler is unable to recognize the processor of the host system. Where none of ‘-mtune=’, ‘-mcpu=’ or ‘-march=’ are specified, the code is tuned to perform well across a range of target processors. This option cannot be suffixed by feature modifiers.

-mcpu=name: Specify the name of the target processor, optionally suffixed by one or more feature modifiers. This option has the form ‘-mcpu=cpu{+[no]feature}*’, where the permissible values for cpu are the same as those available for ‘-mtune’. The permissible values for feature are documented in the sub-section on ‘-march’ and ‘-mcpu’ Feature Modifiers. Where conflicting feature modifiers are specified, the right-most feature is used. GCC uses name to determine what kind of instructions it can emit when generating assembly code (as if by ‘-march’) and to determine the target processor for which to tune for performance (as if by ‘-mtune’). Where this option is used in conjunction with ‘-march’ or ‘-mtune’, those options take precedence over the appropriate part of this option.

-moverride=string: Override tuning decisions made by the back-end in response to a ‘-mtune=’ switch. The syntax, semantics, and accepted values for string in this option are not guaranteed to be consistent across releases. This option is only intended to be useful when developing GCC.

-mverbose-cost-dump: Enable verbose cost model dumping in the debug dump files. This option is provided for use in debugging the compiler.

-mpc-relative-literal-loads, -mno-pc-relative-literal-loads: Enable or disable PC-relative literal loads. With this option literal pools are accessed using a single instruction and emitted after each function. This limits the maximum size of functions to 1MB. This is enabled by default for ‘-mcmodel=tiny’.

-msign-return-address=scope: Select the function scope on which return address signing will be applied. Permissible values are ‘none’, which disables return address signing, ‘non-leaf’, which enables pointer signing for functions which are not leaf functions, and ‘all’, which enables pointer signing for all functions. The default value is ‘none’. This option has been deprecated by -mbranch-protection.

-mbranch-protection=none|standard|pac-ret[+leaf+b-key]|bti: Select the branch protection features to use. ‘none’ is the default and turns off all types of branch protection. ‘standard’ turns on all types of branch protection features. If a feature has additional tuning options, then ‘standard’ sets it to its standard level. ‘pac-ret[+leaf]’ turns on return address signing to its standard level: signing functions that save the return address to memory (non-leaf functions will practically always do this) using the a-key. The optional argument ‘leaf’ can be used to extend the signing to include leaf functions. The optional argument ‘b-key’ can be used to sign the functions with the B-key instead of the A-key. ‘bti’ turns on branch target identification mechanism.

-msve-vector-bits=bits: Specify the number of bits in an SVE vector register. This option only has an effect when SVE is enabled. GCC supports two forms of SVE code generation: "vector-length agnostic" output that works with any size of vector register and "vector-length specific" output that allows GCC to make assumptions about the vector length when it is useful for optimization reasons. The possible values of ‘bits’ are: ‘scalable’, ‘128’, ‘256’, ‘512’, ‘1024’ and ‘2048’. Specifying ‘scalable’ selects vector-length agnostic output. At present ‘-msve-vector-bits=128’ also generates vector-length agnostic output. All other values generate vector-length specific code. The behavior of these values may change in future releases and no value except ‘scalable’ should be relied on for producing code that is portable across different hardware SVE vector lengths. The default is ‘-msve-vector-bits=scalable’, which produces vector-length agnostic code.

niklasf commented 4 years ago

Here's a minimal patch to maybe make some progress on this: #2760

abdulbadii commented 4 years ago

Thanks, may God bless you. Ameen.

vondele commented 4 years ago

I'll merge the pull request to support armv8 and close the issue with it.