hartwork / resolve-march-native

:snail: Tool to determine what GCC flags -march=native would resolve into
https://pypi.python.org/pypi/resolve-march-native

-mcpu= -mfix-cortex-a53-835769 -mfix-cortex-a53-843419 on M2 #136

Open stintel opened 5 months ago

stintel commented 5 months ago

As requested downstream in https://bugs.gentoo.org/show_bug.cgi?id=924184:

➜ resolve-march-native
ERROR: No entry -m(arch|cpu)=.. found in: -mabi=lp64 -march= -mbranch-protection= -mcmodel=small -mcpu= -mfix-cortex-a53-835769 -mfix-cortex-a53-843419 -mglibc -mharden-sls= -mlittle-endian -mno-big-endian -mno-bionic -mno-general-regs-only -mno-low-precision-div -mno-low-precision-recip-sqrt -mno-low-precision-sqrt -mno-musl -mno-strict-align -mno-track-speculation -mno-uclibc -mno-verbose-cost-dump -momit-leaf-frame-pointer -moutline-atomics -moverride= -mpc-relative-literal-loads -msign-return-address=none -mstack-protector-guard-offset= -mstack-protector-guard-reg= -mstack-protector-guard=global -msve-vector-bits=scalable -mtls-dialect=desc -mtls-size=24 -mtune=
➜ gcc -S -fverbose-asm -o /dev/stdout "$(mktemp --suffix=.c)" -march=native
        .arch armv8-a
        .file   "tmp.X34nHLwR98.c"
// GNU C17 (Gentoo 13.2.1_p20240113-r1 p12) version 13.2.1 20240113 (aarch64-unknown-linux-gnu)
//      compiled by GNU C version 13.2.1 20240113, GMP version 6.3.0, MPFR version 4.2.1, MPC version 1.3.1, isl version none
// GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
// options passed: -mlittle-endian -mabi=lp64
        .text
        .ident  "GCC: (Gentoo 13.2.1_p20240113-r1 p12) 13.2.1 20240113"
        .section        .note.GNU-stack,"",@progbits
➜ gcc -S -fverbose-asm -o /dev/stdout "$(mktemp --suffix=.c)" -march=armv8-a
        .arch armv8-a
        .file   "tmp.bJNx0uorAh.c"
// GNU C17 (Gentoo 13.2.1_p20240113-r1 p12) version 13.2.1 20240113 (aarch64-unknown-linux-gnu)
//      compiled by GNU C version 13.2.1 20240113, GMP version 6.3.0, MPFR version 4.2.1, MPC version 1.3.1, isl version none
// GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
// options passed: -march=armv8-a -mlittle-endian -mabi=lp64
        .text
        .ident  "GCC: (Gentoo 13.2.1_p20240113-r1 p12) 13.2.1 20240113"
        .section        .note.GNU-stack,"",@progbits
➜ gcc -Q --help=target -march=native
The following options are target specific:
  -mabi=                                lp64
  -march=
  -mbig-endian                          [disabled]
  -mbionic                              [disabled]
  -mbranch-protection=
  -mcmodel=                             small
  -mcpu=
  -mfix-cortex-a53-835769               [enabled]
  -mfix-cortex-a53-843419               [enabled]
  -mgeneral-regs-only                   [disabled]
  -mglibc                               [enabled]
  -mharden-sls=
  -mlittle-endian                       [enabled]
  -mlow-precision-div                   [disabled]
  -mlow-precision-recip-sqrt            [disabled]
  -mlow-precision-sqrt                  [disabled]
  -mmusl                                [disabled]
  -momit-leaf-frame-pointer             [enabled]
  -moutline-atomics                     [enabled]
  -moverride=<string>
  -mpc-relative-literal-loads           [enabled]
  -msign-return-address=                none
  -mstack-protector-guard-offset=
  -mstack-protector-guard-reg=
  -mstack-protector-guard=              global
  -mstrict-align                        [disabled]
  -msve-vector-bits=<number>            scalable
  -mtls-dialect=                        desc
  -mtls-size=                           24
  -mtrack-speculation                   [disabled]
  -mtune=
  -muclibc                              [disabled]
  -mverbose-cost-dump                   [disabled]

  Known AArch64 ABIs (for use with the -mabi= option):
    ilp32 lp64

  Supported AArch64 return address signing scope (for use with -msign-return-address= option):
    all non-leaf none

  The code model option names for -mcmodel:
    large small tiny

  Valid arguments to -mstack-protector-guard=:
    global sysreg

  The possible SVE vector lengths:
    1024 128 2048 256 512 scalable

  The possible TLS dialects:
    desc trad
➜ gcc -Q --help=target -march=armv8-a
The following options are target specific:
  -mabi=                                lp64
  -march=                               armv8-a
  -mbig-endian                          [disabled]
  -mbionic                              [disabled]
  -mbranch-protection=
  -mcmodel=                             small
  -mcpu=
  -mfix-cortex-a53-835769               [enabled]
  -mfix-cortex-a53-843419               [enabled]
  -mgeneral-regs-only                   [disabled]
  -mglibc                               [enabled]
  -mharden-sls=
  -mlittle-endian                       [enabled]
  -mlow-precision-div                   [disabled]
  -mlow-precision-recip-sqrt            [disabled]
  -mlow-precision-sqrt                  [disabled]
  -mmusl                                [disabled]
  -momit-leaf-frame-pointer             [enabled]
  -moutline-atomics                     [enabled]
  -moverride=<string>
  -mpc-relative-literal-loads           [enabled]
  -msign-return-address=                none
  -mstack-protector-guard-offset=
  -mstack-protector-guard-reg=
  -mstack-protector-guard=              global
  -mstrict-align                        [disabled]
  -msve-vector-bits=<number>            scalable
  -mtls-dialect=                        desc
  -mtls-size=                           24
  -mtrack-speculation                   [disabled]
  -mtune=
  -muclibc                              [disabled]
  -mverbose-cost-dump                   [disabled]

  Known AArch64 ABIs (for use with the -mabi= option):
    ilp32 lp64

  Supported AArch64 return address signing scope (for use with -msign-return-address= option):
    all non-leaf none

  The code model option names for -mcmodel:
    large small tiny

  Valid arguments to -mstack-protector-guard=:
    global sysreg

  The possible SVE vector lengths:
    1024 128 2048 256 512 scalable

  The possible TLS dialects:
    desc trad

I chose "-march=armv8-a" based on the output of the first gcc command. If that's not what you wanted, please let me know and I'll update the post.

hartwork commented 3 months ago

Hi @stintel,

I thought I had replied here long ago but it seems that either my comment got lost or I never pushed the Comment button :thinking:

What's interesting about your machine's case is that neither -march nor its sort-of alias -mcpu is set, so there is not terribly much that resolve-march-native could do, other than maybe just forwarding those two -mlittle-endian -mabi=lp64… but that output would in some sense be a lie. From a user's point of view, what would you wish for from resolve-march-native in such a case?
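To make the "neither is set" condition concrete, here is a minimal sketch of how one could check the output of gcc -Q --help=target for a value after -march= or -mcpu=. This is not the actual resolve-march-native code; the function names extract_arch_and_cpu and is_resolved are made up for illustration, and the parsing assumes the line layout shown in the outputs above.

```python
import re

# Matches lines such as "  -march=    armv8-a" or "  -mcpu=" (the value may be empty).
# Assumption: one option per line, as in the `gcc -Q --help=target` output in this thread.
_OPTION_LINE = re.compile(r"^[ \t]+(-m(?:arch|cpu))=[ \t]*(\S*)[ \t]*$")


def extract_arch_and_cpu(help_output: str) -> dict:
    """Return the values reported for -march= and -mcpu= ('' when unset)."""
    values = {"-march": "", "-mcpu": ""}
    for line in help_output.splitlines():
        match = _OPTION_LINE.match(line)
        if match:
            values[match.group(1)] = match.group(2)
    return values


def is_resolved(help_output: str) -> bool:
    """True when GCC reported a value for at least one of -march= or -mcpu=."""
    return any(extract_arch_and_cpu(help_output).values())
```

With the M2 output for -march=native from the first post, both values come back empty, so is_resolved would return False; with the -march=armv8-a output it would return True.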

stintel commented 3 months ago

It appears that -march=native doesn't produce useful info, but using -march=native -mcpu=native does:

➜ gcc -fverbose-asm -march=native -mcpu=native -Q --help=target
The following options are target specific:
  -mabi=                                lp64
  -march=                               armv8-a+crc+lse+rcpc+rdma+dotprod+aes+sha3+fp16fml+sb+ssbs+i8mm+bf16+flagm+pauth
  -mbig-endian                          [disabled]
  -mbionic                              [disabled]
  -mbranch-protection=
  -mcmodel=                             small
  -mcpu=
  -mfix-cortex-a53-835769               [enabled]
  -mfix-cortex-a53-843419               [enabled]
  -mgeneral-regs-only                   [disabled]
  -mglibc                               [enabled]
  -mharden-sls=
  -mlittle-endian                       [enabled]
  -mlow-precision-div                   [disabled]
  -mlow-precision-recip-sqrt            [disabled]
  -mlow-precision-sqrt                  [disabled]
  -mmusl                                [disabled]
  -momit-leaf-frame-pointer             [enabled]
  -moutline-atomics                     [enabled]
  -moverride=<string>
  -mpc-relative-literal-loads           [enabled]
  -msign-return-address=                none
  -mstack-protector-guard-offset=
  -mstack-protector-guard-reg=
  -mstack-protector-guard=              global
  -mstrict-align                        [disabled]
  -msve-vector-bits=<number>            scalable
  -mtls-dialect=                        desc
  -mtls-size=                           24
  -mtrack-speculation                   [disabled]
  -mtune=
  -muclibc                              [disabled]
  -mverbose-cost-dump                   [disabled]

  Known AArch64 ABIs (for use with the -mabi= option):
    ilp32 lp64

  Supported AArch64 return address signing scope (for use with -msign-return-address= option):
    all non-leaf none

  The code model option names for -mcmodel:
    large small tiny

  Valid arguments to -mstack-protector-guard=:
    global sysreg

  The possible SVE vector lengths:
    1024 128 2048 256 512 scalable

  The possible TLS dialects:
    desc trad

So maybe as a fallback option, when -march=native does not produce anything useful, retry with -mcpu=native added?

hartwork commented 3 months ago

@stintel interesting! Did you also try -mcpu=native in isolation, i.e. gcc -fverbose-asm -mcpu=native -Q --help=target?

stintel commented 3 months ago

I did. Sorry, should have added that in my previous comment:

➜ gcc -fverbose-asm -mcpu=native -Q --help=target
The following options are target specific:
  -mabi=                                lp64
  -march=                     
  -mbig-endian                          [disabled]
  -mbionic                              [disabled]
  -mbranch-protection=        
  -mcmodel=                             small
  -mcpu=                      
  -mfix-cortex-a53-835769               [enabled]
  -mfix-cortex-a53-843419               [enabled]
  -mgeneral-regs-only                   [disabled]
  -mglibc                               [enabled]
  -mharden-sls=               
  -mlittle-endian                       [enabled]
  -mlow-precision-div                   [disabled]
  -mlow-precision-recip-sqrt            [disabled]
  -mlow-precision-sqrt                  [disabled]
  -mmusl                                [disabled]
  -momit-leaf-frame-pointer             [enabled]
  -moutline-atomics                     [enabled]
  -moverride=<string>         
  -mpc-relative-literal-loads           [enabled]
  -msign-return-address=                none
  -mstack-protector-guard-offset= 
  -mstack-protector-guard-reg= 
  -mstack-protector-guard=              global
  -mstrict-align                        [disabled]
  -msve-vector-bits=<number>            scalable
  -mtls-dialect=                        desc
  -mtls-size=                           24
  -mtrack-speculation                   [disabled]
  -mtune=                     
  -muclibc                              [disabled]
  -mverbose-cost-dump                   [disabled]

  Known AArch64 ABIs (for use with the -mabi= option):
    ilp32 lp64

  Supported AArch64 return address signing scope (for use with -msign-return-address= option):
    all non-leaf none

  The code model option names for -mcmodel:
    large small tiny

  Valid arguments to -mstack-protector-guard=:
    global sysreg

  The possible SVE vector lengths:
    1024 128 2048 256 512 scalable

  The possible TLS dialects:
    desc trad

hartwork commented 3 months ago

@stintel so it does need both combined, fancy. Thanks! Let me think more about a fix. At the moment I'm leaning towards trying both combined first rather than last…
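
The "try both combined first" idea could look roughly like the sketch below. It is only an illustration under assumptions, not the fix that went into resolve-march-native: the names CANDIDATES, run_gcc and first_useful_output are hypothetical, and "useful" here simply means that GCC printed a non-empty value after -march= or -mcpu=.

```python
import re
import subprocess

# A non-empty value after -march= or -mcpu= on a single line of the help output.
_VALUE = re.compile(r"^[ \t]+-m(?:arch|cpu)=[ \t]*(\S+)[ \t]*$", re.MULTILINE)

# Candidate flag sets, tried in order: both combined first, plain -march=native second.
CANDIDATES = (
    ["-march=native", "-mcpu=native"],
    ["-march=native"],
)


def run_gcc(flags):
    """Run `gcc -Q --help=target` with the given extra flags and return its stdout."""
    result = subprocess.run(
        ["gcc", "-Q", "--help=target", *flags],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def first_useful_output(run=run_gcc, candidates=CANDIDATES):
    """Return (flags, output) for the first candidate that yields a value, else None."""
    for flags in candidates:
        output = run(flags)
        if _VALUE.search(output):
            return flags, output
    return None
```

The run parameter is injectable so the ordering logic can be exercised without a compiler installed. On the M2 machine from this thread, the first candidate would already produce the armv8-a+crc+… value, so the second one would never be run.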