hartwork / resolve-march-native

:snail: Tool to determine what GCC flags -march=native would resolve into
https://pypi.python.org/pypi/resolve-march-native

Missing `-mno-avx` flag in output for Sandy Bridge Celeron without AVX #110

Closed anyuta1166 closed 7 months ago

anyuta1166 commented 7 months ago

Processor: Intel(R) Celeron(R) CPU 847 @ 1.10GHz (Sandy Bridge without AVX)
OS: Gentoo Linux
GCC version: 11.3.0

The -mno-avx flag should be part of the output, but it is missing. The distccflags script shows this flag correctly.

# resolve-march-native
-march=sandybridge --param=l1-cache-line-size=64 --param=l1-cache-size=32 --param=l2-cache-size=2048
# ./distccflags -march=native
CFLAGS="-march=sandybridge -mno-avx"
# gcc -Q -march=sandybridge --help=target
The following options are target specific:
  -m128bit-long-double                  [enabled]
  -m16                                  [disabled]
  -m32                                  [disabled]
  -m3dnow                               [disabled]
  -m3dnowa                              [disabled]
  -m64                                  [enabled]
  -m80387                               [enabled]
  -m8bit-idiv                           [disabled]
  -m96bit-long-double                   [disabled]
  -mabi=                                sysv
  -mabm                                 [disabled]
  -maccumulate-outgoing-args            [disabled]
  -maddress-mode=                       long
  -madx                                 [disabled]
  -maes                                 [disabled]
  -malign-data=                         compat
  -malign-double                        [disabled]
  -malign-functions=                    0
  -malign-jumps=                        0
  -malign-loops=                        0
  -malign-stringops                     [enabled]
  -mamx-bf16                            [disabled]
  -mamx-int8                            [disabled]
  -mamx-tile                            [disabled]
  -mandroid                             [disabled]
  -march=                               sandybridge
  -masm=                                att
  -mavx                                 [enabled]
  -mavx2                                [disabled]
  -mavx256-split-unaligned-load         [enabled]
  -mavx256-split-unaligned-store        [enabled]
  -mavx5124fmaps                        [disabled]
  -mavx5124vnniw                        [disabled]
  -mavx512bf16                          [disabled]
  -mavx512bitalg                        [disabled]
  -mavx512bw                            [disabled]
  -mavx512cd                            [disabled]
  -mavx512dq                            [disabled]
  -mavx512er                            [disabled]
  -mavx512f                             [disabled]
  -mavx512ifma                          [disabled]
  -mavx512pf                            [disabled]
  -mavx512vbmi                          [disabled]
  -mavx512vbmi2                         [disabled]
  -mavx512vl                            [disabled]
  -mavx512vnni                          [disabled]
  -mavx512vp2intersect                  [disabled]
  -mavx512vpopcntdq                     [disabled]
  -mavxvnni                             [disabled]
  -mbionic                              [disabled]
  -mbmi                                 [disabled]
  -mbmi2                                [disabled]
  -mbranch-cost=<0,5>                   3
  -mcall-ms2sysv-xlogues                [disabled]
  -mcet-switch                          [disabled]
  -mcld                                 [disabled]
  -mcldemote                            [disabled]
  -mclflushopt                          [disabled]
  -mclwb                                [disabled]
  -mclzero                              [disabled]
  -mcmodel=                             [default]
  -mcpu=
  -mcrc32                               [enabled]
  -mcx16                                [enabled]
  -mdispatch-scheduler                  [disabled]
  -mdump-tune-features                  [disabled]
  -menqcmd                              [disabled]
  -mf16c                                [disabled]
  -mfancy-math-387                      [enabled]
  -mfentry                              [disabled]
  -mfentry-name=
  -mfentry-section=
  -mfma                                 [disabled]
  -mfma4                                [disabled]
  -mforce-drap                          [disabled]
  -mforce-indirect-call                 [disabled]
  -mfp-ret-in-387                       [enabled]
  -mfpmath=                             sse
  -mfsgsbase                            [disabled]
  -mfunction-return=                    keep
  -mfused-madd                          -ffp-contract=fast
  -mfxsr                                [enabled]
  -mgeneral-regs-only                   [disabled]
  -mgfni                                [disabled]
  -mglibc                               [enabled]
  -mhard-float                          [enabled]
  -mharden-sls=                         none
  -mhle                                 [disabled]
  -mhreset                              [disabled]
  -miamcu                               [disabled]
  -mieee-fp                             [enabled]
  -mincoming-stack-boundary=            0
  -mindirect-branch-cs-prefix           [disabled]
  -mindirect-branch-register            [disabled]
  -mindirect-branch=                    keep
  -minline-all-stringops                [disabled]
  -minline-stringops-dynamically        [disabled]
  -minstrument-return=                  none
  -mintel-syntax                        -masm=intel
  -mkl                                  [disabled]
  -mlarge-data-threshold=<number>       65536
  -mlong-double-128                     [disabled]
  -mlong-double-64                      [disabled]
  -mlong-double-80                      [enabled]
  -mlwp                                 [disabled]
  -mlzcnt                               [disabled]
  -mmanual-endbr                        [disabled]
  -mmemcpy-strategy=
  -mmemset-strategy=
  -mmitigate-rop                        [disabled]
  -mmmx                                 [enabled]
  -mmovbe                               [disabled]
  -mmovdir64b                           [disabled]
  -mmovdiri                             [disabled]
  -mmpx                                 [disabled]
  -mms-bitfields                        [disabled]
  -mmusl                                [disabled]
  -mmwait                               [enabled]
  -mmwaitx                              [disabled]
  -mneeded                              [disabled]
  -mno-align-stringops                  [disabled]
  -mno-default                          [disabled]
  -mno-fancy-math-387                   [disabled]
  -mno-push-args                        [disabled]
  -mno-red-zone                         [disabled]
  -mno-sse4                             [disabled]
  -mnop-mcount                          [disabled]
  -momit-leaf-frame-pointer             [disabled]
  -mpc32                                [disabled]
  -mpc64                                [disabled]
  -mpc80                                [disabled]
  -mpclmul                              [enabled]
  -mpcommit                             [disabled]
  -mpconfig                             [disabled]
  -mpku                                 [disabled]
  -mpopcnt                              [enabled]
  -mprefer-avx128                       -mprefer-vector-width=128
  -mprefer-vector-width=                none
  -mpreferred-stack-boundary=           0
  -mprefetchwt1                         [disabled]
  -mprfchw                              [disabled]
  -mptwrite                             [disabled]
  -mpush-args                           [enabled]
  -mrdpid                               [disabled]
  -mrdrnd                               [disabled]
  -mrdseed                              [disabled]
  -mrecip                               [disabled]
  -mrecip=
  -mrecord-mcount                       [disabled]
  -mrecord-return                       [disabled]
  -mred-zone                            [enabled]
  -mregparm=                            6
  -mrtd                                 [disabled]
  -mrtm                                 [disabled]
  -msahf                                [enabled]
  -mserialize                           [disabled]
  -msgx                                 [disabled]
  -msha                                 [disabled]
  -mshstk                               [disabled]
  -mskip-rax-setup                      [disabled]
  -msoft-float                          [disabled]
  -msse                                 [enabled]
  -msse2                                [enabled]
  -msse2avx                             [disabled]
  -msse3                                [enabled]
  -msse4                                [enabled]
  -msse4.1                              [enabled]
  -msse4.2                              [enabled]
  -msse4a                               [disabled]
  -msse5                                -mavx
  -msseregparm                          [disabled]
  -mssse3                               [enabled]
  -mstack-arg-probe                     [disabled]
  -mstack-protector-guard-offset=
  -mstack-protector-guard-reg=
  -mstack-protector-guard-symbol=
  -mstack-protector-guard=              tls
  -mstackrealign                        [disabled]
  -mstringop-strategy=                  [default]
  -mstv                                 [enabled]
  -mtbm                                 [disabled]
  -mtls-dialect=                        gnu
  -mtls-direct-seg-refs                 [enabled]
  -mtsxldtrk                            [disabled]
  -mtune-ctrl=
  -mtune=                               sandybridge
  -muclibc                              [disabled]
  -muintr                               [disabled]
  -mvaes                                [disabled]
  -mveclibabi=                          [default]
  -mvect8-ret-in-mem                    [disabled]
  -mvpclmulqdq                          [disabled]
  -mvzeroupper                          [enabled]
  -mwaitpkg                             [disabled]
  -mwbnoinvd                            [disabled]
  -mwidekl                              [disabled]
  -mx32                                 [disabled]
  -mxop                                 [disabled]
  -mxsave                               [enabled]
  -mxsavec                              [disabled]
  -mxsaveopt                            [enabled]
  -mxsaves                              [disabled]

  Known assembler dialects (for use with the -masm= option):
    att intel

  Known ABIs (for use with the -mabi= option):
    ms sysv

  Known code models (for use with the -mcmodel= option):
    32 kernel large medium small

  Valid arguments to -mfpmath=:
    387 387+sse 387,sse both sse sse+387 sse,387

  Known choices for mitigation against straight line speculation with -mharden-sls=:
    all indirect-jmp none return

  Known indirect branch choices (for use with the -mindirect-branch=/-mfunction-return= options):
    keep thunk thunk-extern thunk-inline

  Known choices for return instrumentation with -minstrument-return=:
    call none nop5

  Known data alignment choices (for use with the -malign-data= option):
    abi cacheline compat

  Known vectorization library ABIs (for use with the -mveclibabi= option):
    acml svml

  Known address mode (for use with the -maddress-mode= option):
    long short

  Known preferred register vector length (to use with the -mprefer-vector-width= option):
    128 256 512 none

  Known stack protector guard (for use with the -mstack-protector-guard= option):
    global tls

  Valid arguments to -mstringop-strategy=:
    byte_loop libcall loop rep_4byte rep_8byte rep_byte unrolled_loop vector_loop

  Known TLS dialects (for use with the -mtls-dialect= option):
    gnu gnu2

  Known valid arguments for -march= option:
    i386 i486 i586 pentium lakemont pentium-mmx winchip-c6 winchip2 c3 samuel-2 c3-2 nehemiah c7 esther i686 pentiumpro pentium2 pentium3 pentium3m pentium-m pentium4 pentium4m prescott nocona core2 nehalem corei7 westmere sandybridge corei7-avx ivybridge core-avx-i haswell core-avx2 broadwell skylake skylake-avx512 cannonlake icelake-client rocketlake icelake-server cascadelake tigerlake cooperlake sapphirerapids alderlake bonnell atom silvermont slm goldmont goldmont-plus tremont knl knm intel geode k6 k6-2 k6-3 athlon athlon-tbird athlon-4 athlon-xp athlon-mp x86-64 x86-64-v2 x86-64-v3 x86-64-v4 eden-x2 nano nano-1000 nano-2000 nano-3000 nano-x2 eden-x4 nano-x4 k8 k8-sse3 opteron opteron-sse3 athlon64 athlon64-sse3 athlon-fx amdfam10 barcelona bdver1 bdver2 bdver3 bdver4 znver1 znver2 znver3 btver1 btver2 generic native

  Known valid arguments for -mtune= option:
    generic i386 i486 pentium lakemont pentiumpro pentium4 nocona core2 nehalem sandybridge haswell bonnell silvermont goldmont goldmont-plus tremont knl knm skylake skylake-avx512 cannonlake icelake-client icelake-server cascadelake tigerlake cooperlake sapphirerapids alderlake rocketlake intel geode k6 athlon k8 amdfam10 bdver1 bdver2 bdver3 bdver4 btver1 btver2 znver1 znver2 znver3
# gcc -Q -march=native --help=target
The following options are target specific:
  -m128bit-long-double                  [enabled]
  -m16                                  [disabled]
  -m32                                  [disabled]
  -m3dnow                               [disabled]
  -m3dnowa                              [disabled]
  -m64                                  [enabled]
  -m80387                               [enabled]
  -m8bit-idiv                           [disabled]
  -m96bit-long-double                   [disabled]
  -mabi=                                sysv
  -mabm                                 [disabled]
  -maccumulate-outgoing-args            [disabled]
  -maddress-mode=                       long
  -madx                                 [disabled]
  -maes                                 [disabled]
  -malign-data=                         compat
  -malign-double                        [disabled]
  -malign-functions=                    0
  -malign-jumps=                        0
  -malign-loops=                        0
  -malign-stringops                     [enabled]
  -mamx-bf16                            [disabled]
  -mamx-int8                            [disabled]
  -mamx-tile                            [disabled]
  -mandroid                             [disabled]
  -march=                               sandybridge
  -masm=                                att
  -mavx                                 [disabled]
  -mavx2                                [disabled]
  -mavx256-split-unaligned-load         [enabled]
  -mavx256-split-unaligned-store        [enabled]
  -mavx5124fmaps                        [disabled]
  -mavx5124vnniw                        [disabled]
  -mavx512bf16                          [disabled]
  -mavx512bitalg                        [disabled]
  -mavx512bw                            [disabled]
  -mavx512cd                            [disabled]
  -mavx512dq                            [disabled]
  -mavx512er                            [disabled]
  -mavx512f                             [disabled]
  -mavx512ifma                          [disabled]
  -mavx512pf                            [disabled]
  -mavx512vbmi                          [disabled]
  -mavx512vbmi2                         [disabled]
  -mavx512vl                            [disabled]
  -mavx512vnni                          [disabled]
  -mavx512vp2intersect                  [disabled]
  -mavx512vpopcntdq                     [disabled]
  -mavxvnni                             [disabled]
  -mbionic                              [disabled]
  -mbmi                                 [disabled]
  -mbmi2                                [disabled]
  -mbranch-cost=<0,5>                   3
  -mcall-ms2sysv-xlogues                [disabled]
  -mcet-switch                          [disabled]
  -mcld                                 [disabled]
  -mcldemote                            [disabled]
  -mclflushopt                          [disabled]
  -mclwb                                [disabled]
  -mclzero                              [disabled]
  -mcmodel=                             [default]
  -mcpu=
  -mcrc32                               [enabled]
  -mcx16                                [enabled]
  -mdispatch-scheduler                  [disabled]
  -mdump-tune-features                  [disabled]
  -menqcmd                              [disabled]
  -mf16c                                [disabled]
  -mfancy-math-387                      [enabled]
  -mfentry                              [disabled]
  -mfentry-name=
  -mfentry-section=
  -mfma                                 [disabled]
  -mfma4                                [disabled]
  -mforce-drap                          [disabled]
  -mforce-indirect-call                 [disabled]
  -mfp-ret-in-387                       [enabled]
  -mfpmath=                             sse
  -mfsgsbase                            [disabled]
  -mfunction-return=                    keep
  -mfused-madd                          -ffp-contract=fast
  -mfxsr                                [enabled]
  -mgeneral-regs-only                   [disabled]
  -mgfni                                [disabled]
  -mglibc                               [enabled]
  -mhard-float                          [enabled]
  -mharden-sls=                         none
  -mhle                                 [disabled]
  -mhreset                              [disabled]
  -miamcu                               [disabled]
  -mieee-fp                             [enabled]
  -mincoming-stack-boundary=            0
  -mindirect-branch-cs-prefix           [disabled]
  -mindirect-branch-register            [disabled]
  -mindirect-branch=                    keep
  -minline-all-stringops                [disabled]
  -minline-stringops-dynamically        [disabled]
  -minstrument-return=                  none
  -mintel-syntax                        -masm=intel
  -mkl                                  [disabled]
  -mlarge-data-threshold=<number>       65536
  -mlong-double-128                     [disabled]
  -mlong-double-64                      [disabled]
  -mlong-double-80                      [enabled]
  -mlwp                                 [disabled]
  -mlzcnt                               [disabled]
  -mmanual-endbr                        [disabled]
  -mmemcpy-strategy=
  -mmemset-strategy=
  -mmitigate-rop                        [disabled]
  -mmmx                                 [enabled]
  -mmovbe                               [disabled]
  -mmovdir64b                           [disabled]
  -mmovdiri                             [disabled]
  -mmpx                                 [disabled]
  -mms-bitfields                        [disabled]
  -mmusl                                [disabled]
  -mmwait                               [enabled]
  -mmwaitx                              [disabled]
  -mneeded                              [disabled]
  -mno-align-stringops                  [disabled]
  -mno-default                          [disabled]
  -mno-fancy-math-387                   [disabled]
  -mno-push-args                        [disabled]
  -mno-red-zone                         [disabled]
  -mno-sse4                             [disabled]
  -mnop-mcount                          [disabled]
  -momit-leaf-frame-pointer             [disabled]
  -mpc32                                [disabled]
  -mpc64                                [disabled]
  -mpc80                                [disabled]
  -mpclmul                              [enabled]
  -mpcommit                             [disabled]
  -mpconfig                             [disabled]
  -mpku                                 [disabled]
  -mpopcnt                              [enabled]
  -mprefer-avx128                       -mprefer-vector-width=128
  -mprefer-vector-width=                none
  -mpreferred-stack-boundary=           0
  -mprefetchwt1                         [disabled]
  -mprfchw                              [disabled]
  -mptwrite                             [disabled]
  -mpush-args                           [enabled]
  -mrdpid                               [disabled]
  -mrdrnd                               [disabled]
  -mrdseed                              [disabled]
  -mrecip                               [disabled]
  -mrecip=
  -mrecord-mcount                       [disabled]
  -mrecord-return                       [disabled]
  -mred-zone                            [enabled]
  -mregparm=                            6
  -mrtd                                 [disabled]
  -mrtm                                 [disabled]
  -msahf                                [enabled]
  -mserialize                           [disabled]
  -msgx                                 [disabled]
  -msha                                 [disabled]
  -mshstk                               [disabled]
  -mskip-rax-setup                      [disabled]
  -msoft-float                          [disabled]
  -msse                                 [enabled]
  -msse2                                [enabled]
  -msse2avx                             [disabled]
  -msse3                                [enabled]
  -msse4                                [enabled]
  -msse4.1                              [enabled]
  -msse4.2                              [enabled]
  -msse4a                               [disabled]
  -msse5                                -mavx
  -msseregparm                          [disabled]
  -mssse3                               [enabled]
  -mstack-arg-probe                     [disabled]
  -mstack-protector-guard-offset=
  -mstack-protector-guard-reg=
  -mstack-protector-guard-symbol=
  -mstack-protector-guard=              tls
  -mstackrealign                        [disabled]
  -mstringop-strategy=                  [default]
  -mstv                                 [enabled]
  -mtbm                                 [disabled]
  -mtls-dialect=                        gnu
  -mtls-direct-seg-refs                 [enabled]
  -mtsxldtrk                            [disabled]
  -mtune-ctrl=
  -mtune=                               sandybridge
  -muclibc                              [disabled]
  -muintr                               [disabled]
  -mvaes                                [disabled]
  -mveclibabi=                          [default]
  -mvect8-ret-in-mem                    [disabled]
  -mvpclmulqdq                          [disabled]
  -mvzeroupper                          [enabled]
  -mwaitpkg                             [disabled]
  -mwbnoinvd                            [disabled]
  -mwidekl                              [disabled]
  -mx32                                 [disabled]
  -mxop                                 [disabled]
  -mxsave                               [enabled]
  -mxsavec                              [disabled]
  -mxsaveopt                            [enabled]
  -mxsaves                              [disabled]

  Known assembler dialects (for use with the -masm= option):
    att intel

  Known ABIs (for use with the -mabi= option):
    ms sysv

  Known code models (for use with the -mcmodel= option):
    32 kernel large medium small

  Valid arguments to -mfpmath=:
    387 387+sse 387,sse both sse sse+387 sse,387

  Known choices for mitigation against straight line speculation with -mharden-sls=:
    all indirect-jmp none return

  Known indirect branch choices (for use with the -mindirect-branch=/-mfunction-return= options):
    keep thunk thunk-extern thunk-inline

  Known choices for return instrumentation with -minstrument-return=:
    call none nop5

  Known data alignment choices (for use with the -malign-data= option):
    abi cacheline compat

  Known vectorization library ABIs (for use with the -mveclibabi= option):
    acml svml

  Known address mode (for use with the -maddress-mode= option):
    long short

  Known preferred register vector length (to use with the -mprefer-vector-width= option):
    128 256 512 none

  Known stack protector guard (for use with the -mstack-protector-guard= option):
    global tls

  Valid arguments to -mstringop-strategy=:
    byte_loop libcall loop rep_4byte rep_8byte rep_byte unrolled_loop vector_loop

  Known TLS dialects (for use with the -mtls-dialect= option):
    gnu gnu2

  Known valid arguments for -march= option:
    i386 i486 i586 pentium lakemont pentium-mmx winchip-c6 winchip2 c3 samuel-2 c3-2 nehemiah c7 esther i686 pentiumpro pentium2 pentium3 pentium3m pentium-m pentium4 pentium4m prescott nocona core2 nehalem corei7 westmere sandybridge corei7-avx ivybridge core-avx-i haswell core-avx2 broadwell skylake skylake-avx512 cannonlake icelake-client rocketlake icelake-server cascadelake tigerlake cooperlake sapphirerapids alderlake bonnell atom silvermont slm goldmont goldmont-plus tremont knl knm intel geode k6 k6-2 k6-3 athlon athlon-tbird athlon-4 athlon-xp athlon-mp x86-64 x86-64-v2 x86-64-v3 x86-64-v4 eden-x2 nano nano-1000 nano-2000 nano-3000 nano-x2 eden-x4 nano-x4 k8 k8-sse3 opteron opteron-sse3 athlon64 athlon64-sse3 athlon-fx amdfam10 barcelona bdver1 bdver2 bdver3 bdver4 znver1 znver2 znver3 btver1 btver2 generic native

  Known valid arguments for -mtune= option:
    generic i386 i486 pentium lakemont pentiumpro pentium4 nocona core2 nehalem sandybridge haswell bonnell silvermont goldmont goldmont-plus tremont knl knm skylake skylake-avx512 cannonlake icelake-client icelake-server cascadelake tigerlake cooperlake sapphirerapids alderlake rocketlake intel geode k6 athlon k8 amdfam10 bdver1 bdver2 bdver3 bdver4 btver1 btver2 znver1 znver2 znver3
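
Comparing the two dumps above by eye is tedious. The following is a minimal sketch, not part of resolve-march-native, that parses two `gcc -Q --help=target` style dumps and reports the options whose values differ. The function names and the shortened sample input are made up for illustration; in practice the two texts would come from running the gcc commands shown above.

```python
import re

# Matches option lines such as "  -mavx      [enabled]" or "  -march=    sandybridge".
# Lines without a value (e.g. "  -mcpu=") and section text do not match.
OPTION_LINE = re.compile(r"^\s+(-m\S+)\s+(\S.*)$")


def parse_target_help(text):
    """Map each option to its reported value, skipping lines without a value."""
    options = {}
    for line in text.splitlines():
        match = OPTION_LINE.match(line)
        if match:
            options[match.group(1)] = match.group(2).strip()
    return options


def differing_options(left_text, right_text):
    """Return {option: (left_value, right_value)} for options whose values differ."""
    left = parse_target_help(left_text)
    right = parse_target_help(right_text)
    return {
        name: (left.get(name), right.get(name))
        for name in sorted(set(left) | set(right))
        if left.get(name) != right.get(name)
    }


# Hand-shortened excerpts of the two dumps (illustrative sample input only).
SANDYBRIDGE = """\
The following options are target specific:
  -march=                               sandybridge
  -mavx                                 [enabled]
  -mavx2                                [disabled]
  -mcpu=
"""
NATIVE = """\
The following options are target specific:
  -march=                               sandybridge
  -mavx                                 [disabled]
  -mavx2                                [disabled]
  -mcpu=
"""

print(differing_options(SANDYBRIDGE, NATIVE))
# prints {'-mavx': ('[enabled]', '[disabled]')}
```

For this excerpt the only difference is -mavx, which matches the point of this report.
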
hartwork commented 7 months ago

Hi @anyuta1166,

did you see and try the argument --keep-mno-flags?

# resolve-march-native --help | grep -A1 mno | tail -n2
  --keep-mno-flags      keep -mno-* parameters (default: (superfluous ones)
                        stripped away)
anyuta1166 commented 7 months ago

This prints a lot of unneeded flags that are already implied by -march=sandybridge:

# resolve-march-native --keep-mno-flags
-march=sandybridge -mno-3dnow -mno-abm -mno-adx -mno-aes -mno-amx-bf16 -mno-amx-int8 -mno-amx-tile -mno-avx -mno-avx2 -mno-avx5124fmaps -mno-avx5124vnniw -mno-avx512bf16 -mno-avx512bitalg -mno-avx512bw -mno-avx512cd -mno-avx512dq -mno-avx512er -mno-avx512f -mno-avx512ifma -mno-avx512pf -mno-avx512vbmi -mno-avx512vbmi2 -mno-avx512vl -mno-avx512vnni -mno-avx512vp2intersect -mno-avx512vpopcntdq -mno-avxvnni -mno-bmi -mno-bmi2 -mno-cldemote -mno-clflushopt -mno-clwb -mno-clzero -mno-enqcmd -mno-f16c -mno-fma -mno-fma4 -mno-fsgsbase -mno-gfni -mno-hle -mno-hreset -mno-kl -mno-lwp -mno-lzcnt -mno-movbe -mno-movdir64b -mno-movdiri -mno-mwaitx -mno-pconfig -mno-pku -mno-prefetchwt1 -mno-prfchw -mno-ptwrite -mno-rdpid -mno-rdrnd -mno-rdseed -mno-rtm -mno-serialize -mno-sgx -mno-sha -mno-shstk -mno-sse4a -mno-tbm -mno-tsxldtrk -mno-uintr -mno-vaes -mno-vpclmulqdq -mno-waitpkg -mno-wbnoinvd -mno-widekl -mno-xop -mno-xsavec -mno-xsaves --param=l1-cache-line-size=64 --param=l1-cache-size=32 --param=l2-cache-size=2048
anyuta1166 commented 7 months ago

I've seen #44 and #48, and I thought that flags implied by -march should be removed, while other flags should be kept.

But in my case -mno-avx is erroneously removed.
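
The rule described here (drop what -march=sandybridge already implies, keep everything else) can be sketched as a simple filter. This is only an illustration of the expectation, not the tool's actual implementation; `minimal_flags` and the shortened flag lists are hypothetical.

```python
def minimal_flags(native_flags, arch_flags):
    """Keep only the flags that -march=<arch> alone does not already produce."""
    implied = set(arch_flags)
    return [flag for flag in native_flags if flag not in implied]


# Shortened, hypothetical flag lists modelled on this report:
# what -march=native resolves to on the Celeron ...
native = ["-march=sandybridge", "-mno-avx", "-mno-avx2", "-msse4.2",
          "--param=l2-cache-size=2048"]
# ... and what plain -march=sandybridge implies.
sandybridge = ["-march=sandybridge", "-mavx", "-mno-avx2", "-msse4.2"]

print(" ".join(["-march=sandybridge"] + minimal_flags(native, sandybridge)))
# prints -march=sandybridge -mno-avx --param=l2-cache-size=2048
```

Under this rule -mno-avx2 is dropped because -march=sandybridge already implies it, while -mno-avx survives because -march=sandybridge implies the opposite (-mavx).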

hartwork commented 7 months ago

@anyuta1166 could you elaborate on why that is erroneous? I understand that your hardware does not have AVX. But is there evidence that compiling with -march=sandybridge and without -mno-avx, on hardware that has AVX, produces binaries that use AVX and therefore cannot run on your Celeron?

hartwork commented 7 months ago

@anyuta1166 I happen to have a Sandy Bridge machine with AVX at hand, and this is interesting:

# gcc -Q -march=sandybridge --help=target | grep avx | head -n1
  -mavx                                 [enabled]

That supports your point.

anyuta1166 commented 7 months ago

I have no evidence, but this could happen because -march=sandybridge implies -mavx. AVX is disabled when I use -march=native.


# gcc -Q -march=sandybridge --help=target | grep avx | head -n1
  -mavx                                 [enabled]
# gcc -Q -march=native --help=target | grep avx | head -n1
  -mavx                                 [disabled]
# gcc -Q -march=native --help=target | grep march | head -n1
  -march=                               sandybridge
hartwork commented 7 months ago

> I have no evidence, but this could happen because -march=sandybridge implies -mavx. AVX is disabled when I use -march=native.
>
> # gcc -Q -march=sandybridge --help=target | grep avx | head -n1
>   -mavx                                 [enabled]
> # gcc -Q -march=native --help=target | grep avx | head -n1
>   -mavx                                 [disabled]
> # gcc -Q -march=native --help=target | grep march | head -n1
>   -march=                               sandybridge

@anyuta1166 I see, that doesn't look good for our case here.

hartwork commented 7 months ago

@anyuta1166 I have pushed a new branch fix-handling-of-disabled-lines with an idea that may work: lines with [disabled] were previously ignored, and now they produce inverted arguments. It is a single commit on top of master. If possible, I would ask you to first have a look at the commit, so you know that what you are running is safe, and then give it a try, e.g. like this:

cd "$(mktemp -d)"
git clone --branch fix-handling-of-disabled-lines https://github.com/hartwork/resolve-march-native
cd resolve-march-native/
python3 -m venv venv  # or: virtualenv venv
source venv/bin/activate
pip3 install -e .
hash resolve-march-native
resolve-march-native --debug

What do you think?
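
To illustrate the idea of turning [disabled] lines into inverted arguments, here is a minimal sketch. It is not the code from the branch: the function name, the regular expression, and the handling of options that already start with -mno- are assumptions made for this example.

```python
import re

# "  -mavx      [enabled]" -> ("avx", "enabled"); lines with other values do not match.
TOGGLE_LINE = re.compile(r"^\s+-m(\S+)\s+\[(enabled|disabled)\]\s*$")


def flags_from_target_help(text):
    """Turn [enabled]/[disabled] lines into -mfoo / -mno-foo arguments."""
    flags = []
    for line in text.splitlines():
        match = TOGGLE_LINE.match(line)
        if not match:
            continue  # e.g. "-march= sandybridge" or section headings
        name, state = match.groups()
        if state == "enabled":
            flags.append("-m" + name)
        elif name.startswith("no-"):
            # Assumption: a disabled "-mno-foo" means "-mfoo" is in effect.
            flags.append("-m" + name[len("no-"):])
        else:
            flags.append("-mno-" + name)
    return flags


# Hand-written sample in the format of `gcc -Q --help=target` output.
SAMPLE = """\
  -march=                               sandybridge
  -mavx                                 [disabled]
  -msse4.2                              [enabled]
  -mno-red-zone                         [disabled]
"""

print(flags_from_target_help(SAMPLE))
# prints ['-mno-avx', '-msse4.2', '-mred-zone']
```

With input like the -march=native dump from the Celeron, this approach would yield -mno-avx, which is the flag this issue is about.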

anyuta1166 commented 7 months ago

@hartwork I've tried it, and it doesn't seem to work. The final output is the same.

(venv) /tmp/tmp.dxwGlKGwPD/resolve-march-native # resolve-march-native --debug
# gcc -S -fverbose-asm -o /tmp/tmpiv30onw4/march_native.s /tmp/tmpiv30onw4/empty.c -march=native
Flags extracted: --param=l1-cache-line-size=64 --param=l1-cache-size=32 --param=l2-cache-size=2048 -march=sandybridge -mcx16 -mfxsr -mmmx -mno-3dnow -mno-abm -mno-adx -mno-aes -mno-amx-bf16 -mno-amx-int8 -mno-amx-tile -mno-avx -mno-avx2 -mno-avx5124fmaps -mno-avx5124vnniw -mno-avx512bf16 -mno-avx512bitalg -mno-avx512bw -mno-avx512cd -mno-avx512dq -mno-avx512er -mno-avx512f -mno-avx512ifma -mno-avx512pf -mno-avx512vbmi -mno-avx512vbmi2 -mno-avx512vl -mno-avx512vnni -mno-avx512vp2intersect -mno-avx512vpopcntdq -mno-avxvnni -mno-bmi -mno-bmi2 -mno-cldemote -mno-clflushopt -mno-clwb -mno-clzero -mno-enqcmd -mno-f16c -mno-fma -mno-fma4 -mno-fsgsbase -mno-gfni -mno-hle -mno-hreset -mno-kl -mno-lwp -mno-lzcnt -mno-movbe -mno-movdir64b -mno-movdiri -mno-mwaitx -mno-pconfig -mno-pku -mno-prefetchwt1 -mno-prfchw -mno-ptwrite -mno-rdpid -mno-rdrnd -mno-rdseed -mno-rtm -mno-serialize -mno-sgx -mno-sha -mno-shstk -mno-sse4a -mno-tbm -mno-tsxldtrk -mno-uintr -mno-vaes -mno-vpclmulqdq -mno-waitpkg -mno-wbnoinvd -mno-widekl -mno-xop -mno-xsavec -mno-xsaves -mpclmul -mpopcnt -msahf -msse -msse2 -msse3 -msse4.1 -msse4.2 -mssse3 -mtune=sandybridge -mxsave -mxsaveopt
Flags extracted: --param=l1-cache-line-size=64 --param=l1-cache-size=32 --param=l2-cache-size=2048 -ffp-contract=fast -m128bit-long-double -m64 -m80387 -mabi=sysv -maddress-mode=long -malign-data=compat -malign-functions=0 -malign-jumps=0 -malign-loops=0 -malign-stringops -march=sandybridge -masm=att -masm=intel -mavx -mavx256-split-unaligned-load -mavx256-split-unaligned-store -mbranch-cost=3 -mcpu= -mcrc32 -mcx16 -mdefault -mfancy-math-387 -mfentry-name= -mfentry-section= -mfp-ret-in-387 -mfpmath=sse -mfunction-return=keep -mfxsr -mglibc -mhard-float -mharden-sls=none -mieee-fp -mincoming-stack-boundary=0 -mindirect-branch=keep -minstrument-return=none -mlarge-data-threshold=65536 -mlong-double-80 -mmemcpy-strategy= -mmemset-strategy= -mmmx -mmwait -mno-16 -mno-32 -mno-3dnow -mno-3dnowa -mno-8bit-idiv -mno-96bit-long-double -mno-abm -mno-accumulate-outgoing-args -mno-adx -mno-aes -mno-align-double -mno-amx-bf16 -mno-amx-int8 -mno-amx-tile -mno-android -mno-avx -mno-avx2 -mno-avx5124fmaps -mno-avx5124vnniw -mno-avx512bf16 -mno-avx512bitalg -mno-avx512bw -mno-avx512cd -mno-avx512dq -mno-avx512er -mno-avx512f -mno-avx512ifma -mno-avx512pf -mno-avx512vbmi -mno-avx512vbmi2 -mno-avx512vl -mno-avx512vnni -mno-avx512vp2intersect -mno-avx512vpopcntdq -mno-avxvnni -mno-bionic -mno-bmi -mno-bmi2 -mno-call-ms2sysv-xlogues -mno-cet-switch -mno-cld -mno-cldemote -mno-clflushopt -mno-clwb -mno-clzero -mno-dispatch-scheduler -mno-dump-tune-features -mno-enqcmd -mno-f16c -mno-fentry -mno-fma -mno-fma4 -mno-force-drap -mno-force-indirect-call -mno-fsgsbase -mno-general-regs-only -mno-gfni -mno-hle -mno-hreset -mno-iamcu -mno-indirect-branch-cs-prefix -mno-indirect-branch-register -mno-inline-all-stringops -mno-inline-stringops-dynamically -mno-kl -mno-long-double-128 -mno-long-double-64 -mno-lwp -mno-lzcnt -mno-manual-endbr -mno-mitigate-rop -mno-movbe -mno-movdir64b -mno-movdiri -mno-mpx -mno-ms-bitfields -mno-musl -mno-mwaitx -mno-needed -mno-nop-mcount 
-mno-omit-leaf-frame-pointer -mno-pc32 -mno-pc64 -mno-pc80 -mno-pcommit -mno-pconfig -mno-pku -mno-prefetchwt1 -mno-prfchw -mno-ptwrite -mno-rdpid -mno-rdrnd -mno-rdseed -mno-recip -mno-record-mcount -mno-record-return -mno-rtd -mno-rtm -mno-serialize -mno-sgx -mno-sha -mno-shstk -mno-skip-rax-setup -mno-soft-float -mno-sse2avx -mno-sse4a -mno-sseregparm -mno-stack-arg-probe -mno-stackrealign -mno-tbm -mno-tsxldtrk -mno-uclibc -mno-uintr -mno-vaes -mno-vect8-ret-in-mem -mno-vpclmulqdq -mno-waitpkg -mno-wbnoinvd -mno-widekl -mno-x32 -mno-xop -mno-xsavec -mno-xsaves -mpclmul -mpopcnt -mprefer-vector-width=128 -mprefer-vector-width=none -mpreferred-stack-boundary=0 -mpush-args -mrecip= -mred-zone -mregparm=6 -msahf -msse -msse2 -msse3 -msse4 -msse4.1 -msse4.2 -mssse3 -mstack-protector-guard-offset= -mstack-protector-guard-reg= -mstack-protector-guard-symbol= -mstack-protector-guard=tls -mstv -mtls-dialect=gnu -mtls-direct-seg-refs -mtune-ctrl= -mtune=sandybridge -mvzeroupper -mxsave -mxsaveopt
# gcc -S -fverbose-asm -o /tmp/tmpjo5nbxps/march_native.s /tmp/tmpjo5nbxps/empty.c -march=sandybridge
Flags extracted: -march=sandybridge
Flags extracted: -ffp-contract=fast -m128bit-long-double -m64 -m80387 -mabi=sysv -maddress-mode=long -malign-data=compat -malign-functions=0 -malign-jumps=0 -malign-loops=0 -malign-stringops -march=sandybridge -masm=att -masm=intel -mavx -mavx256-split-unaligned-load -mavx256-split-unaligned-store -mbranch-cost=3 -mcpu= -mcrc32 -mcx16 -mdefault -mfancy-math-387 -mfentry-name= -mfentry-section= -mfp-ret-in-387 -mfpmath=sse -mfunction-return=keep -mfxsr -mglibc -mhard-float -mharden-sls=none -mieee-fp -mincoming-stack-boundary=0 -mindirect-branch=keep -minstrument-return=none -mlarge-data-threshold=65536 -mlong-double-80 -mmemcpy-strategy= -mmemset-strategy= -mmmx -mmwait -mno-16 -mno-32 -mno-3dnow -mno-3dnowa -mno-8bit-idiv -mno-96bit-long-double -mno-abm -mno-accumulate-outgoing-args -mno-adx -mno-aes -mno-align-double -mno-amx-bf16 -mno-amx-int8 -mno-amx-tile -mno-android -mno-avx2 -mno-avx5124fmaps -mno-avx5124vnniw -mno-avx512bf16 -mno-avx512bitalg -mno-avx512bw -mno-avx512cd -mno-avx512dq -mno-avx512er -mno-avx512f -mno-avx512ifma -mno-avx512pf -mno-avx512vbmi -mno-avx512vbmi2 -mno-avx512vl -mno-avx512vnni -mno-avx512vp2intersect -mno-avx512vpopcntdq -mno-avxvnni -mno-bionic -mno-bmi -mno-bmi2 -mno-call-ms2sysv-xlogues -mno-cet-switch -mno-cld -mno-cldemote -mno-clflushopt -mno-clwb -mno-clzero -mno-dispatch-scheduler -mno-dump-tune-features -mno-enqcmd -mno-f16c -mno-fentry -mno-fma -mno-fma4 -mno-force-drap -mno-force-indirect-call -mno-fsgsbase -mno-general-regs-only -mno-gfni -mno-hle -mno-hreset -mno-iamcu -mno-indirect-branch-cs-prefix -mno-indirect-branch-register -mno-inline-all-stringops -mno-inline-stringops-dynamically -mno-kl -mno-long-double-128 -mno-long-double-64 -mno-lwp -mno-lzcnt -mno-manual-endbr -mno-mitigate-rop -mno-movbe -mno-movdir64b -mno-movdiri -mno-mpx -mno-ms-bitfields -mno-musl -mno-mwaitx -mno-needed -mno-nop-mcount -mno-omit-leaf-frame-pointer -mno-pc32 -mno-pc64 -mno-pc80 -mno-pcommit -mno-pconfig -mno-pku -mno-prefetchwt1 
-mno-prfchw -mno-ptwrite -mno-rdpid -mno-rdrnd -mno-rdseed -mno-recip -mno-record-mcount -mno-record-return -mno-rtd -mno-rtm -mno-serialize -mno-sgx -mno-sha -mno-shstk -mno-skip-rax-setup -mno-soft-float -mno-sse2avx -mno-sse4a -mno-sseregparm -mno-stack-arg-probe -mno-stackrealign -mno-tbm -mno-tsxldtrk -mno-uclibc -mno-uintr -mno-vaes -mno-vect8-ret-in-mem -mno-vpclmulqdq -mno-waitpkg -mno-wbnoinvd -mno-widekl -mno-x32 -mno-xop -mno-xsavec -mno-xsaves -mpclmul -mpopcnt -mprefer-vector-width=128 -mprefer-vector-width=none -mpreferred-stack-boundary=0 -mpush-args -mrecip= -mred-zone -mregparm=6 -msahf -msse -msse2 -msse3 -msse4 -msse4.1 -msse4.2 -mssse3 -mstack-protector-guard-offset= -mstack-protector-guard-reg= -mstack-protector-guard-symbol= -mstack-protector-guard=tls -mstv -mtls-dialect=gnu -mtls-direct-seg-refs -mtune-ctrl= -mtune=sandybridge -mvzeroupper -mxsave -mxsaveopt
-march=sandybridge --param=l1-cache-line-size=64 --param=l1-cache-size=32 --param=l2-cache-size=2048

I have double checked that I'm running the correct version:

(venv) /tmp/tmp.dxwGlKGwPD/resolve-march-native # which resolve-march-native
/tmp/tmp.dxwGlKGwPD/resolve-march-native/venv/bin/resolve-march-native
(venv) /tmp/tmp.dxwGlKGwPD/resolve-march-native # /tmp/tmp.dxwGlKGwPD/resolve-march-native/venv/bin/resolve-march-native
-march=sandybridge --param=l1-cache-line-size=64 --param=l1-cache-size=32 --param=l2-cache-size=2048

anyuta1166 commented 7 months ago

I've noticed that the second "Flags extracted" line in the debug output contains both -mavx and -mno-avx at the same time.
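
To make that observation reproducible, here is a minimal Python sketch that scans a "Flags extracted" line for features that show up both as -m<name> and -mno-<name>. The helper name find_contradictions and the shortened sample line are hypothetical, for illustration only; this is not code from resolve-march-native.

```python
def find_contradictions(flags):
    """Return sorted feature names seen both as -m<name> and -mno-<name>."""
    enabled = set()
    disabled = set()
    for flag in flags:
        # Skip valued options such as -march=... and anything that is not -m*.
        if "=" in flag or not flag.startswith("-m"):
            continue
        if flag.startswith("-mno-"):
            disabled.add(flag[len("-mno-"):])
        else:
            enabled.add(flag[len("-m"):])
    return sorted(enabled & disabled)


# Shortened, hand-written sample in the style of the debug output above.
sample = "-march=sandybridge -mavx -mno-avx -mno-avx2 -msse4.2 -mtune=sandybridge".split()
print(find_contradictions(sample))
```

Pasting the full second "Flags extracted" line into sample instead of the shortened one should work the same way, since valued options are skipped.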

hartwork commented 7 months ago

@anyuta1166 thanks for testing and reporting back! It needs more thought then, but I'm optimistic that it can be fixed. Are there any time constraints regarding this on your side that would be good to know about?

hartwork commented 7 months ago

@anyuta1166 PS: could you run these four GCC commands on the Celeron box and attach the raw .txt files they produced? That would allow me to better predict and test the exact results that you'd be getting, the full picture.

gcc -S -fverbose-asm -o /dev/stdout "$(mktemp --suffix=.c)" -march=native      > assembly-native.txt
gcc -S -fverbose-asm -o /dev/stdout "$(mktemp --suffix=.c)" -march=sandybridge > assembly-sandybridge.txt
gcc -Q --help=target -march=native                                             > native-target-help.txt
gcc -Q --help=target -march=sandybridge                                        > sandybridge-target-help.txt

This is similar to what resolve-march-native would run (with #112 merged):

$ resolve-march-native --debug |& grep '^#'
# gcc -S -fverbose-asm -o /tmp/tmp_2glk89e/march_native.s /tmp/tmp_2glk89e/empty.c -march=native
# gcc -Q --help=target -march=native
# gcc -S -fverbose-asm -o /tmp/tmpl2rqyosu/march_native.s /tmp/tmpl2rqyosu/empty.c -march=sandybridge
# gcc -Q --help=target -march=sandybridge

Thanks in advance!

anyuta1166 commented 7 months ago

assembly-native.txt assembly-sandybridge.txt native-target-help.txt sandybridge-target-help.txt

hartwork commented 7 months ago

@anyuta1166 thank you! :+1:

hartwork commented 7 months ago

@anyuta1166 btw, I just noticed that your output of gcc -Q --help=target -march=sandybridge differs from mine. The most interesting part is probably this hard difference:

-  -mvzeroupper                         [enabled]
+  -mvzeroupper                         [disabled]

The remaining differences are omissions. Mine is from GCC 12.3.1_p20230825; which version did you use?

I guess the implication would be that resolved flags are not guaranteed to give the same results with different versions of GCC.
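
A comparison like this can be scripted. The following is a rough Python sketch, assuming the "-mflag   [enabled]" / "[disabled]" line layout of gcc -Q --help=target shown at the top of this issue. The function names parse_target_help and diff_states and the two small sample texts are made up for illustration; they are not the parser used by resolve-march-native.

```python
import re

# Matches lines such as "  -mvzeroupper      [enabled]"; valued options
# like "  -mabi=      sysv" deliberately do not match.
LINE = re.compile(r"^\s+(-m[\w.\-]+)\s+\[(enabled|disabled)\]\s*$")


def parse_target_help(text):
    """Map each boolean -m flag to True (enabled) or False (disabled)."""
    states = {}
    for line in text.splitlines():
        match = LINE.match(line)
        if match:
            states[match.group(1)] = match.group(2) == "enabled"
    return states


def diff_states(left, right):
    """Return flags present on both sides whose state differs."""
    common = sorted(left.keys() & right.keys())
    return {flag: (left[flag], right[flag]) for flag in common if left[flag] != right[flag]}


# Hand-written sample inputs, not real GCC output.
LEFT = """\
The following options are target specific:
  -mabi=                                sysv
  -mavx                                 [enabled]
  -mvzeroupper                          [enabled]
"""
RIGHT = LEFT.replace("-mvzeroupper                          [enabled]",
                     "-mvzeroupper                          [disabled]")

print(diff_states(parse_target_help(LEFT), parse_target_help(RIGHT)))
```

Flags that exist on only one side (the "omissions") are ignored here on purpose; reporting them would need a separate set difference on the keys.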

hartwork commented 7 months ago

@anyuta1166 update: I have now found "Gentoo 11.3.0 p7" in some of the files you shared, so never mind.

hartwork commented 7 months ago

Hi @anyuta1166,

I found one more issue that broke things for you last time, and I think I have it working now. To summarize, the two issues were:

As a result, you should now get this output for your case…

# resolve-march-native --vertical
-march=sandybridge
-mno-avx
--param=l1-cache-line-size=64
--param=l1-cache-size=32
--param=l2-cache-size=2048

…and the code has a regression test for that very case to keep it working.
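
For readers following along, the general idea behind output like the above can be sketched in a few lines of Python: compare the flag states that GCC reports for -march=native with those for the named architecture, and emit a flag only where they differ. This is a simplified illustration under assumed inputs, not the actual algorithm in resolve-march-native; delta_flags and the two small dictionaries are hypothetical.

```python
def delta_flags(native, arch):
    """Return the -m/-mno- flags needed on top of -march=<arch> to match native."""
    extra = []
    for flag in sorted(native):
        if flag in arch and native[flag] != arch[flag]:
            name = flag[len("-m"):]
            extra.append(f"-m{name}" if native[flag] else f"-mno-{name}")
    return extra


# Illustrative states only: the architecture default enables AVX,
# while the CPU at hand does not support it.
arch = {"-mavx": True, "-mavx2": False, "-msse4.2": True}
native = {"-mavx": False, "-mavx2": False, "-msse4.2": True}

print(["-march=sandybridge", *delta_flags(native, arch)])
```

With these sample dictionaries, the only difference is AVX, so the result carries a single extra flag next to -march=sandybridge.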

The new tests, your test data, and the two fixes are now in a new pull request #115 on the new branch fix-target-help-parser. I made you the author of the commit adding the test data for credit (thanks again!); please check whether the current representation is okay.

It would be great if you could test the new code to confirm that the test suite and your reality agree. This is the known recipe except with a different branch name:

cd "$(mktemp -d)"
git clone --branch fix-target-help-parser https://github.com/hartwork/resolve-march-native
cd resolve-march-native/
python3 -m venv venv  # or: virtualenv venv
source venv/bin/activate
pip3 install -e .
hash resolve-march-native
resolve-march-native --debug

Would be great to hear back from you on this, thanks! If all goes well, I'll release 4.0.0 with this merged, shortly.

anyuta1166 commented 7 months ago

I confirm that it works fine now.

# resolve-march-native
-march=sandybridge -mno-avx --param=l1-cache-line-size=64 --param=l1-cache-size=32 --param=l2-cache-size=2048

debug.txt

anyuta1166 commented 7 months ago

I also confirm that it works fine now for Intel(R) Celeron(R) 3755U (Broadwell)

distccflags script:

 # ./distccflags -march=native
CFLAGS="-march=broadwell -mabm -mno-adx -mno-avx2 -mno-avx -mno-bmi2 -mno-bmi -mno-f16c -mno-fma"

resolve-march-native (release version, -mno-avx is missing)

# resolve-march-native
-march=broadwell -mabm -mno-adx -mno-avx2 -mno-bmi -mno-bmi2 -mno-f16c -mno-fma --param=l1-cache-line-size=64 --param=l1-cache-size=32 --param=l2-cache-size=2048

resolve-march-native (fixed version, everything is fine)

# resolve-march-native
-march=broadwell -mabm -mno-adx -mno-avx -mno-avx2 -mno-bmi -mno-bmi2 -mno-f16c -mno-fma --param=l1-cache-line-size=64 --param=l1-cache-size=32 --param=l2-cache-size=2048
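
The three outputs above list the flags in different orders, so comparing them by eye is error-prone. Here is a small Python sketch that compares them as sets, ignoring the --param entries and the CFLAGS="..." wrapper. The helper name march_flags is made up for this example; the three strings are copied from the outputs above.

```python
def march_flags(output):
    """Return the set of -m... flags in a line of tool output."""
    output = output.strip()
    # Strip the CFLAGS="..." wrapper printed by the distccflags script.
    if output.startswith('CFLAGS="') and output.endswith('"'):
        output = output[len('CFLAGS="'):-1]
    # "--param=..." starts with "--", so it is filtered out here.
    return {token for token in output.split() if token.startswith("-m")}


distcc = ('CFLAGS="-march=broadwell -mabm -mno-adx -mno-avx2 -mno-avx '
          '-mno-bmi2 -mno-bmi -mno-f16c -mno-fma"')
release = ("-march=broadwell -mabm -mno-adx -mno-avx2 -mno-bmi -mno-bmi2 "
           "-mno-f16c -mno-fma --param=l1-cache-line-size=64 "
           "--param=l1-cache-size=32 --param=l2-cache-size=2048")
fixed = ("-march=broadwell -mabm -mno-adx -mno-avx -mno-avx2 -mno-bmi "
         "-mno-bmi2 -mno-f16c -mno-fma --param=l1-cache-line-size=64 "
         "--param=l1-cache-size=32 --param=l2-cache-size=2048")

print(sorted(march_flags(distcc) - march_flags(release)))  # ['-mno-avx']
print(march_flags(distcc) == march_flags(fixed))           # True
```
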

hartwork commented 7 months ago

@anyuta1166 thanks for testing and reporting back! I have now released 4.0.0 with a fix to:

Enjoy :smiley: