anyuta1166 closed this issue 7 months ago.
Hi @anyuta1166,
did you see and try the --keep-mno-flags argument?
# resolve-march-native --help | grep -A1 mno | tail -n2
--keep-mno-flags keep -mno-* parameters (default: (superfluous ones)
stripped away)
This prints a lot of unneeded flags that are already implied by -march=sandybridge:
# resolve-march-native --keep-mno-flags
-march=sandybridge -mno-3dnow -mno-abm -mno-adx -mno-aes -mno-amx-bf16 -mno-amx-int8 -mno-amx-tile -mno-avx -mno-avx2 -mno-avx5124fmaps -mno-avx5124vnniw -mno-avx512bf16 -mno-avx512bitalg -mno-avx512bw -mno-avx512cd -mno-avx512dq -mno-avx512er -mno-avx512f -mno-avx512ifma -mno-avx512pf -mno-avx512vbmi -mno-avx512vbmi2 -mno-avx512vl -mno-avx512vnni -mno-avx512vp2intersect -mno-avx512vpopcntdq -mno-avxvnni -mno-bmi -mno-bmi2 -mno-cldemote -mno-clflushopt -mno-clwb -mno-clzero -mno-enqcmd -mno-f16c -mno-fma -mno-fma4 -mno-fsgsbase -mno-gfni -mno-hle -mno-hreset -mno-kl -mno-lwp -mno-lzcnt -mno-movbe -mno-movdir64b -mno-movdiri -mno-mwaitx -mno-pconfig -mno-pku -mno-prefetchwt1 -mno-prfchw -mno-ptwrite -mno-rdpid -mno-rdrnd -mno-rdseed -mno-rtm -mno-serialize -mno-sgx -mno-sha -mno-shstk -mno-sse4a -mno-tbm -mno-tsxldtrk -mno-uintr -mno-vaes -mno-vpclmulqdq -mno-waitpkg -mno-wbnoinvd -mno-widekl -mno-xop -mno-xsavec -mno-xsaves --param=l1-cache-line-size=64 --param=l1-cache-size=32 --param=l2-cache-size=2048
I've seen #44 and #48, and I thought that flags implied by -march should be removed while other flags should be kept.
But in my case, -mno-avx is erroneously removed.
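To make the rule above concrete, here is a minimal Python sketch of stripping superfluous -mno-* flags: a -mno-X flag is only worth keeping when the -march target on its own would enable -mX. This is not resolve-march-native's actual code; the function name strip_superfluous_mno_flags is hypothetical, the set of options enabled by the -march target is assumed to be known already, and the sample data is illustrative.

```python
def strip_superfluous_mno_flags(native_flags, march_enabled):
    """Drop -mno-X flags that -march=<target> already implies (hypothetical helper).

    native_flags:  flags resolved for -march=native, in order.
    march_enabled: -mX options the -march target enables on its own
                   (assumed to be known, e.g. from gcc -Q --help=target).
    """
    kept = []
    for flag in native_flags:
        if flag.startswith("-mno-"):
            positive = "-m" + flag[len("-mno-"):]
            # -mno-X only matters if -march=<target> would turn X on.
            if positive not in march_enabled:
                continue
        kept.append(flag)
    return kept


# Illustrative data: -march=sandybridge enables -mavx (see the gcc -Q output
# in this thread), so -mno-avx must survive while -mno-avx2 is superfluous.
native = ["-march=sandybridge", "-mno-avx", "-mno-avx2", "-mno-sha"]
sandybridge_enabled = {"-mavx", "-msse4.2"}
print(strip_superfluous_mno_flags(native, sandybridge_enabled))
# ['-march=sandybridge', '-mno-avx']
```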
@anyuta1166 could you elaborate on why it is erroneous? I understand that your hardware does not have AVX, but is there evidence that compiling with -march=sandybridge but without -mno-avx on hardware with AVX available produces binaries that use AVX, so that you cannot run them on your Celeron hardware?
@anyuta1166 I happen to have a Sandy Bridge machine with AVX at my fingertips, and this is interesting:
# gcc -Q -march=sandybridge --help=target | grep avx | head -n1
-mavx [enabled]
That supports your point.
I have no evidence, but this could happen because -march=sandybridge implies -mavx.
AVX is disabled when I use -march=native:
# gcc -Q -march=sandybridge --help=target | grep avx | head -n1
-mavx [enabled]
# gcc -Q -march=native --help=target | grep avx | head -n1
-mavx [disabled]
# gcc -Q -march=native --help=target | grep march | head -n1
-march= sandybridge
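A quick way to check a single option in gcc -Q --help=target output, as in the commands above, is to look up its value line by line. This Python sketch assumes each relevant line has exactly two whitespace-separated columns (option, value); the helper option_state is hypothetical, and the two excerpts are abbreviated, hand-written stand-ins for the real output.

```python
def option_state(help_text, option):
    """Return the value printed for `option` by gcc -Q --help=target, or None."""
    for line in help_text.splitlines():
        fields = line.split()
        # Relevant lines look like "  -mavx   [enabled]" or "  -march=   sandybridge".
        if len(fields) == 2 and fields[0] == option:
            return fields[1]
    return None


# Abbreviated, hand-written excerpts modeled on the output above.
sandybridge_help = "  -march=   sandybridge\n  -mavx   [enabled]\n"
native_help = "  -march=   sandybridge\n  -mavx   [disabled]\n"

for label, text in (("sandybridge", sandybridge_help), ("native", native_help)):
    print(label, option_state(text, "-march="), option_state(text, "-mavx"))
# sandybridge sandybridge [enabled]
# native sandybridge [disabled]
```

Both runs report the same -march= value, yet they disagree on -mavx, which is exactly why -mno-avx cannot be dropped here.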
@anyuta1166 I see, that doesn't look good for our case here.
@anyuta1166 I have pushed a new branch fix-handling-of-disabled-lines with an idea that may work: lines with [disabled] were previously ignored, and now they produce inverted arguments. It's a single commit on top of master. If possible, please first have a look at the commit so you know that what you're running is safe, and then give it a try, e.g. like this:
cd "$(mktemp -d)"
git clone --branch fix-handling-of-disabled-lines https://github.com/hartwork/resolve-march-native
cd resolve-march-native/
python3 -m venv venv # or: virtualenv venv
source venv/bin/activate
pip3 install -e .
hash resolve-march-native
resolve-march-native --debug
What do you think?
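The idea of turning [disabled] lines into inverted arguments could look roughly like this Python sketch. It is an illustration under assumptions, not the commit on the branch: flags_from_target_help is a hypothetical name, only plain boolean -mX options are handled, and options that already start with -mno- are skipped.

```python
def flags_from_target_help(help_text):
    """Turn boolean lines of gcc -Q --help=target into flags (hypothetical helper).

    "-mX [enabled]"  -> "-mX"
    "-mX [disabled]" -> "-mno-X"  (instead of ignoring the line)
    """
    flags = []
    for line in help_text.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue  # not an "option value" line
        option, value = fields
        if not option.startswith("-m") or option.startswith("-mno-"):
            continue  # this sketch only inverts plain -mX options
        if value == "[enabled]":
            flags.append(option)
        elif value == "[disabled]":
            flags.append("-mno-" + option[len("-m"):])
    return flags


excerpt = "  -march=   sandybridge\n  -mavx   [disabled]\n  -msse4.2   [enabled]\n"
print(flags_from_target_help(excerpt))  # ['-mno-avx', '-msse4.2']
```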
@hartwork I've tried it, and it doesn't seem to work. The final output is the same.
(venv) /tmp/tmp.dxwGlKGwPD/resolve-march-native # resolve-march-native --debug
# gcc -S -fverbose-asm -o /tmp/tmpiv30onw4/march_native.s /tmp/tmpiv30onw4/empty.c -march=native
Flags extracted: --param=l1-cache-line-size=64 --param=l1-cache-size=32 --param=l2-cache-size=2048 -march=sandybridge -mcx16 -mfxsr -mmmx -mno-3dnow -mno-abm -mno-adx -mno-aes -mno-amx-bf16 -mno-amx-int8 -mno-amx-tile -mno-avx -mno-avx2 -mno-avx5124fmaps -mno-avx5124vnniw -mno-avx512bf16 -mno-avx512bitalg -mno-avx512bw -mno-avx512cd -mno-avx512dq -mno-avx512er -mno-avx512f -mno-avx512ifma -mno-avx512pf -mno-avx512vbmi -mno-avx512vbmi2 -mno-avx512vl -mno-avx512vnni -mno-avx512vp2intersect -mno-avx512vpopcntdq -mno-avxvnni -mno-bmi -mno-bmi2 -mno-cldemote -mno-clflushopt -mno-clwb -mno-clzero -mno-enqcmd -mno-f16c -mno-fma -mno-fma4 -mno-fsgsbase -mno-gfni -mno-hle -mno-hreset -mno-kl -mno-lwp -mno-lzcnt -mno-movbe -mno-movdir64b -mno-movdiri -mno-mwaitx -mno-pconfig -mno-pku -mno-prefetchwt1 -mno-prfchw -mno-ptwrite -mno-rdpid -mno-rdrnd -mno-rdseed -mno-rtm -mno-serialize -mno-sgx -mno-sha -mno-shstk -mno-sse4a -mno-tbm -mno-tsxldtrk -mno-uintr -mno-vaes -mno-vpclmulqdq -mno-waitpkg -mno-wbnoinvd -mno-widekl -mno-xop -mno-xsavec -mno-xsaves -mpclmul -mpopcnt -msahf -msse -msse2 -msse3 -msse4.1 -msse4.2 -mssse3 -mtune=sandybridge -mxsave -mxsaveopt
Flags extracted: --param=l1-cache-line-size=64 --param=l1-cache-size=32 --param=l2-cache-size=2048 -ffp-contract=fast -m128bit-long-double -m64 -m80387 -mabi=sysv -maddress-mode=long -malign-data=compat -malign-functions=0 -malign-jumps=0 -malign-loops=0 -malign-stringops -march=sandybridge -masm=att -masm=intel -mavx -mavx256-split-unaligned-load -mavx256-split-unaligned-store -mbranch-cost=3 -mcpu= -mcrc32 -mcx16 -mdefault -mfancy-math-387 -mfentry-name= -mfentry-section= -mfp-ret-in-387 -mfpmath=sse -mfunction-return=keep -mfxsr -mglibc -mhard-float -mharden-sls=none -mieee-fp -mincoming-stack-boundary=0 -mindirect-branch=keep -minstrument-return=none -mlarge-data-threshold=65536 -mlong-double-80 -mmemcpy-strategy= -mmemset-strategy= -mmmx -mmwait -mno-16 -mno-32 -mno-3dnow -mno-3dnowa -mno-8bit-idiv -mno-96bit-long-double -mno-abm -mno-accumulate-outgoing-args -mno-adx -mno-aes -mno-align-double -mno-amx-bf16 -mno-amx-int8 -mno-amx-tile -mno-android -mno-avx -mno-avx2 -mno-avx5124fmaps -mno-avx5124vnniw -mno-avx512bf16 -mno-avx512bitalg -mno-avx512bw -mno-avx512cd -mno-avx512dq -mno-avx512er -mno-avx512f -mno-avx512ifma -mno-avx512pf -mno-avx512vbmi -mno-avx512vbmi2 -mno-avx512vl -mno-avx512vnni -mno-avx512vp2intersect -mno-avx512vpopcntdq -mno-avxvnni -mno-bionic -mno-bmi -mno-bmi2 -mno-call-ms2sysv-xlogues -mno-cet-switch -mno-cld -mno-cldemote -mno-clflushopt -mno-clwb -mno-clzero -mno-dispatch-scheduler -mno-dump-tune-features -mno-enqcmd -mno-f16c -mno-fentry -mno-fma -mno-fma4 -mno-force-drap -mno-force-indirect-call -mno-fsgsbase -mno-general-regs-only -mno-gfni -mno-hle -mno-hreset -mno-iamcu -mno-indirect-branch-cs-prefix -mno-indirect-branch-register -mno-inline-all-stringops -mno-inline-stringops-dynamically -mno-kl -mno-long-double-128 -mno-long-double-64 -mno-lwp -mno-lzcnt -mno-manual-endbr -mno-mitigate-rop -mno-movbe -mno-movdir64b -mno-movdiri -mno-mpx -mno-ms-bitfields -mno-musl -mno-mwaitx -mno-needed -mno-nop-mcount 
-mno-omit-leaf-frame-pointer -mno-pc32 -mno-pc64 -mno-pc80 -mno-pcommit -mno-pconfig -mno-pku -mno-prefetchwt1 -mno-prfchw -mno-ptwrite -mno-rdpid -mno-rdrnd -mno-rdseed -mno-recip -mno-record-mcount -mno-record-return -mno-rtd -mno-rtm -mno-serialize -mno-sgx -mno-sha -mno-shstk -mno-skip-rax-setup -mno-soft-float -mno-sse2avx -mno-sse4a -mno-sseregparm -mno-stack-arg-probe -mno-stackrealign -mno-tbm -mno-tsxldtrk -mno-uclibc -mno-uintr -mno-vaes -mno-vect8-ret-in-mem -mno-vpclmulqdq -mno-waitpkg -mno-wbnoinvd -mno-widekl -mno-x32 -mno-xop -mno-xsavec -mno-xsaves -mpclmul -mpopcnt -mprefer-vector-width=128 -mprefer-vector-width=none -mpreferred-stack-boundary=0 -mpush-args -mrecip= -mred-zone -mregparm=6 -msahf -msse -msse2 -msse3 -msse4 -msse4.1 -msse4.2 -mssse3 -mstack-protector-guard-offset= -mstack-protector-guard-reg= -mstack-protector-guard-symbol= -mstack-protector-guard=tls -mstv -mtls-dialect=gnu -mtls-direct-seg-refs -mtune-ctrl= -mtune=sandybridge -mvzeroupper -mxsave -mxsaveopt
# gcc -S -fverbose-asm -o /tmp/tmpjo5nbxps/march_native.s /tmp/tmpjo5nbxps/empty.c -march=sandybridge
Flags extracted: -march=sandybridge
Flags extracted: -ffp-contract=fast -m128bit-long-double -m64 -m80387 -mabi=sysv -maddress-mode=long -malign-data=compat -malign-functions=0 -malign-jumps=0 -malign-loops=0 -malign-stringops -march=sandybridge -masm=att -masm=intel -mavx -mavx256-split-unaligned-load -mavx256-split-unaligned-store -mbranch-cost=3 -mcpu= -mcrc32 -mcx16 -mdefault -mfancy-math-387 -mfentry-name= -mfentry-section= -mfp-ret-in-387 -mfpmath=sse -mfunction-return=keep -mfxsr -mglibc -mhard-float -mharden-sls=none -mieee-fp -mincoming-stack-boundary=0 -mindirect-branch=keep -minstrument-return=none -mlarge-data-threshold=65536 -mlong-double-80 -mmemcpy-strategy= -mmemset-strategy= -mmmx -mmwait -mno-16 -mno-32 -mno-3dnow -mno-3dnowa -mno-8bit-idiv -mno-96bit-long-double -mno-abm -mno-accumulate-outgoing-args -mno-adx -mno-aes -mno-align-double -mno-amx-bf16 -mno-amx-int8 -mno-amx-tile -mno-android -mno-avx2 -mno-avx5124fmaps -mno-avx5124vnniw -mno-avx512bf16 -mno-avx512bitalg -mno-avx512bw -mno-avx512cd -mno-avx512dq -mno-avx512er -mno-avx512f -mno-avx512ifma -mno-avx512pf -mno-avx512vbmi -mno-avx512vbmi2 -mno-avx512vl -mno-avx512vnni -mno-avx512vp2intersect -mno-avx512vpopcntdq -mno-avxvnni -mno-bionic -mno-bmi -mno-bmi2 -mno-call-ms2sysv-xlogues -mno-cet-switch -mno-cld -mno-cldemote -mno-clflushopt -mno-clwb -mno-clzero -mno-dispatch-scheduler -mno-dump-tune-features -mno-enqcmd -mno-f16c -mno-fentry -mno-fma -mno-fma4 -mno-force-drap -mno-force-indirect-call -mno-fsgsbase -mno-general-regs-only -mno-gfni -mno-hle -mno-hreset -mno-iamcu -mno-indirect-branch-cs-prefix -mno-indirect-branch-register -mno-inline-all-stringops -mno-inline-stringops-dynamically -mno-kl -mno-long-double-128 -mno-long-double-64 -mno-lwp -mno-lzcnt -mno-manual-endbr -mno-mitigate-rop -mno-movbe -mno-movdir64b -mno-movdiri -mno-mpx -mno-ms-bitfields -mno-musl -mno-mwaitx -mno-needed -mno-nop-mcount -mno-omit-leaf-frame-pointer -mno-pc32 -mno-pc64 -mno-pc80 -mno-pcommit -mno-pconfig -mno-pku -mno-prefetchwt1 
-mno-prfchw -mno-ptwrite -mno-rdpid -mno-rdrnd -mno-rdseed -mno-recip -mno-record-mcount -mno-record-return -mno-rtd -mno-rtm -mno-serialize -mno-sgx -mno-sha -mno-shstk -mno-skip-rax-setup -mno-soft-float -mno-sse2avx -mno-sse4a -mno-sseregparm -mno-stack-arg-probe -mno-stackrealign -mno-tbm -mno-tsxldtrk -mno-uclibc -mno-uintr -mno-vaes -mno-vect8-ret-in-mem -mno-vpclmulqdq -mno-waitpkg -mno-wbnoinvd -mno-widekl -mno-x32 -mno-xop -mno-xsavec -mno-xsaves -mpclmul -mpopcnt -mprefer-vector-width=128 -mprefer-vector-width=none -mpreferred-stack-boundary=0 -mpush-args -mrecip= -mred-zone -mregparm=6 -msahf -msse -msse2 -msse3 -msse4 -msse4.1 -msse4.2 -mssse3 -mstack-protector-guard-offset= -mstack-protector-guard-reg= -mstack-protector-guard-symbol= -mstack-protector-guard=tls -mstv -mtls-dialect=gnu -mtls-direct-seg-refs -mtune-ctrl= -mtune=sandybridge -mvzeroupper -mxsave -mxsaveopt
-march=sandybridge --param=l1-cache-line-size=64 --param=l1-cache-size=32 --param=l2-cache-size=2048
I have double-checked that I'm running the correct version:
(venv) /tmp/tmp.dxwGlKGwPD/resolve-march-native # which resolve-march-native
/tmp/tmp.dxwGlKGwPD/resolve-march-native/venv/bin/resolve-march-native
(venv) /tmp/tmp.dxwGlKGwPD/resolve-march-native # /tmp/tmp.dxwGlKGwPD/resolve-march-native/venv/bin/resolve-march-native
-march=sandybridge --param=l1-cache-line-size=64 --param=l1-cache-size=32 --param=l2-cache-size=2048
I've noticed that the second "Flags extracted" line in the debug output contains both -mavx and -mno-avx at the same time.
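One way to spot this symptom automatically is to look for options that appear both as -mX and as -mno-X in a single flag list. The helper below is a hypothetical Python sketch, and the sample list is a short excerpt of the second "Flags extracted" line above.

```python
def contradictory_flags(flags):
    """Return the -mX options that also appear as -mno-X in `flags`."""
    present = set(flags)
    conflicts = set()
    for flag in flags:
        if flag.startswith("-mno-"):
            positive = "-m" + flag[len("-mno-"):]
            if positive in present:
                conflicts.add(positive)
    return sorted(conflicts)


# Excerpt of the second "Flags extracted" line from the debug output above.
extracted = "-march=sandybridge -mavx -mcx16 -mno-avx -mno-avx2 -msse4.2".split()
print(contradictory_flags(extracted))  # ['-mavx']
```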
@anyuta1166 thanks for testing and reporting back! It needs more thought then, but I'm optimistic that it can be fixed. Are there any time constraints regarding this on your side that would be good to know about?
@anyuta1166 PS: could you run these four GCC commands on the Celeron box and attach the raw .txt files they produce? That would give me the full picture and allow me to better predict and test the exact results you'd be getting.
gcc -S -fverbose-asm -o /dev/stdout "$(mktemp --suffix=.c)" -march=native > assembly-native.txt
gcc -S -fverbose-asm -o /dev/stdout "$(mktemp --suffix=.c)" -march=sandybridge > assembly-sandybridge.txt
gcc -Q --help=target -march=native > native-target-help.txt
gcc -Q --help=target -march=sandybridge > sandybridge-target-help.txt
This is similar to what resolve-march-native would run (with #112 merged):
$ resolve-march-native --debug |& grep '^#'
# gcc -S -fverbose-asm -o /tmp/tmp_2glk89e/march_native.s /tmp/tmp_2glk89e/empty.c -march=native
# gcc -Q --help=target -march=native
# gcc -S -fverbose-asm -o /tmp/tmpl2rqyosu/march_native.s /tmp/tmpl2rqyosu/empty.c -march=sandybridge
# gcc -Q --help=target -march=sandybridge
Thanks in advance!
@anyuta1166 thank you! :+1:
@anyuta1166 by the way, I just noticed that your output of gcc -Q --help=target -march=sandybridge and mine differ; perhaps most interesting is this hard difference:
- -mvzeroupper [enabled]
+ -mvzeroupper [disabled]
The rest are omissions. Mine is from GCC 12.3.1_p20230825; which version did you use?
I guess the implication would be that resolved flags are not guaranteed to give the same results with different versions of GCC.
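The distinction between a hard difference (the same option with a different value) and an omission (an option listed by only one of the two GCCs) can be expressed as a small comparison of two parsed option maps. This Python sketch uses hypothetical helpers (parse_target_help, compare_target_help) and made-up excerpts: left mirrors the "-" line of the diff above and right the "+" line, with no claim about which GCC version is which, and -mexample-new-option is a placeholder rather than a real GCC flag.

```python
def parse_target_help(help_text):
    """Map option -> value for two-column lines of gcc -Q --help=target."""
    result = {}
    for line in help_text.splitlines():
        fields = line.split()
        if len(fields) == 2:
            result[fields[0]] = fields[1]
    return result


def compare_target_help(left, right):
    """Split differences into hard differences and omissions."""
    hard = {
        option: (left[option], right[option])
        for option in sorted(left.keys() & right.keys())
        if left[option] != right[option]
    }
    omissions = sorted(left.keys() ^ right.keys())
    return hard, omissions


# Made-up excerpts; "-mexample-new-option" is a placeholder, not a real GCC flag.
left = parse_target_help("  -mavx   [enabled]\n  -mvzeroupper   [enabled]\n")
right = parse_target_help(
    "  -mavx   [enabled]\n  -mvzeroupper   [disabled]\n  -mexample-new-option   [disabled]\n"
)
hard, omissions = compare_target_help(left, right)
print(hard)       # {'-mvzeroupper': ('[enabled]', '[disabled]')}
print(omissions)  # ['-mexample-new-option']
```

Only hard differences change which flags get resolved for the same option set; omissions merely reflect options one GCC version knows about and the other does not.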
@anyuta1166 update: I have now found "Gentoo 11.3.0 p7" in some of the files you shared, never mind.
Hi @anyuta1166,
I found one more issue that broke things for you last time, and I think I have it working now. To summarize, the two issues were:
1. The line -mavx [disabled] was ignored; it should have produced the flag -mno-avx.
2. The alias line -msse5 -mavx produced the flag -mavx, but it should have been ignored.
As a result, you should now get this output for your case…
# resolve-march-native --vertical
-march=sandybridge
-mno-avx
--param=l1-cache-line-size=64
--param=l1-cache-size=32
--param=l2-cache-size=2048
…and the code has a regression test for that very case to keep it working.
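Taken together, the two fixes can be sketched as a comparison of boolean option states between -march=native and -march=<target>, where alias lines (whose second column names another option instead of [enabled] or [disabled]) are ignored. This is a hypothetical Python illustration, not the code in #115; boolean_states and extra_flags are made-up names, and the three-line excerpts only mimic the real gcc -Q --help=target output. The usage line at the end mirrors the -mno-avx regression case.

```python
def boolean_states(help_text):
    """Map options to True/False from gcc -Q --help=target output.

    Alias lines such as "-msse5  -mavx", whose second column names another
    option rather than [enabled]/[disabled], are skipped.
    """
    states = {}
    for line in help_text.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue
        option, value = fields
        if value == "[enabled]":
            states[option] = True
        elif value == "[disabled]":
            states[option] = False
        # Any other value (alias target, -march= value, ...) is ignored here.
    return states


def extra_flags(native_help, march_help):
    """Flags needed on top of -march=<target> to match -march=native."""
    native = boolean_states(native_help)
    march = boolean_states(march_help)
    flags = []
    for option, enabled in sorted(native.items()):
        if march.get(option) == enabled:
            continue  # already implied by -march=<target>
        if enabled:
            flags.append(option)
        elif option.startswith("-m") and not option.startswith("-mno-"):
            flags.append("-mno-" + option[len("-m"):])  # [disabled] -> -mno-X
    return flags


# Three-line stand-ins for the Celeron 847 (native) and -march=sandybridge output.
native_help = "  -mavx   [disabled]\n  -msse5   -mavx\n  -msse4.2   [enabled]\n"
sandybridge_help = "  -mavx   [enabled]\n  -msse5   -mavx\n  -msse4.2   [enabled]\n"
print(extra_flags(native_help, sandybridge_help))  # ['-mno-avx']
```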
The new tests, your test data, and the two fixes are now in the new pull request #115 on the new branch fix-target-help-parser. I made you the author of the commit that adds the test data, for credit (thanks again!). Please check whether the current way of representing it is okay.
It would be great if you could test the new code to confirm that the test suite and your reality agree. This is the same recipe as before, just with a different branch name:
cd "$(mktemp -d)"
git clone --branch fix-target-help-parser https://github.com/hartwork/resolve-march-native
cd resolve-march-native/
python3 -m venv venv # or: virtualenv venv
source venv/bin/activate
pip3 install -e .
hash resolve-march-native
resolve-march-native --debug
It would be great to hear back from you on this, thanks! If all goes well, I'll release 4.0.0 with this merged shortly.
I confirm that it works fine now.
# resolve-march-native
-march=sandybridge -mno-avx --param=l1-cache-line-size=64 --param=l1-cache-size=32 --param=l2-cache-size=2048
I can also confirm that it now works fine for the Intel(R) Celeron(R) 3755U (Broadwell):
distccflags script:
# ./distccflags -march=native
CFLAGS="-march=broadwell -mabm -mno-adx -mno-avx2 -mno-avx -mno-bmi2 -mno-bmi -mno-f16c -mno-fma"
resolve-march-native (release version, -mno-avx is missing):
# resolve-march-native
-march=broadwell -mabm -mno-adx -mno-avx2 -mno-bmi -mno-bmi2 -mno-f16c -mno-fma --param=l1-cache-line-size=64 --param=l1-cache-size=32 --param=l2-cache-size=2048
resolve-march-native (fixed version, everything is fine):
# resolve-march-native
-march=broadwell -mabm -mno-adx -mno-avx -mno-avx2 -mno-bmi -mno-bmi2 -mno-f16c -mno-fma --param=l1-cache-line-size=64 --param=l1-cache-size=32 --param=l2-cache-size=2048
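The three Broadwell results above can be compared independently of flag order with Python sets. This sketch ignores the --param=... cache hints, which the distccflags output does not include; machine_flags is a hypothetical helper, and the flag strings are copied from the outputs above.

```python
def machine_flags(flags):
    """Ignore ordering and --param=... cache hints; keep -march/-m flags."""
    return {flag for flag in flags if not flag.startswith("--param=")}


distccflags = "-march=broadwell -mabm -mno-adx -mno-avx2 -mno-avx -mno-bmi2 -mno-bmi -mno-f16c -mno-fma".split()
release = "-march=broadwell -mabm -mno-adx -mno-avx2 -mno-bmi -mno-bmi2 -mno-f16c -mno-fma --param=l1-cache-line-size=64 --param=l1-cache-size=32 --param=l2-cache-size=2048".split()
fixed = "-march=broadwell -mabm -mno-adx -mno-avx -mno-avx2 -mno-bmi -mno-bmi2 -mno-f16c -mno-fma --param=l1-cache-line-size=64 --param=l1-cache-size=32 --param=l2-cache-size=2048".split()

print(sorted(machine_flags(distccflags) - machine_flags(release)))  # ['-mno-avx']
print(machine_flags(distccflags) == machine_flags(fixed))  # True
```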
@anyuta1166 thanks for testing and reporting back! I have now released 4.0.0 with a fix to:
Enjoy :smiley:
Processor: Intel(R) Celeron(R) CPU 847 @ 1.10GHz (Sandy Bridge without AVX)
OS: Gentoo Linux
GCC version: 11.3.0
The -mno-avx flag should be added, but it is missing from the output. The distccflags script correctly shows this flag.