Open erickq opened 2 years ago
cc @sdesmalen-arm @david-arm who may be able to help
Hi @erickq I tried compiling that function above using the latest armclang and the command "armclang -O3 -mcpu=neoverse-v1 -S /tmp/foo.cpp", but I didn't see any interleaving happening - I just saw a single st1w and whilelo in the loop. Can you confirm what command you used and which version of armclang?
thx @david-arm. armclang++ -O3 -march=armv8-a -msve-vector-bits=128 test.cpp
armclang version: Arm C/C++/Fortran Compiler version 22.0.1 (build number 1630) (based on LLVM 13.0.0)
Hi @erickq, so this is actually a NEON issue because you haven't specified the SVE feature in the `-march` flag. In order to use SVE you have to build with the command `clang++ -O3 -march=armv8-a+sve -msve-vector-bits=128 test.cpp`.
@david-arm Sorry, you're right, this is a NEON issue; I made a mistake. Were you able to reproduce the problem with armclang?
While working on software optimization recently, I found that armclang performed better than clang on the following program. Comparing the generated code statically, I found that armclang uses an interleave count of 2, whereas clang uses 1.
Passing the option `-mllvm -small-loop-cost=26` sets the interleave count to 2. However, the default value of `SmallLoopCost` is 20. Similarly, when `SmallLoopCost` is set to 20, armclang does not set the interleave count to 2. Note that this test case performs worse when the interleave count is 1.
My question is: is the default `SmallLoopCost` too conservative? Could it be raised to 25 or 30?
Please help me.
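
To make the threshold behaviour described above concrete, here is a rough sketch of the small-loop part of the interleave heuristic, as I understand it from LLVM's loop vectorizer: when the estimated loop cost is below `SmallLoopCost`, the interleave count is capped at the largest power of two not exceeding `SmallLoopCost / LoopCost`. The function names, the `maxIC` parameter, and the example loop cost of 12 are assumptions made for illustration; the real code also considers register pressure, reductions, and runtime checks, none of which are modelled here.

```cpp
#include <algorithm>
#include <cstdint>

// Largest power of two that is <= x (returns 0 for x == 0).
constexpr std::uint64_t bitFloor(std::uint64_t x) {
  if (x == 0) return 0;
  std::uint64_t p = 1;
  while (p <= x / 2) p *= 2;
  return p;
}

// Simplified model of the small-loop heuristic (hypothetical helper, not
// LLVM's actual function). loopCost is the vectorizer's cost estimate for
// one loop iteration; maxIC stands in for the target's interleave limit.
constexpr unsigned smallLoopInterleaveCount(unsigned loopCost,
                                            unsigned smallLoopCost,
                                            unsigned maxIC) {
  // Simplification: loops that are not "small" are treated as not interleaved.
  if (loopCost == 0 || loopCost >= smallLoopCost) return 1;
  return static_cast<unsigned>(
      std::min<std::uint64_t>(maxIC, bitFloor(smallLoopCost / loopCost)));
}

// Hypothetical loop cost of 12: the default threshold of 20 gives 1,
// while a threshold of 26 gives 2.
static_assert(smallLoopInterleaveCount(12, 20, 2) == 1, "threshold 20");
static_assert(smallLoopInterleaveCount(12, 26, 2) == 2, "threshold 26");
```

Under this model, a threshold of 20 producing 1 and a threshold of 26 producing 2 would be consistent with an estimated loop cost between 11 and 13, since integer division only reaches 2 once the threshold is at least twice the loop cost. That is an inference from the sketch, not something measured on the test case.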