Closed anonimal closed 7 years ago
I believe this failure is common. When I see this one I usually ignore it. To say more, I would need to see the disassembly.
You can produce the disassembly by hand:
g++ -DNDEBUG -g2 -O2 -fPIC -pipe -march=armv7-a -mfloat-abi=hard -mfpu=neon -c aria-simd.cpp
objdump --disassemble aria-simd.o
The test is performed in cryptest.sh
: 1440. What we are looking for is, in ARIA_UncheckedSetKey_Schedule_NEON
, look at ARIA_GSRK_NEON
. In the past ARIA_GSRK_NEON
used to perform a vector load and save. There used to be over 50 of them.
Now, we perform the vector load ARIA_UncheckedSetKey_Schedule_NEON
, so only 4 loads occur. ARIA_GSRK_NEON
still performs the saves, so there's about 20 of them. So cryptest.sh
is out-of-sync:
# ARIA::UncheckedKeySet: 8 vld1q.32
COUNT=$(echo -n "$DISASS_TEXT" | "$GREP" -i -c 'vld')
if [[ ("$COUNT" -lt "8") ]]; then
FAILED=1
echo "ERROR: failed to generate expected vector load instructions" ...
Its is also possible vld1q.32
is using a different mnemonic than we are searching for. I usually see that from Clang.
With that said, this looks wrong:
g++ -DNDEBUG -g2 -O2 ...
We should be at -O3
. That's when GCC applies vectorization (or more heavily applies it).
To say more, I would need to see the disassembly.
We should be at -O3. That's when GCC applies vectorization (or more heavily applies it).
1aecb3d confirmed. Logs attached in https://github.com/weidai11/cryptopp/issues/480#issuecomment-326116216.
aria-simd.objdump.txt
Here another example that causes the failures. Its called Multiple Register Transfer. You have one at offset 0x20, 0x3e, 0x4e, 0x5e, ... A single load instruction is actually executing multiple loads and incrementing pointers behind the scenes.
0: b5f0 push {r4, r5, r6, r7, lr}
2: f101 0420 add.w r4, r1, #32
6: ed2d 8b10 vpush {d8-d15}
a: f101 0330 add.w r3, r1, #48 ; 0x30
e: f101 0740 add.w r7, r1, #64 ; 0x40
12: f100 0610 add.w r6, r0, #16
16: f100 0520 add.w r5, r0, #32
1a: f100 0e60 add.w lr, r0, #96 ; 0x60
1e: 2a10 cmp r2, #16
20: f964 4a8f vld1.32 {d20-d21}, [r4]
...
3e: f961 0a8f vld1.32 {d16-d17}, [r1]
...
4e: f963 2a8f vld1.32 {d18-d19}, [r3]
...
For whatever reason the compiler determined it was more profitable to perform the double-D loads rather than a 1-Q load. Or maybe NEON cannot load a Q register due to lack of instruction support. Or maybe they are the same loads, but the mnemonics are different depending on the version of Binutils.
As I understand things, 2D = 1Q (and also 16B = 8S = 4W = 2D =1Q). For the load at 0x20, D20-21 is just Q10. They are just different views of the same register bank. Also see [1.3.2. NEON registers]() in the ARM ARM.
I think we can safely ignore this one. I opened a question on Stack Overflow at Difference between vld1.32 {d20-d21} and vld1q q10?.
I think Commit de8478af2ab5 cleared the issue. It did two things. First, it added vld1d.32 {d1,d2}
awareness. Second, it reduced the minimum count required for success.
I think Commit de8478af2ab5 cleared the issue.
cryptest-result_armv7l-rev4.txt
Is it still safe to ignore?
That's weird. Did you pull the latest?
According to the objdump, the gadget is producing 6 vld1.32 {d1,d2}
insns. We added this test:
# ARIA::UncheckedKeySet: 6 vld1.32 {d1,d2}
COUNT=$(echo -n "$DISASS_TEXT" | "$GREP" -i -c -E 'vld1.32[[:space:]]*{')
if [[ ("$COUNT" -lt "6") ]]; then
FAILED=1
echo "ERROR: failed to generate expected vector load instructions"
fi
The objdump also indicates 19 vst1.32 {d1,d2}
insns. And we added this test:
# ARIA::UncheckedKeySet: 16 vstr1.32 {d1,d2}
COUNT=$(echo -n "$DISASS_TEXT" | "$GREP" -i -c -E 'vst1.32[[:space:]]*{')
if [[ ("$COUNT" -lt "16") ]]; then
FAILED=1
echo "ERROR: failed to generate expected NEON store instructions"
fi
Both should succeed.
Did you pull the latest?
Sorry, my mistake: I did pull latest but that was on a separate machine (which this machine then needed to then pull from).
Both should succeed.
But they do, don't they? This patch:
diff --git a/cryptest.sh b/cryptest.sh
index 71bfaa5..4c1acbd 100755
--- a/cryptest.sh
+++ b/cryptest.sh
@@ -1308,6 +1308,7 @@ if [[ ("$HAVE_DISASS" -ne "0" && ("$IS_ARM32" -ne "0" || "$IS_ARM64" -ne "0")) ]
if [[ ("$HAVE_ARMV8A") ]]; then
# ARIA::UncheckedKeySet: 8 vld1q.32
COUNT=$(echo -n "$DISASS_TEXT" | "$GREP" -i -c 'vld')
+ echo "ARIA::UncheckedKeySet: 8 vld1q.32 - our count: " $COUNT
if [[ ("$COUNT" -lt "8") ]]; then
FAILED=1
echo "ERROR: failed to generate NEON load instructions" | tee -a "$TEST_RESULTS"
@@ -1315,6 +1316,7 @@ if [[ ("$HAVE_DISASS" -ne "0" && ("$IS_ARM32" -ne "0" || "$IS_ARM64" -ne "0")) ]
else # ARMv7
# ARIA::UncheckedKeySet: 6 vld1.32 {d1,d2}
COUNT=$(echo -n "$DISASS_TEXT" | "$GREP" -i -c -E 'vld1.32[[:space:]]*{')
+ echo "ARIA::UncheckedKeySet: 6 vld1.32 {d1,d2} - our count: " $COUNT
if [[ ("$COUNT" -lt "6") ]]; then
FAILED=1
echo "ERROR: failed to generate NEON load instructions" | tee -a "$TEST_RESULTS"
@@ -1324,6 +1326,7 @@ if [[ ("$HAVE_DISASS" -ne "0" && ("$IS_ARM32" -ne "0" || "$IS_ARM64" -ne "0")) ]
if [[ ("$HAVE_ARMV8A") ]]; then
# ARIA::UncheckedKeySet: 20 vstr1q.32
COUNT=$(echo -n "$DISASS_TEXT" | "$GREP" -i -c 'vst')
+ echo "ARIA::UncheckedKeySet: 20 vstr1q.32 - our count: " $COUNT
if [[ ("$COUNT" -lt "20") ]]; then
FAILED=1
echo "ERROR: failed to generate NEON store instructions" | tee -a "$TEST_RESULTS"
@@ -1331,6 +1334,7 @@ if [[ ("$HAVE_DISASS" -ne "0" && ("$IS_ARM32" -ne "0" || "$IS_ARM64" -ne "0")) ]
else
# ARIA::UncheckedKeySet: 16 vstr1.32 {d1,d2}
COUNT=$(echo -n "$DISASS_TEXT" | "$GREP" -i -c -E 'vst1.32[[:space:]]*{')
+ echo "ARIA::UncheckedKeySet: 16 vstr1.32 {d1,d2} - our count: " $COUNT
if [[ ("$COUNT" -lt "16") ]]; then
FAILED=1
echo "ERROR: failed to generate NEON store instructions" | tee -a "$TEST_RESULTS"
with: Git branch: master (commit 5cd854b2d3cff4f8)
produces:
************************************
Testing: ARM NEON code generation
g++ -DNDEBUG -g2 -O3 -fPIC -pipe -march=armv7-a -mfloat-abi=hard -mfpu=neon -c aria-simd.cpp
ARIA::UncheckedKeySet: 8 vld1q.32 - our count: 7
ERROR: failed to generate NEON load instructions
ARIA::UncheckedKeySet: 20 vstr1q.32 - our count: 19
ERROR: failed to generate NEON store instructions
************************************
@anonimal,
Both should succeed.
But they do, don't they?
Does that mean if [[ ("$COUNT" -lt "6") ]]
and if [[ ("$COUNT" -lt "16") ]]
are broke? In the universe I live in, 7 is greater than 6; and 19 is greater than 16. We should never enter the block and echo "ERROR: failed to generate NEON...
.
Does that mean if [[ ("$COUNT" -lt "6") ]] and if [[ ("$COUNT" -lt "16") ]] are broke?
No, not that I can see.
In the universe I live in, 7 is greater than 6; and 19 is greater than 16. We should never enter the block
And we don't. The if [[ ("$HAVE_ARMV8A") ]]; then
blocks are always executed, not the else blocks which contain if [[ ("$COUNT" -lt "6") ]]
and if [[ ("$COUNT" -lt "16") ]]
.
The if [[ ("$HAVE_ARMV8A") ]]; then blocks are always executed, not the else blocks
Ack. Try Commit 5b12be29e673.
Commit 5b12be29e673.
************************************
Testing: ARM NEON code generation
g++ -DNDEBUG -g2 -O3 -fPIC -pthread -pipe -march=armv7-a -mfloat-abi=hard -mfpu=neon -c aria-simd.cpp
Verified NEON load, store, shfit left, shift right, xor machine instructions
************************************
👍 thanks @noloader, should we close as resolved?
thanks @noloader, should we close as resolved?
Yeah, I think we are OK.
I tested locally on my BananaPi (ARMv7-a) and CubieTruck (ARMv7-a), and my HiKey (ARMv8-a).
Previously I was just testing under my BananaPi. I don't know why my BananaPi was returning success when you witnessed failures.
Thank you for testing 😄
BananaPi
I look forward to getting my hands on one.
cryptest-result_armv7l-rev4.txt
Referencing #480. Possibly related to https://github.com/monero-project/kovri/issues/699?