weidai11 / cryptopp

free C++ class library of cryptographic schemes
https://cryptopp.com
Other
4.84k stars 1.5k forks source link

cryptest: armv7l: failed to generate expected vector load/store instructions #481

Closed anonimal closed 7 years ago

anonimal commented 7 years ago

cryptest-result_armv7l-rev4.txt

processor       : 0
model name      : ARMv7 Processor rev 4 (v7l)
BogoMIPS        : 50.52
Features        : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm 
CPU implementer : 0x41
CPU architecture: 7
CPU variant     : 0x0
CPU part        : 0xc07
CPU revision    : 4
Testing: ARM NEON code generation
g++ -DNDEBUG -g2 -O2 -fPIC -pipe -march=armv7-a -mfloat-abi=hard -mfpu=neon -c aria-simd.cpp
ERROR: failed to generate expected vector load instructions
ERROR: failed to generate expected vector store instructions

Referencing #480. Possibly related to https://github.com/monero-project/kovri/issues/699?

noloader commented 7 years ago

I believe this failure is common. When I see this one I usually ignore it. To say more, I would need to see the disassembly.

You can produce the disassembly by hand:

g++ -DNDEBUG -g2 -O2 -fPIC -pipe -march=armv7-a -mfloat-abi=hard -mfpu=neon -c aria-simd.cpp
objdump --disassemble aria-simd.o

The test is performed in cryptest.sh : 1440. What we are looking for is, in ARIA_UncheckedSetKey_Schedule_NEON, look at ARIA_GSRK_NEON. In the past ARIA_GSRK_NEON used to perform a vector load and save. There used to be over 50 of them.

Now, we perform the vector load ARIA_UncheckedSetKey_Schedule_NEON, so only 4 loads occur. ARIA_GSRK_NEON still performs the saves, so there's about 20 of them. So cryptest.sh is out-of-sync:

# ARIA::UncheckedKeySet: 8 vld1q.32
COUNT=$(echo -n "$DISASS_TEXT" | "$GREP" -i -c 'vld')
if [[ ("$COUNT" -lt "8") ]]; then
    FAILED=1
    echo "ERROR: failed to generate expected vector load instructions" ...

Its is also possible vld1q.32 is using a different mnemonic than we are searching for. I usually see that from Clang.

With that said, this looks wrong:

g++ -DNDEBUG -g2 -O2 ...

We should be at -O3. That's when GCC applies vectorization (or more heavily applies it).

anonimal commented 7 years ago

To say more, I would need to see the disassembly.

aria-simd.objdump.txt

We should be at -O3. That's when GCC applies vectorization (or more heavily applies it).

1aecb3d confirmed. Logs attached in https://github.com/weidai11/cryptopp/issues/480#issuecomment-326116216.

noloader commented 7 years ago

aria-simd.objdump.txt

Here another example that causes the failures. Its called Multiple Register Transfer. You have one at offset 0x20, 0x3e, 0x4e, 0x5e, ... A single load instruction is actually executing multiple loads and incrementing pointers behind the scenes.

   0:   b5f0        push    {r4, r5, r6, r7, lr}
   2:   f101 0420   add.w   r4, r1, #32
   6:   ed2d 8b10   vpush   {d8-d15}
   a:   f101 0330   add.w   r3, r1, #48 ; 0x30
   e:   f101 0740   add.w   r7, r1, #64 ; 0x40
  12:   f100 0610   add.w   r6, r0, #16
  16:   f100 0520   add.w   r5, r0, #32
  1a:   f100 0e60   add.w   lr, r0, #96 ; 0x60
  1e:   2a10        cmp r2, #16
  20:   f964 4a8f   vld1.32 {d20-d21}, [r4]
  ...
  3e:   f961 0a8f   vld1.32 {d16-d17}, [r1]
  ...
  4e:   f963 2a8f   vld1.32 {d18-d19}, [r3]
  ...

For whatever reason the compiler determined it was more profitable to perform the double-D loads rather than a 1-Q load. Or maybe NEON cannot load a Q register due to lack of instruction support. Or maybe they are the same loads, but the mnemonics are different depending on the version of Binutils.

As I understand things, 2D = 1Q (and also 16B = 8S = 4W = 2D =1Q). For the load at 0x20, D20-21 is just Q10. They are just different views of the same register bank. Also see [1.3.2. NEON registers]() in the ARM ARM.

I think we can safely ignore this one. I opened a question on Stack Overflow at Difference between vld1.32 {d20-d21} and vld1q q10?.

noloader commented 7 years ago

I think Commit de8478af2ab5 cleared the issue. It did two things. First, it added vld1d.32 {d1,d2} awareness. Second, it reduced the minimum count required for success.

anonimal commented 7 years ago

I think Commit de8478af2ab5 cleared the issue.

cryptest-result_armv7l-rev4.txt

Is it still safe to ignore?

noloader commented 7 years ago

That's weird. Did you pull the latest?

According to the objdump, the gadget is producing 6 vld1.32 {d1,d2} insns. We added this test:

# ARIA::UncheckedKeySet: 6 vld1.32 {d1,d2}
COUNT=$(echo -n "$DISASS_TEXT" | "$GREP" -i -c -E 'vld1.32[[:space:]]*{')
if [[ ("$COUNT" -lt "6") ]]; then
    FAILED=1
    echo "ERROR: failed to generate expected vector load instructions"
fi

The objdump also indicates 19 vst1.32 {d1,d2} insns. And we added this test:

# ARIA::UncheckedKeySet: 16 vstr1.32 {d1,d2}
COUNT=$(echo -n "$DISASS_TEXT" | "$GREP" -i -c -E 'vst1.32[[:space:]]*{')
if [[ ("$COUNT" -lt "16") ]]; then
    FAILED=1
    echo "ERROR: failed to generate expected NEON store instructions"
fi

Both should succeed.

anonimal commented 7 years ago

Did you pull the latest?

Sorry, my mistake: I did pull latest but that was on a separate machine (which this machine then needed to then pull from).

Both should succeed.

But they do, don't they? This patch:

diff --git a/cryptest.sh b/cryptest.sh
index 71bfaa5..4c1acbd 100755
--- a/cryptest.sh
+++ b/cryptest.sh
@@ -1308,6 +1308,7 @@ if [[ ("$HAVE_DISASS" -ne "0" && ("$IS_ARM32" -ne "0" || "$IS_ARM64" -ne "0")) ]
                if [[ ("$HAVE_ARMV8A") ]]; then
                        # ARIA::UncheckedKeySet: 8 vld1q.32
                        COUNT=$(echo -n "$DISASS_TEXT" | "$GREP" -i -c 'vld')
+                        echo "ARIA::UncheckedKeySet: 8 vld1q.32 - our count: " $COUNT
                        if [[ ("$COUNT" -lt "8") ]]; then
                                FAILED=1
                                echo "ERROR: failed to generate NEON load instructions" | tee -a "$TEST_RESULTS"
@@ -1315,6 +1316,7 @@ if [[ ("$HAVE_DISASS" -ne "0" && ("$IS_ARM32" -ne "0" || "$IS_ARM64" -ne "0")) ]
                else  # ARMv7
                        # ARIA::UncheckedKeySet: 6 vld1.32 {d1,d2}
                        COUNT=$(echo -n "$DISASS_TEXT" | "$GREP" -i -c -E 'vld1.32[[:space:]]*{')
+                        echo "ARIA::UncheckedKeySet: 6 vld1.32 {d1,d2} - our count: " $COUNT
                        if [[ ("$COUNT" -lt "6") ]]; then
                                FAILED=1
                                echo "ERROR: failed to generate NEON load instructions" | tee -a "$TEST_RESULTS"
@@ -1324,6 +1326,7 @@ if [[ ("$HAVE_DISASS" -ne "0" && ("$IS_ARM32" -ne "0" || "$IS_ARM64" -ne "0")) ]
                if [[ ("$HAVE_ARMV8A") ]]; then
                        # ARIA::UncheckedKeySet: 20 vstr1q.32
                        COUNT=$(echo -n "$DISASS_TEXT" | "$GREP" -i -c 'vst')
+                        echo "ARIA::UncheckedKeySet: 20 vstr1q.32 - our count: " $COUNT
                        if [[ ("$COUNT" -lt "20") ]]; then
                                FAILED=1
                                echo "ERROR: failed to generate NEON store instructions" | tee -a "$TEST_RESULTS"
@@ -1331,6 +1334,7 @@ if [[ ("$HAVE_DISASS" -ne "0" && ("$IS_ARM32" -ne "0" || "$IS_ARM64" -ne "0")) ]
                else
                        # ARIA::UncheckedKeySet: 16 vstr1.32 {d1,d2}
                        COUNT=$(echo -n "$DISASS_TEXT" | "$GREP" -i -c -E 'vst1.32[[:space:]]*{')
+                        echo "ARIA::UncheckedKeySet: 16 vstr1.32 {d1,d2} - our count: " $COUNT
                        if [[ ("$COUNT" -lt "16") ]]; then
                                FAILED=1
                                echo "ERROR: failed to generate NEON store instructions" | tee -a "$TEST_RESULTS"

with: Git branch: master (commit 5cd854b2d3cff4f8)

produces:

************************************
Testing: ARM NEON code generation

g++ -DNDEBUG -g2 -O3 -fPIC -pipe -march=armv7-a -mfloat-abi=hard -mfpu=neon -c aria-simd.cpp
ARIA::UncheckedKeySet: 8 vld1q.32 - our count:  7
ERROR: failed to generate NEON load instructions
ARIA::UncheckedKeySet: 20 vstr1q.32 - our count:  19
ERROR: failed to generate NEON store instructions

************************************
noloader commented 7 years ago

@anonimal,

Both should succeed.

But they do, don't they?

Does that mean if [[ ("$COUNT" -lt "6") ]] and if [[ ("$COUNT" -lt "16") ]] are broke? In the universe I live in, 7 is greater than 6; and 19 is greater than 16. We should never enter the block and echo "ERROR: failed to generate NEON....

anonimal commented 7 years ago

Does that mean if [[ ("$COUNT" -lt "6") ]] and if [[ ("$COUNT" -lt "16") ]] are broke?

No, not that I can see.

In the universe I live in, 7 is greater than 6; and 19 is greater than 16. We should never enter the block

And we don't. The if [[ ("$HAVE_ARMV8A") ]]; then blocks are always executed, not the else blocks which contain if [[ ("$COUNT" -lt "6") ]] and if [[ ("$COUNT" -lt "16") ]].

noloader commented 7 years ago

The if [[ ("$HAVE_ARMV8A") ]]; then blocks are always executed, not the else blocks

Ack. Try Commit 5b12be29e673.

anonimal commented 7 years ago

Commit 5b12be29e673.

************************************
Testing: ARM NEON code generation

g++ -DNDEBUG -g2 -O3 -fPIC -pthread -pipe -march=armv7-a -mfloat-abi=hard -mfpu=neon -c aria-simd.cpp
Verified NEON load, store, shfit left, shift right, xor machine instructions

************************************

👍 thanks @noloader, should we close as resolved?

noloader commented 7 years ago

thanks @noloader, should we close as resolved?

Yeah, I think we are OK.

I tested locally on my BananaPi (ARMv7-a) and CubieTruck (ARMv7-a), and my HiKey (ARMv8-a).

Previously I was just testing under my BananaPi. I don't know why my BananaPi was returning success when you witnessed failures.

anonimal commented 7 years ago

Thank you for testing 😄

BananaPi

I look forward to getting my hands on one.