weidai11 / cryptopp

free C++ class library of cryptographic schemes
https://cryptopp.com
Other
4.92k stars 1.51k forks source link

The ARM NEON Speck optimization is wrong #808

Closed asbai closed 5 years ago

asbai commented 5 years ago

Crypto++ 8.0 Issue Report

GCC 4.7 (gcc-linaro-arm-linux-gnueabihf-4.7) with ARMv8 and NEON enabled:

-march=armv7-a
-mfpu=neon-vfpv4

We encrypt the plaintext with Speck128-128 and dcrypt it immediately on a Freescale i.MX6 UltraLite processor, and the result is wrong.

If we disabled the NEON optimization by #undef CRYPTOPP_ARM_NEON_AVAILABLE at the begin of speck.cpp, then everything is ok.

noloader commented 5 years ago

What is the result of running ./cryptest.exe tv speck?

asbai commented 5 years ago

emmmm.... It is not convenient for us to run this test on the ARM box. But I think it's very clear that the NEON implementation of Speck128 has bug. Because both encryption and decryption are using the NEON one on a same processor, and the result after decryption is inconsistent with the plaintext before encryption.

Then consider that everything is fine after NEON has disabled, So this is obviously a bug in NEON implementation, right?

asbai commented 5 years ago

run ./cryptest.exe tv speck on a x86 machine does not make sense because the x86 ASM optimization just work fine.

asbai commented 5 years ago

Sorry, my typo, it's an ARMv7 processor, not v8.

In addition, I feel that the problem in NEON is in the decryption section. If I remember correctly, data encrypted with NEON can be correctly decrypted by x86 code.

noloader commented 5 years ago

Here's what I'm seeing on my Wandboard, which has a Freescale with dual cores. The test vectors are at speck.txt.

cryptopp$ ./cryptest.exe tv speck
Using seed: 1550185193

Testing SymmetricCipher algorithm SPECK-64/ECB.
................
Testing SymmetricCipher algorithm SPECK-64/CBC.
..............
Testing SymmetricCipher algorithm SPECK-64/CTR.
..............
Testing SymmetricCipher algorithm SPECK-128/ECB.
........................
Testing SymmetricCipher algorithm SPECK-128/CBC.
.....................
Testing SymmetricCipher algorithm SPECK-128/CTR.
.....................
Tests complete. Total tests = 110. Failed tests = 0.

However, I'm using GCC 4.9:

$ gcc --version
gcc (Debian 4.9.2-10+deb8u2) 4.9.2

And:

cryptopp$ cat /proc/cpuinfo
processor       : 0
model name      : ARMv7 Processor rev 10 (v7l)
BogoMIPS        : 7.54
Features        : half thumb fastmult vfp edsp thumbee neon vfpv3 tls vfpd32
CPU implementer : 0x41
CPU architecture: 7
CPU variant     : 0x2
CPU part        : 0xc09
CPU revision    : 10

processor       : 1
model name      : ARMv7 Processor rev 10 (v7l)
BogoMIPS        : 7.54
Features        : half thumb fastmult vfp edsp thumbee neon vfpv3 tls vfpd32
CPU implementer : 0x41
CPU architecture: 7
CPU variant     : 0x2
CPU part        : 0xc09
CPU revision    : 10

Hardware        : Freescale i.MX6 Quad/DualLite (Device Tree)
Revision        : 0000
Serial          : 0000000000000000
asbai commented 5 years ago

OK, here is my simple test:

Taget Platform   : MSW/x86/VC2005
Key              : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
IV               : FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
PlainText        : test123~
CipherText       : 6E 6D 73 FB 82 CF BC 5D
Decryption Result: test123~
Taget Platform   : MSW/x86/ICL2011 (SSE)
Key              : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
IV               : FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
PlainText        : test123~
CipherText       : 6E 6D 73 FB 82 CF BC 5D
Decryption Result: test123~
Taget Platform   : Linux/ARMv7a/GCC4.7 (CPP)
Key              : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
IV               : FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
PlainText        : test123~
CipherText       : 6E 6D 73 FB 82 CF BC 5D
Decryption Result: test123~
Taget Platform   : Linux/ARMv7a/GCC4.7 (NEON)
Key              : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
IV               : FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
PlainText        : test123~
CipherText       : 1E 0A 72 33 00 1C A2 FE
Decryption Result: ??@Z?Ar_

So I think it's clear the encryption part of the NEON implementation has some problem....

UPDATE: We are using CTR mode.

asbai commented 5 years ago

I'm using gcc 4.7 with the best optimization "-O3", @noloader which gcc version and flags are you using? may be it's a gcc bug? did you enabled the neon instructions?

asbai commented 5 years ago

And here is my cpuinfo:

root@iot:~# cat /proc/cpuinfo
processor       : 0
model name      : ARMv7 Processor rev 5 (v7l)
BogoMIPS        : 10.54
Features        : swp half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae
CPU implementer : 0x41
CPU architecture: 7
CPU variant     : 0x0
CPU part        : 0xc07
CPU revision    : 5

Hardware        : Freescale i.MX6 UltraLite (Device Tree)
Revision        : 0000
Serial          : 0000000000000000
asbai commented 5 years ago

@noloader can you do the same simple test on your arm board? You can download the toolchain gcc-linaro-arm-linux-gnueabihf-4.7-2013.04-20130415 at here: https://releases.linaro.org/archive/13.04/components/toolchain/binaries/

Thanks :-)

noloader commented 5 years ago

@noloader can you do the same simple test on your arm board?

Well, I'm tied up on another project at the moment.

Send me your authorized_keys file. It should include RSA and ECDSA keys. My email is noloader, gmail. I'll give you remote access to the board. You can install what you like as an admin.

I can also provide access to Solaris 11.3 x86 with Sun Studio 12.2-12.6, if you are interested in supporting the platform.

noloader commented 5 years ago

I'm using gcc 4.7 with the best optimization "-O3"

You might try backing off to -O2. Open the makefile and find the recipe for speck.o and speck_simd.o. Then add something like this:

# IBM XLC -O3 optimization bug
ifeq ($(XLC_COMPILER),1)
sm3.o : sm3.cpp
    $(CXX) $(strip $(subst -O3,-O2,$(CXXFLAGS)) -c) $<
donna_32.o : donna_32.cpp
    $(CXX) $(strip $(subst -O3,-O2,$(CXXFLAGS)) -c) $<
donna_64.o : donna_64.cpp
    $(CXX) $(strip $(subst -O3,-O2,$(CXXFLAGS)) -c) $<
endif

We have to do it in several places due to compiler bugs.

asbai commented 5 years ago

Send me your authorized_keys file.

OK, I've sent the rsa pub key to you email box.

You might try backing off to -O2

Sure, I'll give it a try, Thanks :-)

asbai commented 5 years ago

I'm trying to use -O2 and -O to compile the speck.cpp and speck128_simd.cpp, but nothing changed.

However, I also observed a strange thing: after NEON is enabled, the content of the decryption result seems to be random.

For example:

root@iot:~# ./test_speck_arm_neon
Taget Platform   : Linux/ARMv7a/GCC4.7 (NEON)
Key              : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
IV               : FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
PlainText        : test123~
CipherText       : 1E 0A 72 33 00 1C A2 FE
Decryption Result: ▒mo▒͝▒▒
root@iot:~# ./test_speck_arm_neon
Taget Platform   : Linux/ARMv7a/GCC4.7 (NEON)
Key              : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
IV               : FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
PlainText        : test123~
CipherText       : 1E 0A 72 33 00 1C A2 FE
Decryption Result: ▒▒$▒h[▒
root@iot:~# ./test_speck_arm_neon
Taget Platform   : Linux/ARMv7a/GCC4.7 (NEON)
Key              : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
IV               : FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
PlainText        : test123~
CipherText       : 1E 0A 72 33 00 1C A2 FE
Decryption Result: ▒/▒5ZMH8
root@iot:~# ./test_speck_arm_neon
Taget Platform   : Linux/ARMv7a/GCC4.7 (NEON)
Key              : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
IV               : FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
PlainText        : test123~
CipherText       : 1E 0A 72 33 00 1C A2 FE
Decryption Result: ▒▒▒▒1}▒

The WRONG cipher text is not changed, but the decryption result changes every time.

noloader commented 5 years ago

If we disabled the NEON optimization by #undef CRYPTOPP_ARM_NEON_AVAILABLE at the begin of speck.cpp, then everything is ok.

There's also SIMON and SPECK specific defines to disable extensions in simon.h and speck.h. They were added because of GCC and Clang bugs. We needed a way to disable extensions on some algorithms, like SIMON and SPECK, while leaving it in tact for others, like AES and SHA.

Here are a couple of samples:

I don't think the fixes were backported beyond about GCC 6.0. And Clang does not backport.

And don't get me started on Sun Studio. That's a collection of scripts that pretends to be a compiler and produces heaping loads of shit.

asbai commented 5 years ago

@noloader I've been tested it on your board, and the result is same:

$ ./test_speck_arm_neon
Taget Platform   : Linux/ARMv7a/GCC4.7 (NEON)
Key              : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
IV               : FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
PlainText        : test123~
CipherText       : 1E 0A 72 33 00 1C A2 FE
Decryption Result: ▒▒.▒▒
$ ./test_speck_arm_neon
Taget Platform   : Linux/ARMv7a/GCC4.7 (NEON)
Key              : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
IV               : FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
PlainText        : test123~
CipherText       : 1E 0A 72 33 00 1C A2 FE
Decryption Result: ▒s^;▒q
$ ./test_speck_arm_neon
Taget Platform   : Linux/ARMv7a/GCC4.7 (NEON)
Key              : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
IV               : FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
PlainText        : test123~
CipherText       : 1E 0A 72 33 00 1C A2 FE
Decryption Result: A$▒&▒]▒`

So it's seems more likely a gcc bug now?

noloader commented 5 years ago

So it's seems more likely a gcc bug now?

Maybe. Could be your code also. I have not seen your code so I can't say.

Maybe you can install GCC 5 or GCC 6 and make a pass with Undefined Behavior sanitizer.

asbai commented 5 years ago

Could be your code also.

I don't think so, because:

  1. It is a very simple test, just with few lines of code.
  2. The exactly same code running well under x86 and the pure c++ version also works well under linux/arm.

Maybe you can install GCC 5 or GCC 6 and make a pass with Undefined Behavior sanitizer.

Upgrade toolchain is painful :-( But we will give it a try anyway~

noloader commented 5 years ago
  • It is a very simple test, just with few lines of code.

  • The exactly same code running well under x86 and the pure c++ version also works well under linux/arm.

Try with GCC 4.9. Its on the dev-board.

asbai commented 5 years ago

Try with GCC 4.9. Its on the dev-board.

It’s time to go to bed, will try it later :-)

noloader commented 5 years ago

This works for me on the Wandboard. It is the first SPECK128/CTR test vector. It was compiled with GCC 4.9.2. A copy of the program is in your home directory.

cryptopp$ cat test.cxx
#include "cryptlib.h"
#include "speck.h"
#include "files.h"
#include "modes.h"
#include "hex.h"

#include <string>
#include <iostream>

std::string key =
  "EB78929D56C5D0F4 9F75C1C8694AFDA4 96F808E5524CC592 FD980EEC363E5382";
std::string iv =
  "AF322427139F48EC D9C779FB768B28C0";
std::string plain =
  "5DCC0F8301F37774 1D8517CD6DA27D1E 8170109C361FC68D 8E5022CE3DA1C8FB"
  "DD37F50A7DCB5BB1 A059D7CF100A94EA B59ECE9260072B64 B746FD9ED93F49F0"
  "C654CD5DFD321E4D 8959CBD1095FE2B5 D4ACC78474AC2232 EC34ECA0737F8886"
  "E156866C279C679C BAF544E7F69E3E8C 710DE732C1F8FEED F20773C6A674D706"
  "BF7EF2B5AD9C3D7B 00A734E0919F181B 59FBC8CFCEA4BEE9 525894814CE42122"
  "E3AD0F07C0AD56E8 99D780C5D1759312 351057C15F550C96 5379BADFC728E8F3";
std::string cipher =
  "46DDAF62B84BDB9B 97E2C114417AA5BD 126132D922FA7B2C 8B25A0D3147C9C1B"
  "203F06C546A7DAE2 0E98BC95AC6CB124 C0E7C51566B5D106 1EE61E25405E51CD"
  "6EB848C673E21BC1 0F9B9DCC36DDC162 81F1369D9DFAABDB 43C50C7CE10BF980"
  "5596D8C3CA530810 C1BE9EBDCC0950BD A5A0CEB196B8E81C D167809481E61E46"
  "B7748C8B88D71693 52C41761C0536286 A61619BBB841E19E A7867DE51DAAD870"
  "80C317A4B269AE8D 5BD6087E61BF9EE7 65DC69619E87FBC1 AB1E26EA9CF9EA42";

int main(int argc, char* argv[])
{
  using namespace CryptoPP;

  std::string k,v,p,c,r,x1,x2;
  StringSource(key, true, new HexDecoder(new StringSink(k)));
  StringSource(iv, true, new HexDecoder(new StringSink(v)));
  StringSource(plain, true, new HexDecoder(new StringSink(p)));
  StringSource(cipher, true, new HexDecoder(new StringSink(c)));

  CTR_Mode<SPECK128>::Encryption enc;
  enc.SetKeyWithIV((const byte*)&k[0], k.size(),(const byte*)&v[0], v.size());
  x1 = p;
  enc.ProcessString((byte*)&x1[0], x1.size());

  CTR_Mode<SPECK128>::Decryption dec;
  dec.SetKeyWithIV((const byte*)&k[0], k.size(),(const byte*)&v[0], v.size());
  x2 = x1;
  dec.ProcessString((byte*)&x2[0], x2.size());

  if (x1 == c)
    std::cout << "Encryption OK" << std::endl;
  else
    std::cout << "Encryption FAIL" << std::endl;

  if (x2 == p)
    std::cout << "Decryption OK" << std::endl;
  else
    std::cout << "Decryption FAIL" << std::endl;

  return 0;
}

Running it results in:

cryptopp$ g++ test.cxx ./libcryptopp.a -o test.exe
cryptopp$ ./test.exe
Encryption OK
Decryption OK
noloader commented 5 years ago

Closing this issue out. Bai should have what he needs to determine the problem.

asbai commented 5 years ago

Hi @noloader, I've been upgraded our toolchain to "gcc-linaro-arm-linux-gnueabihf-4.9-2014.09", And the NEON code working well now.

The only problem is that our Linux environment does not support 4.9 well: /usr/lib/libstdc++.so.6: version CXXABI_1.3.8' not found (required by ./test_speck_arm_neon) so we must link the libstdc++ statically.

Do you think we need to add some guard macro to #undef CRYPTOPP_ARM_NEON_AVAILABLE when the gcc version is lower than 4.7, like:

#if defined(__GNUC__) && CRYPTOPP_GCC_VERSION <= 40700
#   undef CRYPTOPP_ARM_NEON_AVAILABLE
#endif
noloader commented 5 years ago

I've been upgraded our toolchain to "gcc-linaro-arm-linux-gnueabihf-4.9-2014.09", And the NEON code working well now.

Yeah, that sucks. Sorry to hear it. It has bitten us more than once.

If it is any consolation, the first one is the toughest. Once you get past the first one you learn to spot the problem quickly and isolate the problem to the compiler.

Do you think we need to add some guard macro to #undef CRYPTOPP_ARM_NEON_AVAILABLE when the gcc version is lower than 4.7, like

You can if you like.

Or you can disable it just for SPECK by defining CRYPTOPP_BUGGY_SIMD_LOAD_AND_STORE. CRYPTOPP_BUGGY_SIMD_LOAD_AND_STORE is new (either 8.0 or 8.1; see config.h), and it should be more surgical or precise than CRYPTOPP_ARM_NEON_AVAILABLE.

As an aside, compilers are so different that it is really tough to decide when CRYPTOPP_ARM_NEON_AVAILABLE. ARM is not like x86, so we can't add -mfpu=neon based on a version number at will. We are working with Debian due to a compile failure on armel. Debian's armel compiler does not supply any NEON at the moment (hard or soft floats).

asbai commented 5 years ago

OK, did you mean we should add following macros to the head of speck.cpp, speck64_simd.cpp and speck128_simd.cpp:

#if defined(__GNUC__) && CRYPTOPP_GCC_VERSION <= 40700
#   define CRYPTOPP_BUGGY_SIMD_LOAD_AND_STORE
#endif

Do you need a pull request for this?

noloader commented 5 years ago

OK, did you mean we should add following macros to the head of speck.cpp, speck64_simd.cpp and speck128_simd.cpp:

No, you can define CRYPTOPP_BUGGY_SIMD_LOAD_AND_STORE or define CRYPTOPP_DISABLE_SPECK_SIMD in config.h.

Also see speck.h. speck.h uses CRYPTOPP_DISABLE_SPECK_SIMD to disable the SIMD code.

You might also be interested in Issue 782 and Commit e86a6b32eb56.

noloader commented 5 years ago

@asbai, Regarding:

The only problem is that our Linux environment does not support 4.9 well: /usr/lib/libstdc++.so.6: version CXXABI_1.3.8' not found (required by ./test_speck_arm_neon) so we must link the libstdc++ statically.

See version 'CXXABI_1.3.8' not found (required by …) on Stack Overflow. It sounds like you need to distribute both test_speck_arm_neon and libstdc++.so.6. Then, use LD_LIBRARY_PATH to use the distributed libstdc++.so.6. I've never been a fan of versioning like that.

asbai commented 5 years ago

No, you can define CRYPTOPP_BUGGY_SIMD_LOAD_AND_STORE or define CRYPTOPP_DISABLE_SPECK_SIMD in config.h.

OK, Thanks. But I'm not found CRYPTOPP_DISABLE_SPECK_SIMD in speck.h. They don't exist in the official zip version of 8.0, right?

See version 'CXXABI_1.3.8' not found (required by …) on Stack Overflow. It sounds like you need to distribute both test_speck_arm_neon and libstdc++.so.6. Then, use LD_LIBRARY_PATH to use the distributed libstdc++.so.6. I've never been a fan of versioning like that.

Yeah, but it’s more convenient to statically link them (-static-libgcc -static-libstdc++) to the target image. :-)

noloader commented 5 years ago

I'm trying to use -O2 and -O to compile the speck.cpp and speck128_simd.cpp, but nothing changed.

For got to mention... -O is -O2. You might try -O1 or -O0. We sometimes use -O0 on a source file until we find a cleaner work-around. Most recently was rijndael.cpp and rijndael_simd.cpp under new Microsoft compilers. Also see Issue 649 and Commit f57df06c5e6d.

Did you retest ChaCha since switching compilers?

asbai commented 5 years ago

OK, Thanks :-)

noloader commented 5 years ago

@asbai,

I get that you are building your program using GCC 4.6 or 4.7. However, you should still test with other compilers and tools. For example, you should have a test machine setup with GCC 5.0 or above so you can test you program with UBsan and Asan. And you should have a machine somewhere with GCC 6.0 to test with CET. And of course you should be testing under Valgrind and Visual Studio with /analyze. No one tool picks up all issues, so you have to throw a bunch of tools at it to keep code quality high.

We eat our own dogfood. We run our test script from hell on a regular basis. On a modern desktop it takes 3 or 4 hours to run. On an IoT device like a dev-board it takes 3 or 4 days to run. Also see Release Process | Analysis Tools on the Crypto++ wiki.