weidai11 / cryptopp

free C++ class library of cryptographic schemes
https://cryptopp.com
Other
4.66k stars 1.47k forks source link

HC-256 Inconsistent with Other Libraries #1280

Open Demonslay335 opened 1 week ago

Demonslay335 commented 1 week ago

While developing support for an external app that uses the HC-256 cipher, I noticed inconsistencies with the CryptoPP::HC256 algorithm's output. Further digging showed that when either the key or IV contain non-repeating bytes (e.g. not just all 00 or 36...), the output does not match that of other libraries, or the original reference library.

I have tested against BouncyCastle and the original hc256.c ECRYPT source (which I had to use instead of CryptoPP for this project).

Compiled CryptoPP 8.9.0 using Visual Studio (x64) on Windows 10.

I have found the problem seems to be with the endian flip happening with the key and IV at these two locations. Such an endian flip is not present in BouncyCastle or hc256.c.

https://github.com/weidai11/cryptopp/blob/60f81a77e0c9a0e7ffc1ca1bc438ddfa2e43b78e/hc256.cpp#L107-L111

https://github.com/weidai11/cryptopp/blob/60f81a77e0c9a0e7ffc1ca1bc438ddfa2e43b78e/hc256.cpp#L135-L139

If I replace the loop bodies with these respective changes, the unit output matches that of BouncyCastle and hc256.c.

m_key[i >> 2] |= ((word32)userKey[i] << (8 * (i & 0x3)));

...

m_iv[i >> 2] |= ((word32)iv[i] << (8 * (i & 0x3)));

Here is a test program that uses a BouncyCastle test vector to show the mismatch. The code throws an assertion with CryptoPP 8.9.0, and passes with the above changes.

#include <cryptopp/hc256.h>

int main()
{
  auto key = std::vector<uint8_t>{
      0x00, 0x53, 0xA6, 0xF9, 0x4C, 0x9F, 0xF2, 0x45, 0x98, 0xEB, 0x3E, 0x91, 0xE4, 0x37, 0x8A, 0xDD,
      0x30, 0x83, 0xD6, 0x29, 0x7C, 0xCF, 0x22, 0x75, 0xC8, 0x1B, 0x6E, 0xC1, 0x14, 0x67, 0xBA, 0x0D
  };
  auto iv = std::vector<uint8_t>{
      0x0D, 0x74, 0xDB, 0x42, 0xA9, 0x10, 0x77, 0xDE, 0x45, 0xAC, 0x13, 0x7A, 0xE1, 0x48, 0xAF, 0x16,
      0x7D, 0xE4, 0x4B, 0xB2, 0x19, 0x80, 0xE7, 0x4E, 0xB5, 0x1C, 0x83, 0xEA, 0x51, 0xB8, 0x1F, 0x86
  };
  auto buffer = std::vector<uint8_t>(0x40, 0x00);

  // BouncyCastle Set 6, vector# 0
  // https://github.com/bcgit/bc-java/blob/0520ebcd3291d6d3176ea25c571f1c7e0a963847/core/src/test/java/org/bouncycastle/crypto/test/HCFamilyTest.java#L104-L112
  // https://github.com/bcgit/bc-csharp/blob/f5432325fbeadade37ec2542a44b3702395c7650/crypto/test/data/hc256/hc256/ecrypt_HC-256_256K_256IV.txt#L3151-L3159
  auto expected = std::vector<uint8_t>{
      0x23, 0xD9, 0xE7, 0x0A, 0x45, 0xEB, 0x01, 0x27, 0x88, 0x4D, 0x66, 0xD9, 0xF6, 0xF2, 0x3C, 0x01,
      0xD1, 0xF8, 0x8A, 0xFD, 0x62, 0x92, 0x70, 0x12, 0x72, 0x47, 0x25, 0x6C, 0x1F, 0xFF, 0x91, 0xE9,
      0x1A, 0x79, 0x7B, 0xD9, 0x8A, 0xDD, 0x23, 0xAE, 0x15, 0xBE, 0xE6, 0xEE, 0xA3, 0xCE, 0xFD, 0xBF,
      0xA3, 0xED, 0x6D, 0x22, 0xD9, 0xC4, 0xF4, 0x59, 0xDB, 0x10, 0xC4, 0x0C, 0xDF, 0x4F, 0x4D, 0xFF
  };

  CryptoPP::HC256::Encryption hc256;
  hc256.SetKeyWithIV(key.data(), key.size(), iv.data(), iv.size());

  hc256.ProcessData(buffer.data(), buffer.data(), buffer.size());
  CRYPTOPP_ASSERT(CRYPTOPP_VERSION >= 890);
  CRYPTOPP_ASSERT(memcmp(buffer.data(), expected.data(), buffer.size()) == 0);
}

Furthermore, the sample on the wiki is not decryptable with BouncyCastle.

Key: CBCBA550FE1A5C34E0C4D6A14D056A08032EA11CF700BD7B9D409B68DC47B9B7
IV: 2E58581CFD129ACE30BD1F86739C713A867550A3DD6D9E0886F63C01B6F9531F
Plain: HC-256 stream cipher test

CryptoPP Cipher: 957BD81955B68A34A4C151D73F89E837945C364CE5B2942CC2
BouncyCastle Cipher: 3531AD8ADC1DEFED1C9D3B705D8B9E87063D1D05E40C19ADB2

The CryptoPP::HC128 algorithm appears fine; it also does not do endian flipping of the key and IV, instead explicitly using GetUserKey(LITTLE_ENDIAN_ORDER ...).

noloader commented 1 week ago

Thanks @Demonslay335.

A few thoughts...

We test Crypto++ on both little-endian and big-endian systems. HC-128 and HC-256 are both passing on them. If we had an endian issue, I would expect it to surface.

I try to use reference implementations to generate test vectors. Here is the one I used for HC-256: https://github.com/noloader/cryptopp-test/tree/master/HC-256. (It is my GitHub, where I store reference implementations for posterity). hc-256.c was modified to output the bytes for the key, iv, plaintext and ciphertext. Otherwise, no other changes were made. ECRYPT_keysetup, ECRYPT_ivsetup and ECRYPT_process_bytes are called. Nothing else is called (other than print functions).

The reference implementation includes a file called testvectors.txt. The first one is all 0's, and it is kind of boring. The second and third ones are the interesting ones.

1.  If every byte of the key and IV are with value 0, 
    then the first 32 bytes of the keystream are given as:

    5b    07    89    85    d8    f6    f3    0d    
    42    c5    c0    2f    a6    b6    79    51    
    53    f0    65    34    80    1f    89    f2    
    4e    74    24    8b    72    0b    48    18

2. If every byte of the key and IV are with value 0, 
   except that IV[0] = 1, then the first 32 bytes of the 
   keystream are given as:

   af    e2    a2    bf    4f    17    ce    e9    
   fe    c2    05    8b    d1    b1    8b    b1    
   5f    c0    42    ee    71    2b    31    01    
   dd    50    1f    c6    0b    08    2a    50

3. If every byte of the key and IV are with value 0, 
   except that key[0] = 0x55, then the first 32 bytes of the 
   keystream are given as:

   1c    40    4a    fe    4f    e2    5f    ed    
   95    8f    9a    d1    ae    36    c0    6f    
   88    a6    5a    3c    c0    ab    e2    23    
   ae    b3    90    2f    42    0e    d3    a8

In Crypto++, the tests are at located at:

I only see the first one in BouncyCastle. I do not see the second or third in the BouncyCastle tests. Can you run the test vectors using BouncyCastle?


I know David, and he and I have met in real life. We try to stay consistent on implementations (ECIES was a hairy one). So it will be interesting to see where we differ.

Demonslay335 commented 1 week ago

Ok, there is something very odd going on here then... let me explain.

In short, it appears the HC-256 author's PDF description does not match his own C source code listed on the same page.

(Phases 1-3 all contain the same PDF and C code zip, so it's not some draft/version mismatch).

In the PDF, under the section "B The optimized C implementation of HC-256 (“hc256.h”)" on Page 20, there is the following code:

void initialization(uint32 key[], uint32 iv[])
{
uint32 i,j;
//expand the key and iv into P and Q
for (i = 0; i < 8; i++) P[i] = key[i];
for (i = 8; i < 16; i++) P[i] = iv[i-8];
...

Notice how there are no rotations.

Also, my mistake, I did not use the direct ECRYPT version, but actually used this library that essentially does the same as the PDF version: https://github.com/peterferrie/hc256

So, it may come down to each library having derived off of either the C code in the PDF specification, or the ECRYPT ecrypt-sync.h implementation... who is right?

I have also looked at a common Rust implementation and a few Python implementations, and they all preserve little-endian for the key and IV.

Honestly so far, CryptoPP has been the only library that differs from what I can tell. But also, not many mainstream crypto libraries even support the algorithm to begin with...


As extra context to my original post, the "external app" is actually a ransomware using the HC-256 algorithm. I was developing a decryptor in our C++ application, and noticed the output from CryptoPP::HC256 did not match that of the malware's implementation. Switching to the library I mentioned above worked.

I originally did not mention it being malware, as I did not want you to discredit it as a potentially "customized" version of the algorithm or something (that does happen, rarely). The malware's implementation completely matches output from BouncyCastle, which is how I originally identified and tested the algorithm during my reverse engineering.


Concerning the unit tests, basically every test labeled from your "hc-256.c reference implementation" does not match BouncyCastle. I spot tested a couple dozen.

To easily test yourself (without compiling anything), you can try my CryptoTester program. Internally, it uses BouncyCastle C#'s HC256Engine, which I have verified is functionally the same code as the Java counterpart; which leaves the little-endian. This block is what I naively replaced in the CryptoPP loops mentioned above to make it work.

https://github.com/bcgit/bc-java/blob/0520ebcd3291d6d3176ea25c571f1c7e0a963847/core/src/main/java/org/bouncycastle/crypto/engines/HC256Engine.java#L105-L113 https://github.com/bcgit/bc-csharp/blob/f5432325fbeadade37ec2542a44b3702395c7650/crypto/src/crypto/engines/HC256Engine.cs#L100-L101

You can also conversely try basically any test vector from BouncyCastle's Set 6 in CryptoPP, just like my original bug PoC. (The C# project's vectors are much easier to read).

I should also mention that CryptoPP and BouncyCastle do match not only when the key/IV are repeating bytes, but also if only the first byte is modified; aka the 3 original reference tests do work everywhere. It is very frustrating that they are so lacking, allowing this ambiguity to exist in the first place... literally any other algorithm I've worked with has some standardized test vectors of non zero keys for this very reason...

noloader commented 1 week ago

Thanks for the work @Demonslay335,

it appears the HC-256 author's PDF description does not match his own C source code listed on the same page.

Yeah, I've seen this before. For example, SIMON and SPECK did the same thing. In the case of SIMON and SPECK, we worked with the authors to get things to line up (the paper and ref implementation).

So, it may come down to each library having derived off of either the C code in the PDF specification, or the ECRYPT ecrypt-sync.h implementation... who is right?

I would hazard a guess and say both are right (as unfortunate as that is). In Crypto++, we used the reference ECRYPT implementation because that's what we were aiming for. More specifically, we used eSTREAM Phase 3 implementation from https://www.ecrypt.eu.org/stream/hcp3.html.

We should get this documented in the wiki at https://www.cryptopp.com/wiki/HC-128#HC-256.

And ChaCha20 is another really bad example of interop problems. Crypto++ implemented ChaCha20 from Bernstein's paper and reference implementation. Then the IETF standardized ChaCha20 but changed the algorithm. So there are two versions of ChaCha20 floating around -- Bernstein's ChaCha20 and the IETF version that hijacked the ChaCha20 name.

noloader commented 1 week ago

@Demonslay335,

Can you provide a diff that makes Crypto++ work with BouncyCastle and friends? I'd like to think about how we can support both implementations.

Demonslay335 commented 1 week ago

@noloader Sure thing, here ya go. I matched the method of initializing the key and IV with that of HC128 for consistency. Building with this patch makes my above PoC pass, but it will of course break most of your unit tests.

hc256.zip

diff --git a/hc256.cpp b/hc256.cpp
index 3a93f21e..96bb9743 100644
--- a/hc256.cpp
+++ b/hc256.cpp
@@ -101,14 +101,7 @@ void HC256Policy::CipherSetKey(const NameValuePairs &params, const byte *userKey
    CRYPTOPP_UNUSED(params); CRYPTOPP_UNUSED(keylen);
    CRYPTOPP_ASSERT(keylen == 32);

-   for (unsigned int i = 0; i < 8; i++)
-       m_key[i] = 0;
-
-   for (unsigned int i = 0; i < 32; i++)
-   {
-       m_key[i >> 2] = m_key[i >> 2] | userKey[i];
-       m_key[i >> 2] = rotlConstant<8>(m_key[i >> 2]);
-   }
+   GetUserKey(LITTLE_ENDIAN_ORDER, m_key.begin(), 8, userKey, keylen);
 }

 void HC256Policy::OperateKeystream(KeystreamOperation operation, byte *output, const byte *input, size_t iterationCount)
@@ -127,18 +120,11 @@ void HC256Policy::CipherResynchronize(byte *keystreamBuffer, const byte *iv, siz
    CRYPTOPP_UNUSED(keystreamBuffer); CRYPTOPP_UNUSED(length);
    CRYPTOPP_ASSERT(length == 32);

-   /* initialize the iv */
-   word32 W[2560];
-   for (unsigned int i = 0; i < 8; i++)
-       m_iv[i] = 0;
-
-   for (unsigned int i = 0; i < 32; i++)
-   {
-       m_iv[i >> 2] = m_iv[i >> 2] | iv[i];
-       m_iv[i >> 2] = rotlConstant<8>(m_iv[i >> 2]);
-   }
+   /* initialize the iv */ 
+   GetUserKey(LITTLE_ENDIAN_ORDER, m_iv.begin(), 8, iv, length);

    /* setup the table P and Q */
+   word32 W[2560];

    for (unsigned int i = 0; i < 8; i++)
        W[i] = m_key[i];
-- 

My guess for moving forward would be to fix the current implementation, and copy the "old" one to a new HC256Ecrypt (or similarly named) policy. Or, if breaking previous code is too large of an issue here, move the "fixed" implementation to a new class; I'm just not as sure on what you'd call it to be honest.

I had no idea this would end up being another ChaCha vs ChaChaTLS situation...

noloader commented 1 week ago

Thanks @Demonslay335,

I vaguely remember I wanted to use this with HC-256, but I had to switch because of the test vectors produced by the reference implementation.

+   GetUserKey(LITTLE_ENDIAN_ORDER, m_key.begin(), 8, userKey, keylen);