Rabbit Produces null Keystream When inString == outString

weidai11 / cryptopp

free C++ class library of cryptographic schemes

https://cryptopp.com

Other

4.82k stars 1.49k forks source link

Rabbit Produces null Keystream When inString == outString #1231

Closed Demonslay335 closed 1 year ago

Demonslay335 commented 1 year ago

When upgrading from CryptoPP 8.6.0 to 8.8.0 (master branch via vcpkg), we started having our own unit tests fail with code using the CryptoPP::RabbitWithIV algorithm. Upon inspection, it appears when inString == outString (which is perfectly legal according to ProcessData), Rabbit is returning a buffer with all 0x00 bytes, instead of the encrypted/decrypted buffer.

When I dug into the method, it looks like RabbitWithIVPolicy::OperateKeystream ends up just XOR'ing the generated keystream with itself...

Here is a minimal example reproducing the issue with the Rabbit test vectors, Test 4. This code passes on CryptoPP 8.6.0, and throws an assertion on CryptoPP 8.8.0. This also applies to CryptoPP::Rabbit. Compiled using Visual Studio (x64) on Windows 10.

#include <cryptopp/rabbit.h>

int main()
{
    auto key = std::vector<uint8_t>(CryptoPP::RabbitWithIV::KEYLENGTH, 0x00);
    auto iv = std::vector<uint8_t>(CryptoPP::RabbitWithIV::IV_LENGTH, 0x00);
    auto buffer = std::vector<uint8_t>(0x20, 0x00);
    auto expected = std::vector<uint8_t>{ 0xED, 0xB7, 0x05, 0x67, 0x37, 0x5D, 0xCD, 0x7C, 0xD8, 0x95, 0x54, 0xF8, 0x5E, 0x27, 0xA7, 0xC6, 0x8D, 0x4A, 0xDC, 0x70, 0x32, 0x29, 0x8F, 0x7B, 0xD4, 0xEF, 0xF5, 0x04, 0xAC, 0xA6, 0x29, 0x5F };

    CryptoPP::RabbitWithIV::Encryption enc;
    enc.SetKeyWithIV(key.data(), key.size(), iv.data(), iv.size());

    enc.ProcessData(buffer.data(), buffer.data(), buffer.size());

    CRYPTOPP_ASSERT(memcmp(buffer.data(), expected.data(), expected.size()) == 0);

    return 0;
}

Just to note, we use a lot of CryptoPP algorithms in our codebase, and no other algorithms had this issue so far.

Seems this may be related to https://github.com/weidai11/cryptopp/issues/1010, perhaps a missed algorithm?

noloader commented 1 year ago

Ugh, sorry to hear that. GH #1010 gave us a lot of problems. I hope they are not surfacing again.

I can't duplicate on my test machines.

$ ./cryptest.exe tv rabbit
Using seed: 1695672009      

Testing SymmetricCipher algorithm Rabbit.
....................................................................................................................................................................................................................................................................
Testing SymmetricCipher algorithm RabbitWithIV.
....................................................................................................................................................................................................................................................................
Tests complete. Total tests = 520. Failed tests = 0.

Can you give the attached patch a try and report back. rabbit.diff.zip.

$ cat rabbit.diff
diff --git a/rabbit.cpp b/rabbit.cpp
index 4f655111..7bbdb1de 100644
--- a/rabbit.cpp
+++ b/rabbit.cpp
@@ -152,6 +152,10 @@ void RabbitPolicy::OperateKeystream(KeystreamOperation operation, byte *output,
        //  adding the input buffer and keystream.
        if ((operation & EnumToInt(INPUT_NULL)) != EnumToInt(INPUT_NULL))
                xorbuf(output, input, GetBytesPerIteration() * iterationCount);
+
+       // https://github.com/weidai11/cryptopp/issues/1231
+       volatile byte* unused = const_cast<volatile byte *>(output);
+       CRYPTOPP_UNUSED(unused);
 }

 void RabbitWithIVPolicy::CipherSetKey(const NameValuePairs &params, const byte *userKey, size_t keylen)
@@ -254,6 +258,10 @@ void RabbitWithIVPolicy::OperateKeystream(KeystreamOperation operation, byte *ou
        //  adding the input buffer and keystream.
        if ((operation & EnumToInt(INPUT_NULL)) != EnumToInt(INPUT_NULL))
                xorbuf(output, input, GetBytesPerIteration() * iterationCount);
+
+       // https://github.com/weidai11/cryptopp/issues/1231
+       volatile byte* unused = const_cast<volatile byte *>(output);
+       CRYPTOPP_UNUSED(unused);
 }

 NAMESPACE_END

Demonslay335 commented 1 year ago

Does the test suite do input == output for the buffers though? From looking at the code, I believe it is using separate buffers thru the sinks.

I've tried the patch with no change. OperateKeystream is still passed output == input, causing the xorbuf to cancel itself out.

With a little more testing, I confirmed if AdditiveCipherTemplate<S>::ProcessData is forced to use WriteKeystream instead of OperateKeystream on the policy (e.g. length < bytesPerIteration), then the keystream is correct.

I feel like it may have to do with the way Rabbit's keystream function here is implemented, it looks different from how ChaCha and other stream ciphers are implemented (from my quick looking). Inside the loop, the keystream is directly written to output, which also of course overwrites the input in this case. I'm trying to read further into why this isn't a problem in another stream cipher.

Demonslay335 commented 1 year ago

I've just reproduced this same issue with the HC128 and HC256 algorithms. They have OperateKeystream structured the same as Rabbit, so the same thing happens. A quick search for the code block doesn't show any other algorithms so far doing it that way.

I think this may be related more to the changes made in GH #1088, and these algorithms weren't adapted to account for the pointers being the same when being consumed. In 8.6.0, it looks like if they were the same, a temporary string was created to be passed instead, then copied back to. These stream ciphers are directly writing the keystream to the output pointer, treating it is more like a temporary buffer, instead of saving the keystream + input.

I notice other stream ciphers like ChaCha and Sosemanuk (that work in this case) use the CRYPTOPP_KEYSTREAM_OUTPUT_WORD and CRYPTOPP_KEYSTREAM_OUTPUT_SWITCH macros, which could be related to a fix.

noloader commented 1 year ago

Hi @Demonslay335,

Does the test suite do input == output for the buffers though?

Probably not. That will be the first thing I fix.

I feel like it may have to do with the way Rabbit's keystream function here is implemented, it looks different from how ChaCha and other stream ciphers are implemented

Yeah, I believe you are correct.

I implemented Rabbit and HC-{128|256} classes. I don't recall why I did not use the same pattern as say, Salsa and ChaCha.

notice other stream ciphers like ChaCha and Sosemanuk (that work in this case) use the CRYPTOPP_KEYSTREAM_OUTPUT_WORD and CRYPTOPP_KEYSTREAM_OUTPUT_SWITCH macros, which could be related to a fix.

Yeah, something needs fixing. I'll dig into it when I have some time.

I am terribly sorry about this.

noloader commented 1 year ago

@Demonslay335, @fwosar,

I provided the first check-in at 560d48f96861. It adds the self tests to exercise in-place encryption and decryption. Though the self tests use ProcessString(inoutString, length), under the hood ProcessData(inString, outString, length) is called.

And a lot more than Rabbit and HC-{128|256} broke. Ugh...

For CTR mode, I think the test code is crafted wrong. For CTR mode, I believe we need to call ProcessLastBlock(), and that is not happening. So that explains CTR mode. But the other breaks look valid (to me).

noloader commented 1 year ago

@Demonslay335, @fwosar,

I believe Rabbit is fixed with b157b4d30116. The fix was to cutover to CRYPTOPP_KEYSTREAM_OUTPUT_SWITCH, like in Salsa and ChaCha.

I do not know why that fixes the issue. The code is equivalent. In fact, the old code was simpler than the code with the CRYPTOPP_KEYSTREAM_OUTPUT_SWITCH. I expect the new code using CRYPTOPP_KEYSTREAM_OUTPUT_SWITCH to run a tad bit slower than the old code.

My wild ass guess is, the macros thwart analysis by the compiler, so the code with the CRYPTOPP_KEYSTREAM_OUTPUT_SWITCH remains in tact. I.e., the code is not removed by the compiler. The older, simpler code was easier to analyze, and the compiler removed calls to OperateKeystream due to aliasing violations in ProcessData.

I'll keep chipping away at the other failures.

noloader commented 1 year ago

@Demonslay335, @fwosar,

I believe HC-{128|256} are fixed with 0bf879883573. The fix was to cutover to CRYPTOPP_KEYSTREAM_OUTPUT_SWITCH, like in Salsa and ChaCha.

noloader commented 1 year ago

I'm closing this issue since we cleared the problem with Rabbit, RabbitWithIV, HC128 and HC256.

I'm opening another bug for HIGHT/CTR mode.

Demonslay335 commented 1 year ago

@noloader Thanks for the quick fix.

I believe it actually has nothing to do with compiler in this case (luckily).

I found the main difference is in the CRYPTOPP_KEYSTREAM_OUTPUT_WORD macro, which uses the 5th argument to PutWord (xorBlock), whereas the old stand-alone code did not (defaulting to NULLPTR). So the keystream is properly XOR'd with the input before being written to the output buffer, as we want.

template <class T>
inline void PutWord(bool assumeAligned, ByteOrder order, byte *block, T value, const byte *xorBlock = NULLPTR)
{
    CRYPTOPP_UNUSED(assumeAligned);

    T t1, t2;
    t1 = ConditionalByteReverse(order, value);
    if (xorBlock != NULLPTR) {std::memcpy(&t2, xorBlock, sizeof(T)); t1 ^= t2;} // < -- BINGO
    if (block != NULLPTR) {std::memcpy(block, &t1, sizeof(T));}
}

To verify, adapting that parameter actually fixes the old code.

byte* out = output;
for (unsigned int i = 0; i<iterationCount; ++i, out += 16)
{
    /* Iterate the system */
    m_wcy = NextState(m_wc, m_wx, m_wcy);

    /* Encrypt/decrypt 16 bytes of data */
    PutWord(false, LITTLE_ENDIAN_ORDER, out +  0, m_wx[0] ^ (m_wx[5] >> 16) ^ (m_wx[3] << 16), input + i * sizeof(WordType));
    PutWord(false, LITTLE_ENDIAN_ORDER, out +  4, m_wx[2] ^ (m_wx[7] >> 16) ^ (m_wx[5] << 16), input + i * sizeof(WordType));
    PutWord(false, LITTLE_ENDIAN_ORDER, out +  8, m_wx[4] ^ (m_wx[1] >> 16) ^ (m_wx[7] << 16), input + i * sizeof(WordType));
    PutWord(false, LITTLE_ENDIAN_ORDER, out + 12, m_wx[6] ^ (m_wx[3] >> 16) ^ (m_wx[1] << 16), input + i * sizeof(WordType));
}

Use of the macros is far more consistent with the rest of the library though, and using the switch statement macro properly accounts for all the operation values.