weidai11 / cryptopp

free C++ class library of cryptographic schemes
https://cryptopp.com

ChaCha Algorithm changed? #807

Closed asbai closed 5 years ago

asbai commented 5 years ago

Crypto++ 8.0 Issue Report

We have some on-disk data that was encrypted by Crypto++ 7.0 with ChaCha8. We use the same 8-round ChaCha algorithm provided by 8.0 to decrypt it with exactly the same key and IV, but the result is wrong.

We have tried a few other algorithms, encrypting with 7.0 and then decrypting with 8.0, and they all work fine. Only ChaCha fails. Did something change at the algorithm level?

Compiler: VC2005; Target: MSW/x86-32
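
One way to narrow down a mismatch like this is to compare keystreams rather than decrypted data. For a stream cipher, XORing a ciphertext with its known plaintext yields the keystream, so the 7.0 and 8.0 keystreams for the same key and IV can be compared byte by byte. The sketch below is plain C++ and independent of Crypto++; `xor_bytes` and `first_mismatch` are hypothetical helper names, not library functions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// XOR two equal-length buffers. For a stream cipher,
// ciphertext XOR plaintext gives the keystream that was used.
std::vector<std::uint8_t> xor_bytes(const std::vector<std::uint8_t>& a,
                                    const std::vector<std::uint8_t>& b)
{
    assert(a.size() == b.size());
    std::vector<std::uint8_t> out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i)
        out[i] = static_cast<std::uint8_t>(a[i] ^ b[i]);
    return out;
}

// Index of the first differing byte over the common length,
// or -1 if the buffers agree there.
std::ptrdiff_t first_mismatch(const std::vector<std::uint8_t>& a,
                              const std::vector<std::uint8_t>& b)
{
    const std::size_t n = a.size() < b.size() ? a.size() : b.size();
    for (std::size_t i = 0; i < n; ++i)
        if (a[i] != b[i])
            return static_cast<std::ptrdiff_t>(i);
    return -1;
}
```

As a rough guide (an assumption, not a diagnosis of this issue): a mismatch at byte 0 tends to point at the key, IV, or round count, while a first mismatch at a multiple of 64 bytes tends to point at block-counter handling.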

asbai commented 5 years ago

Oh, I've changed chacha.h and chacha.cpp a little, but I don't think that affects the algorithm at all.

chacha.h:

// chacha.h - written and placed in the public domain by Jeffrey Walton.
//            Based on Wei Dai's Salsa20, Botan's SSE2 implementation,
//            and Bernstein's reference ChaCha family implementation at
//            http://cr.yp.to/chacha.html.

/// \file chacha.h
/// \brief Classes for ChaCha8, ChaCha12 and ChaCha20 stream ciphers
/// \details Crypto++ provides Bernstein and ECRYPT's ChaCha from <a href="http://cr.yp.to/chacha/chacha-20080128.pdf">ChaCha,
///   a variant of Salsa20</a> (2008.01.28). Bernstein's implementation is _slightly_ different from the TLS working group's
///   implementation for cipher suites <tt>TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256</tt>,
///   <tt>TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256</tt>, and <tt>TLS_DHE_RSA_WITH_CHACHA20_POLY1305_SHA256</tt>.
/// \since Crypto++ 5.6.4

#ifndef CRYPTOPP_CHACHA_H
#define CRYPTOPP_CHACHA_H

#include "strciphr.h"
#include "secblock.h"

NAMESPACE_BEGIN(CryptoPP)

/// \brief ChaCha stream cipher information
/// \since Crypto++ 5.6.4
struct ChaCha_Info : public VariableKeyLength<32, 16, 32, 16, SimpleKeyingInterface::UNIQUE_IV, 8>
{
    /// \brief The algorithm name
    /// \returns the algorithm name
    /// \details StaticAlgorithmName returns the algorithm's name as a static
    ///   member function.
    /// \details Bernstein named the cipher variants ChaCha8, ChaCha12 and
    ///   ChaCha20. More generally, Bernstein called the family ChaCha{r}.
    ///   AlgorithmName() provides the exact name once rounds are set.
    static const char* StaticAlgorithmName() {
        return "ChaCha";
    }
};

// [[ by BaiYang - keep the old version's implementation, where rounds is a template parameter (rather than a property)
/// \brief ChaCha stream cipher implementation
/// \since Crypto++ 5.6.4
template < unsigned int rounds >
class CRYPTOPP_NO_VTABLE ChaCha_Policy : public AdditiveCipherConcretePolicy<word32, 16>
{
public:
    ~ChaCha_Policy() {}
    ChaCha_Policy() {}

protected:
    void CipherSetKey(const NameValuePairs &params, const byte *key, size_t length);
    void OperateKeystream(KeystreamOperation operation, byte *output, const byte *input, size_t iterationCount);
    void CipherResynchronize(byte *keystreamBuffer, const byte *IV, size_t length);
    bool CipherIsRandomAccess() const {return true;}
    void SeekToIteration(lword iterationCount);
    unsigned int GetAlignment() const;
    unsigned int GetOptimalBlockSize() const;

    std::string AlgorithmName() const;
    std::string AlgorithmProvider() const;

    // MultiBlockSafe detects a condition that can arise in the SIMD
    // implementations where we overflow one of the 32-bit state words
    // during addition in an intermediate result. Conditions to trigger
    // issue include a user seeks to around 2^32 blocks (256 GB of data).
    // https://github.com/weidai11/cryptopp/issues/732
    inline bool MultiBlockSafe(unsigned int blocks) const;

    FixedSizeAlignedSecBlock<word32, 16> m_state;
};

/// \brief ChaCha stream cipher
/// \details Bernstein and ECRYPT's ChaCha is _slightly_ different from the TLS working
///   group's implementation for cipher suites
///   <tt>TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256</tt>,
///   <tt>TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256</tt>, and
///   <tt>TLS_DHE_RSA_WITH_CHACHA20_POLY1305_SHA256</tt>.
/// \sa <a href="http://cr.yp.to/chacha/chacha-20080208.pdf">ChaCha, a variant of Salsa20</a> (2008.01.28).
/// \since Crypto++ 5.6.4
struct ChaCha8 : public ChaCha_Info, public SymmetricCipherDocumentation
{
    typedef SymmetricCipherFinal<ConcretePolicyHolder<ChaCha_Policy<8>, AdditiveCipherTemplate<> >, ChaCha_Info > Encryption;
    typedef Encryption Decryption;
};

struct ChaCha12 : public ChaCha_Info, public SymmetricCipherDocumentation
{
    typedef SymmetricCipherFinal<ConcretePolicyHolder<ChaCha_Policy<12>, AdditiveCipherTemplate<> >, ChaCha_Info > Encryption;
    typedef Encryption Decryption;
};

struct ChaCha20 : public ChaCha_Info, public SymmetricCipherDocumentation
{
    typedef SymmetricCipherFinal<ConcretePolicyHolder<ChaCha_Policy<20>, AdditiveCipherTemplate<> >, ChaCha_Info > Encryption;
    typedef Encryption Decryption;
};
// ]] by BaiYang - keep the old version's implementation, where rounds is a template parameter (rather than a property)

NAMESPACE_END

#endif  // CRYPTOPP_CHACHA_H

chacha.cpp:

// chacha.cpp - written and placed in the public domain by Jeffrey Walton.
//              Based on Wei Dai's Salsa20, Botan's SSE2 implementation,
//              and Bernstein's reference ChaCha family implementation at
//              http://cr.yp.to/chacha.html.

#include "pch.h"
#include "config.h"
#include "chacha.h"
#include "argnames.h"
#include "misc.h"
#include "cpu.h"

NAMESPACE_BEGIN(CryptoPP)

#if (CRYPTOPP_ARM_NEON_AVAILABLE)
extern void ChaCha_OperateKeystream_NEON(const word32 *state, const byte* input, byte *output, unsigned int rounds);
#endif

#if (CRYPTOPP_SSE2_INTRIN_AVAILABLE || CRYPTOPP_SSE2_ASM_AVAILABLE)
extern void ChaCha_OperateKeystream_SSE2(const word32 *state, const byte* input, byte *output, unsigned int rounds);
#endif

#if (CRYPTOPP_AVX2_AVAILABLE)
extern void ChaCha_OperateKeystream_AVX2(const word32 *state, const byte* input, byte *output, unsigned int rounds);
#endif

#if (CRYPTOPP_POWER7_AVAILABLE)
extern void ChaCha_OperateKeystream_POWER7(const word32 *state, const byte* input, byte *output, unsigned int rounds);
#elif (CRYPTOPP_ALTIVEC_AVAILABLE)
extern void ChaCha_OperateKeystream_ALTIVEC(const word32 *state, const byte* input, byte *output, unsigned int rounds);
#endif

#define CHACHA_QUARTER_ROUND(a,b,c,d) \
    a += b; d ^= a; d = rotlConstant<16,word32>(d); \
    c += d; b ^= c; b = rotlConstant<12,word32>(b); \
    a += b; d ^= a; d = rotlConstant<8,word32>(d); \
    c += d; b ^= c; b = rotlConstant<7,word32>(b);

#define CHACHA_OUTPUT(x){\
    CRYPTOPP_KEYSTREAM_OUTPUT_WORD(x, LITTLE_ENDIAN_ORDER, 0, x0 + m_state[0]);\
    CRYPTOPP_KEYSTREAM_OUTPUT_WORD(x, LITTLE_ENDIAN_ORDER, 1, x1 + m_state[1]);\
    CRYPTOPP_KEYSTREAM_OUTPUT_WORD(x, LITTLE_ENDIAN_ORDER, 2, x2 + m_state[2]);\
    CRYPTOPP_KEYSTREAM_OUTPUT_WORD(x, LITTLE_ENDIAN_ORDER, 3, x3 + m_state[3]);\
    CRYPTOPP_KEYSTREAM_OUTPUT_WORD(x, LITTLE_ENDIAN_ORDER, 4, x4 + m_state[4]);\
    CRYPTOPP_KEYSTREAM_OUTPUT_WORD(x, LITTLE_ENDIAN_ORDER, 5, x5 + m_state[5]);\
    CRYPTOPP_KEYSTREAM_OUTPUT_WORD(x, LITTLE_ENDIAN_ORDER, 6, x6 + m_state[6]);\
    CRYPTOPP_KEYSTREAM_OUTPUT_WORD(x, LITTLE_ENDIAN_ORDER, 7, x7 + m_state[7]);\
    CRYPTOPP_KEYSTREAM_OUTPUT_WORD(x, LITTLE_ENDIAN_ORDER, 8, x8 + m_state[8]);\
    CRYPTOPP_KEYSTREAM_OUTPUT_WORD(x, LITTLE_ENDIAN_ORDER, 9, x9 + m_state[9]);\
    CRYPTOPP_KEYSTREAM_OUTPUT_WORD(x, LITTLE_ENDIAN_ORDER, 10, x10 + m_state[10]);\
    CRYPTOPP_KEYSTREAM_OUTPUT_WORD(x, LITTLE_ENDIAN_ORDER, 11, x11 + m_state[11]);\
    CRYPTOPP_KEYSTREAM_OUTPUT_WORD(x, LITTLE_ENDIAN_ORDER, 12, x12 + m_state[12]);\
    CRYPTOPP_KEYSTREAM_OUTPUT_WORD(x, LITTLE_ENDIAN_ORDER, 13, x13 + m_state[13]);\
    CRYPTOPP_KEYSTREAM_OUTPUT_WORD(x, LITTLE_ENDIAN_ORDER, 14, x14 + m_state[14]);\
    CRYPTOPP_KEYSTREAM_OUTPUT_WORD(x, LITTLE_ENDIAN_ORDER, 15, x15 + m_state[15]);}

// [[ by BaiYang - keep the old version's implementation, where rounds is a template parameter (rather than a property)
#if defined(CRYPTOPP_DEBUG) && !defined(CRYPTOPP_DOXYGEN_PROCESSING)
void ChaCha_TestInstantiations()
{
    ChaCha20::Encryption x;
}
#endif

template < unsigned int rounds >
std::string ChaCha_Policy<rounds>::AlgorithmName() const
{
    return std::string("ChaCha")+IntToString(rounds);
}

template < unsigned int rounds >
std::string ChaCha_Policy<rounds>::AlgorithmProvider() const
{
#if (CRYPTOPP_AVX2_AVAILABLE)
    if (HasAVX2())
        return "AVX2";
    else
#endif
#if (CRYPTOPP_SSE2_INTRIN_AVAILABLE || CRYPTOPP_SSE2_ASM_AVAILABLE)
    if (HasSSE2())
        return "SSE2";
    else
#endif
#if (CRYPTOPP_ARM_NEON_AVAILABLE)
    if (HasNEON())
        return "NEON";
    else
#endif
#if (CRYPTOPP_POWER7_AVAILABLE)
    if (HasPower7())
        return "Power7";
    else
#elif (CRYPTOPP_ALTIVEC_AVAILABLE)
    if (HasAltivec())
        return "Altivec";
    else
#endif
    return "C++";
}

template < unsigned int rounds >
void ChaCha_Policy<rounds>::CipherSetKey(const NameValuePairs &params, const byte *key, size_t length)
{
    CRYPTOPP_UNUSED(params);
    CRYPTOPP_ASSERT(length == 16 || length == 32);

    // "expand 16-byte k" or "expand 32-byte k"
    m_state[0] = 0x61707865;
    m_state[1] = (length == 16) ? 0x3120646e : 0x3320646e;
    m_state[2] = (length == 16) ? 0x79622d36 : 0x79622d32;
    m_state[3] = 0x6b206574;

    GetBlock<word32, LittleEndian> get1(key);
    get1(m_state[4])(m_state[5])(m_state[6])(m_state[7]);

    GetBlock<word32, LittleEndian> get2(key + ((length == 32) ? 16 : 0));
    get2(m_state[8])(m_state[9])(m_state[10])(m_state[11]);
}

template < unsigned int rounds >
void ChaCha_Policy<rounds>::CipherResynchronize(byte *keystreamBuffer, const byte *IV, size_t length)
{
    CRYPTOPP_UNUSED(keystreamBuffer), CRYPTOPP_UNUSED(length);
    CRYPTOPP_ASSERT(length==8);

    GetBlock<word32, LittleEndian> get(IV);
    m_state[12] = m_state[13] = 0;
    get(m_state[14])(m_state[15]);
}

template < unsigned int rounds >
void ChaCha_Policy<rounds>::SeekToIteration(lword iterationCount)
{
    m_state[12] = (word32)iterationCount;  // low word
    m_state[13] = (word32)SafeRightShift<32>(iterationCount);
}

template < unsigned int rounds >
unsigned int ChaCha_Policy<rounds>::GetAlignment() const
{
#if (CRYPTOPP_AVX2_AVAILABLE)
    if (HasAVX2())
        return 16;
    else
#endif
#if (CRYPTOPP_SSE2_INTRIN_AVAILABLE || CRYPTOPP_SSE2_ASM_AVAILABLE)
    if (HasSSE2())
        return 16;
    else
#endif
#if (CRYPTOPP_ALTIVEC_AVAILABLE)
    if (HasAltivec())
        return 16;
    else
#endif
        return GetAlignmentOf<word32>();
}

template < unsigned int rounds >
unsigned int ChaCha_Policy<rounds>::GetOptimalBlockSize() const
{
#if (CRYPTOPP_AVX2_AVAILABLE)
    if (HasAVX2())
        return 8 * BYTES_PER_ITERATION;
    else
#endif
#if (CRYPTOPP_SSE2_INTRIN_AVAILABLE || CRYPTOPP_SSE2_ASM_AVAILABLE)
    if (HasSSE2())
        return 4*BYTES_PER_ITERATION;
    else
#endif
#if (CRYPTOPP_ARM_NEON_AVAILABLE)
    if (HasNEON())
        return 4*BYTES_PER_ITERATION;
    else
#endif
#if (CRYPTOPP_ALTIVEC_AVAILABLE)
    if (HasAltivec())
        return 4*BYTES_PER_ITERATION;
    else
#endif
        return BYTES_PER_ITERATION;
}

template < unsigned int rounds >
bool ChaCha_Policy<rounds>::MultiBlockSafe(unsigned int blocks) const
{
    return 0xffffffff - m_state[12] > blocks;
}

// OperateKeystream always produces a key stream. The key stream is written
// to output. Optionally a message may be supplied to xor with the key stream.
// The message is input, and output = output ^ input.
template < unsigned int rounds >
void ChaCha_Policy<rounds>::OperateKeystream(KeystreamOperation operation,
        byte *output, const byte *input, size_t iterationCount)
{
    do
    {
#if (CRYPTOPP_AVX2_AVAILABLE)
        if (HasAVX2())
        {
            while (iterationCount >= 8 && MultiBlockSafe(8))
            {
                const bool xorInput = (operation & INPUT_NULL) != INPUT_NULL;
                ChaCha_OperateKeystream_AVX2(m_state, xorInput ? input : NULLPTR, output, rounds);

                // MultiBlockSafe avoids overflow on the counter words
                m_state[12] += 8;
                //if (m_state[12] < 8)
                //    m_state[13]++;

                input += (!!xorInput) * 8 * BYTES_PER_ITERATION;
                output += 8 * BYTES_PER_ITERATION;
                iterationCount -= 8;
            }
        }
#endif

#if (CRYPTOPP_SSE2_INTRIN_AVAILABLE || CRYPTOPP_SSE2_ASM_AVAILABLE)
        if (HasSSE2())
        {
            while (iterationCount >= 4 && MultiBlockSafe(4))
            {
                const bool xorInput = (operation & INPUT_NULL) != INPUT_NULL;
                ChaCha_OperateKeystream_SSE2(m_state, xorInput ? input : NULLPTR, output, rounds);

                // MultiBlockSafe avoids overflow on the counter words
                m_state[12] += 4;
                //if (m_state[12] < 4)
                //    m_state[13]++;

                input += (!!xorInput)*4*BYTES_PER_ITERATION;
                output += 4*BYTES_PER_ITERATION;
                iterationCount -= 4;
            }
        }
#endif

#if (CRYPTOPP_ARM_NEON_AVAILABLE)
        if (HasNEON())
        {
            while (iterationCount >= 4 && MultiBlockSafe(4))
            {
                const bool xorInput = (operation & INPUT_NULL) != INPUT_NULL;
                ChaCha_OperateKeystream_NEON(m_state, xorInput ? input : NULLPTR, output, rounds);

                // MultiBlockSafe avoids overflow on the counter words
                m_state[12] += 4;
                //if (m_state[12] < 4)
                //    m_state[13]++;

                input += (!!xorInput)*4*BYTES_PER_ITERATION;
                output += 4*BYTES_PER_ITERATION;
                iterationCount -= 4;
            }
        }
#endif

#if (CRYPTOPP_POWER7_AVAILABLE)
        if (HasPower7())
        {
            while (iterationCount >= 4 && MultiBlockSafe(4))
            {
                const bool xorInput = (operation & INPUT_NULL) != INPUT_NULL;
                ChaCha_OperateKeystream_POWER7(m_state, xorInput ? input : NULLPTR, output, rounds);

                // MultiBlockSafe avoids overflow on the counter words
                m_state[12] += 4;
                //if (m_state[12] < 4)
                //    m_state[13]++;

                input += (!!xorInput)*4*BYTES_PER_ITERATION;
                output += 4*BYTES_PER_ITERATION;
                iterationCount -= 4;
            }
        }
#elif (CRYPTOPP_ALTIVEC_AVAILABLE)
        if (HasAltivec())
        {
            while (iterationCount >= 4 && MultiBlockSafe(4))
            {
                const bool xorInput = (operation & INPUT_NULL) != INPUT_NULL;
                ChaCha_OperateKeystream_ALTIVEC(m_state, xorInput ? input : NULLPTR, output, rounds);

                // MultiBlockSafe avoids overflow on the counter words
                m_state[12] += 4;
                //if (m_state[12] < 4)
                //    m_state[13]++;

                input += (!!xorInput)*4*BYTES_PER_ITERATION;
                output += 4*BYTES_PER_ITERATION;
                iterationCount -= 4;
            }
        }
#endif

        if (iterationCount)
        {
            word32 x0, x1, x2, x3, x4, x5, x6, x7, x8, x9, x10, x11, x12, x13, x14, x15;

            x0 = m_state[0];    x1 = m_state[1];    x2 = m_state[2];    x3 = m_state[3];
            x4 = m_state[4];    x5 = m_state[5];    x6 = m_state[6];    x7 = m_state[7];
            x8 = m_state[8];    x9 = m_state[9];    x10 = m_state[10];  x11 = m_state[11];
            x12 = m_state[12];  x13 = m_state[13];  x14 = m_state[14];  x15 = m_state[15];

            for (int i = static_cast<int>(rounds); i > 0; i -= 2)
            {
                CHACHA_QUARTER_ROUND(x0, x4,  x8, x12);
                CHACHA_QUARTER_ROUND(x1, x5,  x9, x13);
                CHACHA_QUARTER_ROUND(x2, x6, x10, x14);
                CHACHA_QUARTER_ROUND(x3, x7, x11, x15);

                CHACHA_QUARTER_ROUND(x0, x5, x10, x15);
                CHACHA_QUARTER_ROUND(x1, x6, x11, x12);
                CHACHA_QUARTER_ROUND(x2, x7,  x8, x13);
                CHACHA_QUARTER_ROUND(x3, x4,  x9, x14);
            }

            CRYPTOPP_KEYSTREAM_OUTPUT_SWITCH(CHACHA_OUTPUT, BYTES_PER_ITERATION);

            if (++m_state[12] == 0)
                m_state[13]++;
        }

    // We may re-enter a SIMD keystream operation from here.
    } while (iterationCount--);
}

template class CRYPTOPP_DLL ChaCha_Policy<8>;
template class CRYPTOPP_DLL ChaCha_Policy<12>;
template class CRYPTOPP_DLL ChaCha_Policy<20>;

// ]] by BaiYang - keep the old version's implementation, where rounds is a template parameter (rather than a property)

NAMESPACE_END

asbai commented 5 years ago

I've uploaded these changes to my fork: https://github.com/asbai/cryptopp You can use the compare tool to review them if necessary.
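
For readers following the chacha.cpp listing above, the state setup done by CipherSetKey and CipherResynchronize can be sketched in plain C++ as follows. The constants and word positions are taken from the listing; `load_le32` and `chacha_init_state` are hypothetical names used only for illustration and are not part of Crypto++.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Load a 32-bit little-endian word from four bytes.
static inline std::uint32_t load_le32(const std::uint8_t* p)
{
    return static_cast<std::uint32_t>(p[0])
         | (static_cast<std::uint32_t>(p[1]) << 8)
         | (static_cast<std::uint32_t>(p[2]) << 16)
         | (static_cast<std::uint32_t>(p[3]) << 24);
}

// Build the 16-word ChaCha state the way the listing does:
// words 0-3 constants, 4-11 key, 12-13 block counter, 14-15 IV.
std::array<std::uint32_t, 16> chacha_init_state(const std::uint8_t* key,
                                                std::size_t key_len,
                                                const std::uint8_t* iv /* 8 bytes */)
{
    assert(key_len == 16 || key_len == 32);
    std::array<std::uint32_t, 16> s{};

    // "expand 16-byte k" or "expand 32-byte k"
    s[0] = 0x61707865;
    s[1] = (key_len == 16) ? 0x3120646e : 0x3320646e;
    s[2] = (key_len == 16) ? 0x79622d36 : 0x79622d32;
    s[3] = 0x6b206574;

    // A 16-byte key is used twice; a 32-byte key fills words 4-11.
    const std::uint8_t* second_half = key + ((key_len == 32) ? 16 : 0);
    for (std::size_t i = 0; i < 4; ++i) {
        s[4 + i] = load_le32(key + 4 * i);
        s[8 + i] = load_le32(second_half + 4 * i);
    }

    // Resynchronizing with a new IV resets the 64-bit block counter to zero.
    s[12] = 0;
    s[13] = 0;
    s[14] = load_le32(iv);
    s[15] = load_le32(iv + 4);
    return s;
}
```

Dumping these 16 words in both library versions, for the same key and IV, is one way to check whether the two builds even start from the same state.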

noloader commented 5 years ago

Please run the self tests:

./cryptest.exe tv chacha

Here's what I get on Master:

$ ./cryptest.exe tv chacha
Using seed: 1550095472

Testing SymmetricCipher algorithm ChaCha.
........................................
Tests complete. Total tests = 40. Failed tests = 0.

The self tests can be found at TestVectors/chacha.txt. They exercise ChaCha8, ChaCha12 and ChaCha20.

noloader commented 5 years ago

I don't think you are using the code you think you are using. Your repo does not compile.

chachapoly.h:151:2: error: ‘ChaChaTLS’ does not name a type; did you mean ‘ChaCha20’?
  ChaChaTLS::Encryption m_cipher;
  ^~~~~~~~~
  ChaCha20
chachapoly.h: In member function ‘CryptoPP::SymmetricCipher& CryptoPP::ChaCha20Poly1305_Final<T_IsEncryption>::AccessSymmetricCipher()’:
chachapoly.h:141:11: error: ‘m_cipher’ was not declared in this scope
   {return m_cipher;}
           ^~~~~~~~
chachapoly.h:141:11: note: suggested alternative: ‘m_buffer’
   {return m_cipher;}
           ^~~~~~~~
           m_buffer
chachapoly.h: At global scope:
chachapoly.h:298:2: error: ‘XChaCha20’ does not name a type; did you mean ‘ChaCha20’?
  XChaCha20::Encryption m_cipher;
  ^~~~~~~~~
  ChaCha20
chachapoly.h: In member function ‘CryptoPP::SymmetricCipher& CryptoPP::XChaCha20Poly1305_Final<T_IsEncryption>::AccessSymmetricCipher()’:
chachapoly.h:288:11: error: ‘m_cipher’ was not declared in this scope
   {return m_cipher;}
           ^~~~~~~~
chachapoly.h:288:11: note: suggested alternative: ‘m_buffer’
   {return m_cipher;}
           ^~~~~~~~
           m_buffer

asbai commented 5 years ago

emmmm... It's strange. In my downloaded 8.0 (https://www.cryptopp.com/cryptopp800.zip), there are no files starting with "chachapoly". I have only four files whose names begin with "chacha": chacha.h, chacha.cpp, chacha_simd.cpp and chacha_avx.cpp.

And I'm very sure we are using this code. In addition, these files have been compiled successfully with VC, ICL, GCC and Clang.

In addition, we just did a full-text search on the downloaded code and did not find ChaChaTLS. What have I missed?

noloader commented 5 years ago

I cloned your repo.

$ git clone https://github.com/asbai/cryptopp.git cryptopp-asbai
$ cd cryptopp-asbai
$ make -j 5

asbai commented 5 years ago

But my repo is forked from your repo... And what I'm actually using is the zip file from the official link above. I've double-checked the zip file (https://www.cryptopp.com/cryptopp800.zip) and there is no chachapoly.h in it.

noloader commented 5 years ago

I've double check the zip file (https://www.cryptopp.com/cryptopp800.zip) and no chachapoly.h there.

Right. ChaChaTLS and XChaCha were added after 8.0 was released.

And to answer your question, the only change to Bernstein's original ChaCha20 came from your recent bug report, Issue 800. But also see "Refactor ChaCha and ChaChaTLS use a common core".

Did you run the self tests?

asbai commented 5 years ago

OK, thanks, I'll try to run the test ASAP~ :-)

noloader commented 5 years ago

For Crypto++ 8.0, the algorithm names in the ChaCha self tests are ChaCha8, ChaCha12 and ChaCha20. For Crypto++ 8.1 the algorithm name is ChaCha, but with a Rounds: 8, Rounds: 12, or Rounds: 20 parameter (like Salsa).

Something else you might try... Botan is an independent implementation. You can try to decrypt the files with Botan.

Finally, the current (master) ChaCha20 is documented at ChaCha20 on the Crypto++ wiki.
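
In the same spirit as checking against an independent implementation, here is a minimal standalone sketch of one ChaCha keystream block with the round count as a runtime parameter. It follows the quarter round and the column/diagonal pattern shown in the chacha.cpp listing earlier in this thread, but it is not Crypto++ or Botan code; `rotl32`, `quarter_round` and `chacha_block` are hypothetical names, and the caller is assumed to supply an already initialized 16-word state.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Rotate a 32-bit word left by n bits (0 < n < 32).
static inline std::uint32_t rotl32(std::uint32_t v, unsigned n)
{
    return (v << n) | (v >> (32 - n));
}

// Same operations as CHACHA_QUARTER_ROUND in the listing.
static inline void quarter_round(std::uint32_t& a, std::uint32_t& b,
                                 std::uint32_t& c, std::uint32_t& d)
{
    a += b; d ^= a; d = rotl32(d, 16);
    c += d; b ^= c; b = rotl32(b, 12);
    a += b; d ^= a; d = rotl32(d, 8);
    c += d; b ^= c; b = rotl32(b, 7);
}

// Produce one 64-byte keystream block from a 16-word state.
// rounds is 8, 12 or 20 for ChaCha8, ChaCha12 and ChaCha20.
std::array<std::uint8_t, 64> chacha_block(const std::array<std::uint32_t, 16>& state,
                                          unsigned rounds)
{
    assert(rounds % 2 == 0);
    std::array<std::uint32_t, 16> x = state;

    for (unsigned i = 0; i < rounds; i += 2) {
        // Column round
        quarter_round(x[0], x[4], x[8],  x[12]);
        quarter_round(x[1], x[5], x[9],  x[13]);
        quarter_round(x[2], x[6], x[10], x[14]);
        quarter_round(x[3], x[7], x[11], x[15]);
        // Diagonal round
        quarter_round(x[0], x[5], x[10], x[15]);
        quarter_round(x[1], x[6], x[11], x[12]);
        quarter_round(x[2], x[7], x[8],  x[13]);
        quarter_round(x[3], x[4], x[9],  x[14]);
    }

    // Add the input state back in and serialize as little-endian words.
    std::array<std::uint8_t, 64> out{};
    for (std::size_t i = 0; i < 16; ++i) {
        const std::uint32_t w = x[i] + state[i];
        out[4 * i + 0] = static_cast<std::uint8_t>(w);
        out[4 * i + 1] = static_cast<std::uint8_t>(w >> 8);
        out[4 * i + 2] = static_cast<std::uint8_t>(w >> 16);
        out[4 * i + 3] = static_cast<std::uint8_t>(w >> 24);
    }
    return out;
}
```

With an all-zero 32-byte key, a zero IV and a zero counter, calling `chacha_block(state, 20)` should give a block starting with the bytes 76 b8 e0 ad a0 f1 3d 90, the widely published ChaCha20 zero-key vector. Running the same state with `rounds = 8` gives a ChaCha8 block to compare against what each library version produces.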

asbai commented 5 years ago

Very interesting. We have completely rolled the 8.0 implementation back to the pure C++ algorithm and compared it with the 7.0 implementation, and we found no logical difference. However, the data generated by 7.0 still cannot be decrypted correctly (only the ChaCha algorithm has this problem; the other algorithms are fine).

We temporarily regenerated the data with 8.0, which works without problems. As for compatibility between 8.0 and 7.0, we will continue to investigate later.

Here is the pure C++ version of our 8.0 chacha.cpp:

// chacha.cpp - written and placed in the public domain by Jeffrey Walton.
//              Based on Wei Dai's Salsa20, Botan's SSE2 implementation,
//              and Bernstein's reference ChaCha family implementation at
//              http://cr.yp.to/chacha.html.

#include "pch.h"
#include "config.h"
#include "chacha.h"
#include "argnames.h"
#include "misc.h"
#include "cpu.h"

NAMESPACE_BEGIN(CryptoPP)

#define CHACHA_QUARTER_ROUND(a,b,c,d) \
    a += b; d ^= a; d = rotlConstant<16,word32>(d); \
    c += d; b ^= c; b = rotlConstant<12,word32>(b); \
    a += b; d ^= a; d = rotlConstant<8,word32>(d); \
    c += d; b ^= c; b = rotlConstant<7,word32>(b);

#define CHACHA_OUTPUT(x){\
    CRYPTOPP_KEYSTREAM_OUTPUT_WORD(x, LITTLE_ENDIAN_ORDER, 0, x0 + m_state[0]);\
    CRYPTOPP_KEYSTREAM_OUTPUT_WORD(x, LITTLE_ENDIAN_ORDER, 1, x1 + m_state[1]);\
    CRYPTOPP_KEYSTREAM_OUTPUT_WORD(x, LITTLE_ENDIAN_ORDER, 2, x2 + m_state[2]);\
    CRYPTOPP_KEYSTREAM_OUTPUT_WORD(x, LITTLE_ENDIAN_ORDER, 3, x3 + m_state[3]);\
    CRYPTOPP_KEYSTREAM_OUTPUT_WORD(x, LITTLE_ENDIAN_ORDER, 4, x4 + m_state[4]);\
    CRYPTOPP_KEYSTREAM_OUTPUT_WORD(x, LITTLE_ENDIAN_ORDER, 5, x5 + m_state[5]);\
    CRYPTOPP_KEYSTREAM_OUTPUT_WORD(x, LITTLE_ENDIAN_ORDER, 6, x6 + m_state[6]);\
    CRYPTOPP_KEYSTREAM_OUTPUT_WORD(x, LITTLE_ENDIAN_ORDER, 7, x7 + m_state[7]);\
    CRYPTOPP_KEYSTREAM_OUTPUT_WORD(x, LITTLE_ENDIAN_ORDER, 8, x8 + m_state[8]);\
    CRYPTOPP_KEYSTREAM_OUTPUT_WORD(x, LITTLE_ENDIAN_ORDER, 9, x9 + m_state[9]);\
    CRYPTOPP_KEYSTREAM_OUTPUT_WORD(x, LITTLE_ENDIAN_ORDER, 10, x10 + m_state[10]);\
    CRYPTOPP_KEYSTREAM_OUTPUT_WORD(x, LITTLE_ENDIAN_ORDER, 11, x11 + m_state[11]);\
    CRYPTOPP_KEYSTREAM_OUTPUT_WORD(x, LITTLE_ENDIAN_ORDER, 12, x12 + m_state[12]);\
    CRYPTOPP_KEYSTREAM_OUTPUT_WORD(x, LITTLE_ENDIAN_ORDER, 13, x13 + m_state[13]);\
    CRYPTOPP_KEYSTREAM_OUTPUT_WORD(x, LITTLE_ENDIAN_ORDER, 14, x14 + m_state[14]);\
    CRYPTOPP_KEYSTREAM_OUTPUT_WORD(x, LITTLE_ENDIAN_ORDER, 15, x15 + m_state[15]);}

#if defined(CRYPTOPP_DEBUG) && !defined(CRYPTOPP_DOXYGEN_PROCESSING)
void ChaCha_TestInstantiations()
{
    ChaCha20::Encryption x;
}
#endif

template < unsigned int rounds >
std::string ChaCha_Policy<rounds>::AlgorithmName() const
{
    return std::string("ChaCha")+IntToString(rounds);
}

template < unsigned int rounds >
std::string ChaCha_Policy<rounds>::AlgorithmProvider() const
{
    return "C++";
}

template < unsigned int rounds >
void ChaCha_Policy<rounds>::CipherSetKey(const NameValuePairs &params, const byte *key, size_t length)
{
    CRYPTOPP_UNUSED(params);
    CRYPTOPP_ASSERT(length == 16 || length == 32);

    // "expand 16-byte k" or "expand 32-byte k"
    m_state[0] = 0x61707865;
    m_state[1] = (length == 16) ? 0x3120646e : 0x3320646e;
    m_state[2] = (length == 16) ? 0x79622d36 : 0x79622d32;
    m_state[3] = 0x6b206574;

    GetBlock<word32, LittleEndian> get1(key);
    get1(m_state[4])(m_state[5])(m_state[6])(m_state[7]);

    GetBlock<word32, LittleEndian> get2(key + ((length == 32) ? 16 : 0));
    get2(m_state[8])(m_state[9])(m_state[10])(m_state[11]);
}

template < unsigned int rounds >
void ChaCha_Policy<rounds>::CipherResynchronize(byte *keystreamBuffer, const byte *IV, size_t length)
{
    CRYPTOPP_UNUSED(keystreamBuffer), CRYPTOPP_UNUSED(length);
    CRYPTOPP_ASSERT(length==8);

    GetBlock<word32, LittleEndian> get(IV);
    m_state[12] = m_state[13] = 0;
    get(m_state[14])(m_state[15]);
}

template < unsigned int rounds >
void ChaCha_Policy<rounds>::SeekToIteration(lword iterationCount)
{
    m_state[12] = (word32)iterationCount;  // low word
    m_state[13] = (word32)SafeRightShift<32>(iterationCount);
}

template < unsigned int rounds >
unsigned int ChaCha_Policy<rounds>::GetAlignment() const
{
        return GetAlignmentOf<word32>();
}

template < unsigned int rounds >
unsigned int ChaCha_Policy<rounds>::GetOptimalBlockSize() const
{
        return BYTES_PER_ITERATION;
}

template < unsigned int rounds >
bool ChaCha_Policy<rounds>::MultiBlockSafe(unsigned int blocks) const
{
    return 0xffffffff - m_state[12] > blocks;
}

// OperateKeystream always produces a key stream. The key stream is written
// to output. Optionally a message may be supplied to xor with the key stream.
// The message is input, and output = output ^ input.
template < unsigned int rounds >
void ChaCha_Policy<rounds>::OperateKeystream(KeystreamOperation operation,
        byte *output, const byte *input, size_t iterationCount)
{
    do
    {
        if (iterationCount)
        {
            word32 x0, x1, x2, x3, x4, x5, x6, x7, x8, x9, x10, x11, x12, x13, x14, x15;

            x0 = m_state[0];    x1 = m_state[1];    x2 = m_state[2];    x3 = m_state[3];
            x4 = m_state[4];    x5 = m_state[5];    x6 = m_state[6];    x7 = m_state[7];
            x8 = m_state[8];    x9 = m_state[9];    x10 = m_state[10];  x11 = m_state[11];
            x12 = m_state[12];  x13 = m_state[13];  x14 = m_state[14];  x15 = m_state[15];

            for (int i = static_cast<int>(rounds); i > 0; i -= 2)
            {
                CHACHA_QUARTER_ROUND(x0, x4,  x8, x12);
                CHACHA_QUARTER_ROUND(x1, x5,  x9, x13);
                CHACHA_QUARTER_ROUND(x2, x6, x10, x14);
                CHACHA_QUARTER_ROUND(x3, x7, x11, x15);

                CHACHA_QUARTER_ROUND(x0, x5, x10, x15);
                CHACHA_QUARTER_ROUND(x1, x6, x11, x12);
                CHACHA_QUARTER_ROUND(x2, x7,  x8, x13);
                CHACHA_QUARTER_ROUND(x3, x4,  x9, x14);
            }

            CRYPTOPP_KEYSTREAM_OUTPUT_SWITCH(CHACHA_OUTPUT, BYTES_PER_ITERATION);

            if (++m_state[12] == 0)
                m_state[13]++;
        }

    // We may re-enter a SIMD keystream operation from here.
    } while (iterationCount--);
}

template class CRYPTOPP_DLL ChaCha_Policy<8>;
template class CRYPTOPP_DLL ChaCha_Policy<12>;
template class CRYPTOPP_DLL ChaCha_Policy<20>;

NAMESPACE_END

This pure C++ version also correctly decrypts ciphertext produced by the SSE2-optimized 8.0 version. But as I said before, it does not work with the 7.0 ciphertext.
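
The block-counter handling in the listing above (SeekToIteration, the carry at the end of OperateKeystream, and MultiBlockSafe) can be isolated into a few lines. The sketch below mirrors that logic in plain C++; `BlockCounter`, `seek_to_block`, `next_block` and `multi_block_safe` are hypothetical names chosen for illustration.

```cpp
#include <cstdint>

// The 64-bit block counter lives in two 32-bit state words
// (m_state[12] is the low word, m_state[13] the high word).
struct BlockCounter {
    std::uint32_t lo;
    std::uint32_t hi;
};

// Split a 64-bit block index into the two words, as SeekToIteration does.
BlockCounter seek_to_block(std::uint64_t block_index)
{
    return { static_cast<std::uint32_t>(block_index),
             static_cast<std::uint32_t>(block_index >> 32) };
}

// Advance by one block and carry into the high word on wrap-around,
// as the scalar path does after each block.
void next_block(BlockCounter& c)
{
    if (++c.lo == 0)
        ++c.hi;
}

// Same test as MultiBlockSafe: true when adding `blocks` to the low word
// cannot wrap, so a multi-block SIMD step needs no carry handling.
bool multi_block_safe(const BlockCounter& c, std::uint32_t blocks)
{
    return 0xffffffffu - c.lo > blocks;
}
```

Near a low-word wrap, `multi_block_safe` returns false and the work falls through to the one-block path, which performs the carry. Comparing how two versions behave around that boundary only matters for streams that actually reach it, which the source comment puts at around 2^32 blocks (256 GB of data).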

noloader commented 5 years ago

If needed, you can build out test cases with Noloader | ChaCha20. It is one of my GitHub repositories.

The cryptopp-test GitHub is where I place reference implementations I use to generate test vectors. In the case of ChaCha20, that is Bernstein's reference implementation and it is part of ECRYPT. Crypto++ is validated against the test vectors generated by Bernstein's program.

And in the case of Bernstein's ChaCha20, I added three functions: main, XXX_ctr_setup and XXX_rand_bytes.

asbai commented 5 years ago

@noloader OK, thanks a lot. I will give it a try later~

noloader commented 5 years ago

@asbai,

Could you please retest ChaCha now that you have access to another dev-board and an updated compiler?

noloader commented 5 years ago

Closing out. We can revisit it if we have to.