weidai11 / cryptopp

free C++ class library of cryptographic schemes
https://cryptopp.com
Other
4.89k stars 1.51k forks source link

Inflator throws end of compressed block on complete compressed data #112

Closed PowerGamer1 closed 8 years ago

PowerGamer1 commented 8 years ago

Most of my compressed data samples Inflator class decompresses just fine. But on some compressed data it throws "Inflator: unexpected end of compressed block" yet the data is valid and complete. Adding one extra byte of ANY VALUE to the end of such data makes Inflator to decompress it without errors.

See the attached sample code and data file that reproduces the problem: default.gz

P.S. GitHub stuck on uploading sample C++ code in a separate file so I am pasting it here:

#include <cstdio>
#include "cryptopp/gzip.h"

using namespace CryptoPP;

int main()
{
    byte in[20000], out[70000];
    const size_t file_size = 16782;
    FILE* f = fopen("!.gz", "rb");
    fread(in, 1, file_size, f);
    fclose(f);

    // Works fine:
    //ArraySource(in, file_size, true, new Gunzip(new ArraySink(out, sizeof(out))));

    // GZIP data in test file has 10 bytes of GZIP header and 8 bytes of footer (CRC32 and ISIZE, see http://www.rfc-editor.org/rfc/rfc1952.txt).
    // So we offset "in" by 10 bytes and the size of actual compressed data (without GZIP header and footer) would be (file_size - 18) bytes.
    // Yet the following line of code throws "Inflator: unexpected end of compressed block".
    ArraySource(in + 10, file_size - 18, true, new Inflator(new ArraySink(out, sizeof(out))));

    // If at least one extra byte of data is passed to the Inflator (the value of that byte of data makes no difference) no exception is thrown.
    //in[file_size - 8] = 0xFF; // value of extra byte makes no difference
    //ArraySource(in + 10, file_size - 18 + 1, true, new Inflator(new ArraySink(out, sizeof(out))));
}
noloader commented 8 years ago

I've always known there's been some minor incompatibilities among inflator programs, but I've never really investigated it. Also see A lone zero block at ###. When using GNU tar, use the -i or --ignore-zeros option.

OpenSSL experienced a similar issue when using a program called _tardy_; see Unexpected error when I untar OpenSSL 1.0.2d.

PowerGamer1 commented 8 years ago

@noloader I don't see how the GNU tar and the info in your links relate to what I posted. Tar has nothing at all to do with the sample compressed data in GZIP format I provided. The compressed data is not located at the end of file, it is followed by GZIP footer which is valid and conforms to GZIP spec (http://www.rfc-editor.org/rfc/rfc1952.txt). So its not about some zero bytes at the end of file.

noloader commented 8 years ago

@PowerGamer1 - can you provide (1) the source file that's causing trouble, (2) the name of the program that creates the archive, (3) the command used to deflate the source file, and (4) the deflated file?

Sorry to have to ask. I don't really follow what you are doing with the index manipulations in the ArraySink. But the comment "If at least one extra byte of data is passed to the Inflator (the value of that byte of data makes no difference) no exception is thrown" seems to indicate its a similar problem experienced by GNU tar users.


Here's where the message is coming from:

$ grep "unexpected end of compressed block" *.h *.cpp
zinflate.h: class UnexpectedEndErr : public Err {public: UnexpectedEndErr() : Err(INVALID_DATA_FORMAT, "Inflator: unexpected end of compressed block") {}};

Unfortunately, it can be thrown in a number of places:

$ grep -n UnexpectedEndErr *.h *.cpp
zinflate.h:97:  class UnexpectedEndErr : public Err {public: UnexpectedEndErr() ...
zinflate.cpp:310:       throw UnexpectedEndErr();
zinflate.cpp:376:   throw UnexpectedEndErr();
zinflate.cpp:385:       throw UnexpectedEndErr();
zinflate.cpp:398:       throw UnexpectedEndErr();
zinflate.cpp:419:           throw UnexpectedEndErr();
zinflate.cpp:429:               throw UnexpectedEndErr();
zinflate.cpp:437:               throw UnexpectedEndErr();
zinflate.cpp:443:               throw UnexpectedEndErr();
PowerGamer1 commented 8 years ago

@PowerGamer1 - can you provide (1) the source file that's causing trouble, (2) the name of the program that creates the archive, (3) the command used to deflate the source file, and (4) the deflated file?

(1) - you can get the original uncompressed data from the compressed file I attached, using for ex. gzip from http://gzip.org, or by using Crypto++ by uncommenting one of the lines in my sample program that are marked as working without error.

(2), (3) - can't really say what program created archive, the data comes from a server not under my control, but it should not matter anyway - the data is perfectly valid GZIP format data (you can check GZIP header and footer in HEX editor, CryptoPP::Gunzip class accepts them too, and the actual compressed data is valid too, since with workaround with extra byte for Inflator class the data is decompressed and passes GZIP CRC check.).

(4) the sample compressed (deflated) file is attached in my first post (filename is !.gz).

I don't really follow what you are doing with the index manipulations in the ArraySink .

Check the GZIP header and footer (http://www.rfc-editor.org/rfc/rfc1952.txt) of the sample data file I attached (!.gz) and you will understand that I skip them and pass only actual compressed data to Inflator class.

P.S. Does the original author of the library still take part in its maintenance?

noloader commented 8 years ago

P.S. Is the original author of the library still takes part in its maintenance?

The author of the library is Wei Dai. Unfortunately, no, he does not have time for maintenance and growth of the library. He effectively turned it over to the care of the community. Wei still controls the library, he just does not have time to tend to the daily operations.

I'm one of the point-men in the community who has check-in privileges. There are three or four other folks with the same rights. But like with Wei Dai, they don't really have the spare cycles to donate to the project. I have the spare cycles (at the moment), so you see a lot of me.

As soon as we push Crypto++ 5.7, I'm going back to work. I took an unpaid sabbatical to ensure 5.6.3 and 5.7 gets out. I expect 5.7 to be out in the next month or two, so development should slow after that.

PowerGamer1 commented 8 years ago

The zlib library decompresses the actual compressed data (without GZIP header and footer) from sample file without any problems:

#include <cstdio>
#include <cassert>

#define ZLIB_CONST
#include "zlib/zlib.h"

int main()
{
    Bytef in[20000], out[70000];
    const size_t file_size = 16782;
    FILE* f = fopen("!.gz", "rb");
    fread(in, 1, file_size, f);
    fclose(f);

    z_stream stream;
    stream.next_in = in + 10; // skip 10 bytes of GZIP header
    stream.avail_in = file_size - 18; // skip 10 bytes of GZIP header and 8 bytes of GZIP footer
    stream.next_out = out;
    stream.avail_out = sizeof(out);
    stream.zalloc = nullptr;
    stream.zfree = nullptr;

    auto result = inflateInit2(&stream, -MAX_WBITS); // negative windowBits mean no ZLIB or GZIP headers
    assert(result == Z_OK);

    result = inflate(&stream, Z_FINISH);
    assert(result == Z_STREAM_END);

    inflateEnd(&stream);
}

So it is definitely a bug in CryptoPP::Inflator class.

noloader commented 8 years ago

Thanks @PowerGamer1 .

I know Crypto++ attempts to provide a RFC 1951 conforming implementation. I'm trying to locate test vectors for things now. Would you happen to know where they can be found?

PowerGamer1 commented 8 years ago

test vectors for things

No idea what you are talking about.

dirvine commented 8 years ago

Think of test vectors as agreed outputs to fixed inputs to test algorithms.

anonimal commented 8 years ago

We had a similar problem (as did java I2P). The issue was resolved by appending a null byte to compressed data. For us, the problem was somehow introduced in Crypto++ 5.6.3 though I can't spot where exactly. Kovri patches here and eventually here.

noloader commented 8 years ago

Thanks @anonimal. Do you know if the Gzip authors publish any independent test vectors? I have not been able to locate them, and I am hesitant to make a breaking change on others for a special case.

Would it be possible for you to provide a separate GzipStream class (following z_stream) that handles this issue for you? In this case, it seems like the NULL byte should be added automatically when MessageEnd() is called on the stream?

For completeness, I'm all for bug fixing and providing what is needed. The project's issues are: (1) we don't know exactly where we need to be, (2) the authors failed to publish test vectors, and (3) we don't want to break things for existing users (or without good/extraordinary cause).

anonimal commented 8 years ago

Hi @noloader. Sorry, but no; I do not know if they have published any independent test vectors. On a related note, Adler himself @madler has a nice response to a similar question (though I'm sure you've probably seen this).

@PowerGamer1 I believe the 2nd paragraph (GzipStream/z_stream) is meant for you (I was confused for a moment :smile:)

Regarding points 1-3: I completely understand the current project's issues. Now, regarding this particular issue: I've reviewed the diff between tags 5.6.2-5.6.3 for zlib/zinflate/cryptlib and could not find anything of consequence. Then again, the vast amounts of windows carriage return on the 5.6.3 commits made it difficult to compare actual code change with precision so I may have missed something - or the problem lies elsewhere.

noloader commented 8 years ago

Then again, the vast amounts of windows carriage return on the 5.6.3 commits

@anonimal - Oh that's right... I forgot about that. Let me see if I can do some of the legwork. And I was not aware 5.6.2 did not suffer it. That changes the dynamics of this issue.

If we (I) introduced the new behavior, then its something we definitely want to fix. That code had an uninitialized read under Valgrind, and it took me some time to track down. Eventually we pin-pointed it with Coverity.

Since I can't say if I broke it, that points to a gap in our engineering process. We should always be able to detect these things before and after. Guessing after the fact is no good for anyone, and it wastes people's time (q.v.).

I think I need to fetch Adler's reference implementation, and build out some test cases.

noloader commented 8 years ago

@anonimal - Below is a test script that should help with the changes. Things are much more manaeable. Nothing is jumping out at me.

Does anything jump out at you?


mkdir cryptopp-gzip
cd cryptopp-gzip
wget https://www.cryptopp.com/cryptopp562.zip
wget https://www.cryptopp.com/cryptopp563.zip
unzip -a cryptopp562.zip -d '5.6.2'
unzip -a cryptopp563.zip -d '5.6.3'

diff --ignore-space-change --ignore-trailing-space 5.6.2/zlib.h 5.6.3/zlib.h
diff --ignore-space-change --ignore-trailing-space 5.6.2/zlib.cpp 5.6.3/zlib.cpp
diff --ignore-space-change --ignore-trailing-space 5.6.2/gzip.h 5.6.3/gzip.h
diff --ignore-space-change --ignore-trailing-space 5.6.2/gzip.cpp 5.6.3/gzip.cpp
diff --ignore-space-change --ignore-trailing-space 5.6.2/zinflate.h 5.6.3/zinflate.h
diff --ignore-space-change --ignore-trailing-space 5.6.2/zinflate.cpp 5.6.3/zinflate.cpp

Here are the results I am seeing.

cryptopp-gzip$ cat gzip.diff

    **** zlib.h ****    
3a4
> #include "cryptlib.h"
39,41c40,43
<   /*! \param repeat decompress multiple compressed streams in series
<       \param autoSignalPropagation 0 to turn off MessageEnd signal
<   */
---
>   //! \brief Construct a ZlibDecompressor
>   //! \param attachment a \ BufferedTransformation to attach to this object
>   //! \param repeat decompress multiple compressed streams in series
>   //! \param autoSignalPropagation 0 to turn off MessageEnd signal

    **** zlib.cpp ****    
11a12
> #include "secblock.h"
16c17
< static const byte FDICT_FLAG = 1 << 5;
---
> static const byte FDICT_FLAG = (1 << 5);
23,25c24,28
<   byte cmf = DEFLATE_METHOD | ((GetLog2WindowSize()-8) << 4);
<   byte flags = GetCompressionLevel() << 6;
<   AttachedTransformation()->PutWord16(RoundUpToMultipleOf(cmf*256+flags, 31));
---
>   assert(((GetLog2WindowSize()-8) << 4) <= 255);
>   byte cmf = byte(DEFLATE_METHOD | ((GetLog2WindowSize()-8) << 4));
>   assert((GetCompressionLevel() << 6) <= 255);
>   byte flags = byte(GetCompressionLevel() << 6);
>   AttachedTransformation()->PutWord16(RoundUpToMultipleOf(word16(cmf*256+flags), word16(31)));
49c52
<   : Inflator(attachment, repeat, propagation)
---
>   : Inflator(attachment, repeat, propagation), m_log2WindowSize(0)

    **** gzip.h ****    
3a4
> #include "cryptlib.h"
15c16
<       : Deflator(attachment, deflateLevel, log2WindowSize, detectUncompressible) {}
---
>       : Deflator(attachment, deflateLevel, log2WindowSize, detectUncompressible), m_totalLen(0) {}
17c18
<       : Deflator(parameters, attachment) {}
---
>       : Deflator(parameters, attachment), m_totalLen(0) {}
31c32,33
< /// GZIP Decompression (RFC 1952)
---
> //! \class Gunzip
> //! \brief GZIP Decompression (RFC 1952)
41,43c43,46
<   /*! \param repeat decompress multiple compressed streams in series
<       \param autoSignalPropagation 0 to turn off MessageEnd signal
<   */
---
>   //! \brief Construct a Gunzip
>   //! \param attachment a \ BufferedTransformation to attach to this object
>   //! \param repeat decompress multiple compressed streams in series
>   //! \param autoSignalPropagation 0 to turn off MessageEnd signal
47,48c50,57
<   enum {MAGIC1=0x1f, MAGIC2=0x8b,   // flags for the header
<       DEFLATED=8};
---
>   enum {
>       //! \brief First header magic value
>       MAGIC1=0x1f,
>       //! \brief Second header magic value
>       MAGIC2=0x8b,
>       //! \brief Deflated flag
>       DEFLATED=8
>   };

    **** gzip.cpp ****    
18c18
<   byte extra = (GetDeflateLevel() == 1) ? FAST : ((GetDeflateLevel() == 9) ? SLOW : 0);
---
>   byte extra = byte((GetDeflateLevel() == 1) ? FAST : ((GetDeflateLevel() == 9) ? SLOW : 0));
40c40
<   : Inflator(attachment, repeat, propagation)
---
>   : Inflator(attachment, repeat, propagation), m_length(0)

    **** zinflate.h ****    
3a4,5
> #include "cryptlib.h"
> #include "secblock.h"
5c7
< #include <vector>
---
> #include "stdcpp.h"
15d16
< //    unsigned long BitsLeft() const {return m_store.MaxRetrievable() * 8 + m_bitsBuffered;}
41,42c42,45
<   HuffmanDecoder() {}
<   HuffmanDecoder(const unsigned int *codeBitLengths, unsigned int nCodes){Initialize(codeBitLengths, nCodes);}
---
>   HuffmanDecoder() : m_maxCodeBits(0), m_cacheBits(0), m_cacheMask(0), m_normalizedCacheMask(0) {}
>   HuffmanDecoder(const unsigned int *codeBitLengths, unsigned int nCodes)
>       : m_maxCodeBits(0), m_cacheBits(0), m_cacheMask(0), m_normalizedCacheMask(0)
>           {Initialize(codeBitLengths, nCodes);}
97,99c100,103
<   /*! \param repeat decompress multiple compressed streams in series
<       \param autoSignalPropagation 0 to turn off MessageEnd signal
<   */
---
>   //! \brief RFC 1951 Decompressor
>   //! \param attachment the filter's attached transformation
>   //! \param repeat decompress multiple compressed streams in series
>   //! \param autoSignalPropagation 0 to turn off MessageEnd signal

    **** zinflate.cpp ****    
8a9
> 
9a11,12
> #include "secblock.h"
> #include "smartptr.h"
39c42
<   assert(result);
---
>   CRYPTOPP_UNUSED(result); assert(result);
113c116,118
<   if (code > (1 << m_maxCodeBits) - blCount[m_maxCodeBits])
---
>   // MAX_CODE_BITS is 32, m_maxCodeBits may be smaller.
>   const unsigned long long shiftedMaxCode = (1ULL << m_maxCodeBits);
>   if (code > shiftedMaxCode - blCount[m_maxCodeBits])
115c120
<   else if (m_maxCodeBits != 1 && code < (1 << m_maxCodeBits) - blCount[m_maxCodeBits])
---
>   else if (m_maxCodeBits != 1 && code < shiftedMaxCode - blCount[m_maxCodeBits])
141,142c146,149
<   if (m_cache.size() != size_t(1) << m_cacheBits)
<       m_cache.resize(1 << m_cacheBits);
---
>   const unsigned long long shiftedCache = (1ULL << m_cacheBits);
>   assert(shiftedCache <= SIZE_MAX);
>   if (m_cache.size() != shiftedCache)
>       m_cache.resize((size_t)shiftedCache);
180c187
<   code_t normalizedCode;
---
>   code_t normalizedCode = 0;
204c211,213
<   reader.FillBuffer(m_maxCodeBits);
---
>   bool result = reader.FillBuffer(m_maxCodeBits);
>   if(!result) return false;
> 
216c225,227
<   , m_state(PRE_STREAM), m_repeat(repeat), m_reader(m_inQueue)
---
>   , m_state(PRE_STREAM), m_repeat(repeat), m_eof(0), m_wrappedAround(0)
>   , m_blockType(0xff), m_storedLen(0xffff), m_nextDecode(), m_literal(0)
>   , m_distance(0), m_reader(m_inQueue), m_current(0), m_lastFlush(0)
405c416
<               unsigned int k, count, repeater;
---
>               unsigned int k = 0, count = 0, repeater = 0;
475a487,488
>           assert(size <= 0xffff);
>           
478c491
<           m_storedLen -= (word16)size;
---
>           m_storedLen = m_storedLen - (word16)size;
549a563,565
>           break;
>       default:
>           assert(0);
593c609
<       std::auto_ptr<HuffmanDecoder> pDecoder(new HuffmanDecoder);
---
>       member_ptr<HuffmanDecoder> pDecoder(new HuffmanDecoder);
605c621
<       std::auto_ptr<HuffmanDecoder> pDecoder(new HuffmanDecoder);
---
>       member_ptr<HuffmanDecoder> pDecoder(new HuffmanDecoder);
cryptopp-gzip$ 
noloader commented 8 years ago

Good news for PowerGamer. The issue appears to be intermittent, but I can duplicate the issue:

$ ./test.exe 
Data: 9480BB7600DDF8A11830D877FB4596A6
Compressed Data: 9BD2B0BB8CE1EE8F85120637CA7FBB4E5B0600
Decompressed stream
$ ./test.exe 
Data: D7229EACCE1B1C0D2F5F8CACA9397C29
Compressed Data: BBAE346FCD3969195EFDF89E352B2D6B3401
Inflator: unexpected end of compressed block
$ ./test.exe 
Data: 58CA88F99248B20ABACC15B3B78BAB71
Compressed Data: 8B38D5F17392C726AE5D6744376FEF5E5D0800
Decompressed stream
$ ./test.exe 
Data: 4C328358FB92C77154D688CCD91FC4F1
Compressed Data: F3316A8EF83DE97861C8B58E3337E58F7C0400
Decompressed stream
$ ./test.exe 
Data: 97A34EF4C3CCD2D4BD037322B1D5E77B
Compressed Data: 9BBED8EFCBE13397AEEC652E56DA78F5793500
Decompressed stream
$ ./test.exe 
Data: B7CA080BB7E28456EE4A87FBB539C9A0
Compressed Data: DB7E8A837BFBA396B0775EEDBFB75A9E5C0000
Decompressed stream
$ ./test.exe 
Data: 5AFED6E676ECC0E69754373217F766E7
Compressed Data: 8BFA77ED59D99B03CFA687981B897F4F7B0E00
Decompressed stream
$ ./test.exe 
Data: 417A6C9DF1246DC1FE4690BFD34E51F8
Compressed Data: 73ACCA99FB5125F7E03FB709FB2FFB05FE0000
Decompressed stream
$ ./test.exe 
Data: 9BFC23E9A61EAC55104740FBE3329123
Compressed Data: 9BFD47F9E532B935A102EE0EBF1F1B4D540600
Decompressed stream
$ ./test.exe 
Data: 8B8B55D9BA5D7604B133CDD632FDB71D
Compressed Data: EBEE0EBDB92BB68C65A3F1D96B467FB7CB0200
Decompressed stream
$ ./test.exe 
Data: 2C1D9C0CC471DB00EDB41EF9EC308791
Compressed Data: D3919DC373A4F036C3DB2D723FDF18B44F0400
Decompressed stream
$ ./test.exe 
Data: 0F16CA19777F53E0C2DE5363D7646B0F
Compressed Data: E3173B25595E1FFCE0D0BDE0E4EB29D9FC00
Inflator: unexpected end of compressed block
$ ./test.exe 
Data: C65E8BC86F0CF63376F8E0265DFF5C5B
Compressed Data: 3B16D77D229FE79B71D98F076AB1FF63A201
Inflator: unexpected end of compressed block

// g++ $CXXFLAGS -I . test.cxx ./libcryptopp.a -o test.exe
int main(int argc, char* argv[])
{
    try
    {
        AutoSeededRandomPool prng;
        HexEncoder encoder(new FileSink(cout));
        string s1, s2, s3;

        s1.resize(16);
        OS_GenerateRandomBlock(false, (byte*)&s1[0], s1.size());

        cout << "Data: ";
        StringSource ss1(s1, true, new Redirector(encoder));
        cout << endl;

        StringSource ss2(s1, true, new Deflator(new StringSink(s2)));

        cout << "Compressed Data: ";
        StringSource ss3(s2, true, new Redirector(encoder));
        cout << endl;

        StringSource ss4(s2, true, new Inflator(new StringSink(s3)));

        if(s1 != s3)
        {
            cerr << "Failed to decompress stream" << endl;
            return 1;
        }

        cout << "Decompressed stream" << endl;
    } catch(const Exception& ex) {
        cerr << ex.what() << endl;
    }

    return 0;
}
noloader commented 8 years ago

It appears this failure is due to an unexpected return value in HuffmanDecoder::Decode and the call to reader.FillBuffer. Current code performs an early out at line 212. I added the return value check after an assert continued to fire due to it.

bool HuffmanDecoder::Decode(LowFirstBitReader &reader, value_t &value) const
{
    bool result = reader.FillBuffer(m_maxCodeBits);    // Line 211
    if(!result) return false;                          // Line 212

    unsigned int codeBits = Decode(reader.PeekBuffer(), value);
    if (codeBits > reader.BitsBuffered())
        return false;
    reader.SkipBits(codeBits);
    return true;
}

Here's the good code (5.6.2) and bad code (5.6.3) side-by-side from the diff on zinflate.cpp above:

204c211,213
<   reader.FillBuffer(m_maxCodeBits);
---
>   bool result = reader.FillBuffer(m_maxCodeBits);
>   if(!result) return false;
> 
noloader commented 8 years ago

Fixed at Commit 60a68714dc3587e9.

anonimal commented 8 years ago

Hi @noloader, I'm just now getting these messages. Great catch!