richgel999 / lzham_codec

Lossless data compression codec with LZMA-like ratios but 1.5x-8x faster decompression speed, C/C++
Other
693 stars 71 forks source link

Large file error #19

Open BorisMansencal opened 7 years ago

BorisMansencal commented 7 years ago

Hi,

First of all, thank you for this excellent open source codec.

I am using LZHAM to compress a rather large buffer (6140582460 bytes). I compress by chunks of maximum UINT32_MAX (=4294967295) bytes , but it fails (on one particular buffer).

Is UINT32_MAX allowed for lzham_compress_memory() or is it too large ?

In release I have a core dumped. When compiled in Debug mode, I have the following backtrace:

#0  0x00007fffda8329c8 in raise () at /lib64/libc.so.6
#1  0x00007fffda83465a in abort () at /lib64/libc.so.6
#2  0x00007fffda82b187 in __assert_fail_base () at /lib64/libc.so.6
#3  0x00007fffda82b232 in  () at /lib64/libc.so.6
#4  0x00007fffd9f4c238 in lzham_debug_break() () at /net/Tools/lzham_codec/lzhamdecomp/lzham_platform.cpp:67
#5  0x00007fffd9f2ff7a in lzham_assert(char const*, char const*, unsigned int) (pExp=0x7fffd9f538c8 "new_capacity && (new_capacity > m_capacity)", pFile=0x7fffd9f53840 "/net/Tools/lzham_codec/lzhamdecomp/lzham_vector.cpp", line=28) at /net/Tools/lzham_codec/lzhamdecomp/lzham_assert.cpp:24
#6  0x00007fffd9f525d1 in lzham::elemental_vector::increase_capacity(unsigned int, bool, unsigned int, void (*)(void*, void*, unsigned int), bool) (this=0x7ffe3e6155f8, min_new_capacity=2147603813, grow_hint=true, element_size=1, pMover=0x0, nofail=true)
    at /net/Tools/lzham_codec/lzhamdecomp/lzham_vector.cpp:28
#7  0x00007fffda16909c in lzham::vector<unsigned char>::increase_capacity(unsigned int, bool, bool) (this=0x7ffe3e6155f8, min_new_capacity=2147603813, grow_hint=true, nofail=true) at /net/Tools/lzham_codec/lzhamcomp/../lzhamdecomp/lzham_vector.h:573
#8  0x00007fffda168877 in lzham::vector<unsigned char>::try_resize(unsigned int, bool) (this=0x7ffe3e6155f8, new_size=2147603813, grow_hint=true)
    at /net/Tools/lzham_codec/lzhamcomp/../lzhamdecomp/lzham_vector.h:183
#9  0x00007fffda170238 in lzham::vector<unsigned char>::insert(unsigned int, unsigned char const*, unsigned int) (this=0x7ffe3e6155f8, index=2147331299, p=0xa6e3e0 "A9\222<υ\266\241\354\301\256\r\346\022\277\271hQ\374F\"V\226\035\356['3\255\346\t\357\217\n\324\304\344n\034\325G\320\300\031\231\223\375\061$\213\314\321;}IY\271\335x\201\r\202\071bf\315~\210D/\aF^\347\256k\315\037fj\243'AߪJ\032\372.\307\317\061y\227ԋ`\352\307U\271\201\067n=ޚ\377\231\227?*Gf\347\270\363\275\271\372ؾ\205\061\002t\246\321\061\026\021+\347\b\221d\263F\275\373\354\376\331mޣ\233t\342\070m\025\354\220-|6\t/\177\037ߖΪN^I\030<LQ\307\066\aD\377\300ޝ\027\312\301\262\017\027\325\365A\245\004\222I\244z\231\377pyi"..., n=272514)
    at /net/Tools/lzham_codec/lzhamcomp/../lzhamdecomp/lzham_vector.h:251
#10 0x00007fffda16f8e4 in lzham::vector<unsigned char>::append(lzham::vector<unsigned char> const&) (this=0x7ffe3e6155f8, other=...)
    at /net/Tools/lzham_codec/lzhamcomp/../lzhamdecomp/lzham_vector.h:297
#11 0x00007fffda16e93d in lzham::lzcompressor::compress_block_internal(void const*, unsigned int) (this=0x7ffe3e612010, pBuf=0x7fff5226f010, buf_len=524288) at /net/Tools/lzham_codec/lzhamcomp/lzham_lzcomp_internal.cpp:1951
#12 0x00007fffda16d31c in lzham::lzcompressor::compress_block(void const*, unsigned int) (this=0x7ffe3e612010, pBuf=0x7fff5226f010, buf_len=524288)
    at /net/Tools/lzham_codec/lzhamcomp/lzham_lzcomp_internal.cpp:1459
#13 0x00007fffda16a85a in lzham::lzcompressor::put_bytes(void const*, unsigned int) (this=0x7ffe3e612010, pBuf=0x7ffe5a46f010, buf_len=4294967295)
    at /net/Tools/lzham_codec/lzhamcomp/lzham_lzcomp_internal.cpp:515
#14 0x00007fffda167e3f in lzham::lzham_lib_compress_memory(lzham_compress_params const*, unsigned char*, unsigned long*, unsigned char const*, unsigned long, unsigned int*) (pParams=0x7fffffffdbb0, pDst_buf=0x7ffce0992010 "", pDst_len=0x7fffffffdb98, pSrc_buf=0x7ffe5a46f010 "?", src_len=4294967295, pAdler32=0x0) at /net/Tools/lzham_codec/lzhamcomp/lzham_lzcomp.cpp:353
#15 0x00007ffff6b2ee06 in lzham_compress_memory(lzham_compress_params const*, lzham_uint8*, size_t*, lzham_uint8 const*, size_t, lzham_uint32*) (pParams=0x7fffffffdbb0, pDst_buf=0x7ffce0992010 "", pDst_len=0x7fffffffdb98, pSrc_buf=0x7ffe5a46f010 "?", src_len=4294967295, pAdler32=0x0)
    at /net/Tools/lzham_codec/lzhamdll/lzham_api.cpp:81

It seems I have a vector of capacity 2147487728, that size is to be increased to at least 2147603813, but next_pow2(2147603813)=4294967296 > UINT32_MAX=4294967295.

Is it a bug or should I use smaller chunks ?

This is on Linux x86_64 (Fedora 22, gcc 5.3.1), on an Intel Xeon CPU E5-2695 v3.

Thank you for you help.

Boris.

richgel999 commented 7 years ago

Hi Boris - I'll check into this ASAP. I'm guessing it's a bug in lzham_compress_memory(), because the main (low-level) API has been tested on extremely large files.

On Fri, Nov 4, 2016 at 10:27 AM, Boris Mansencal notifications@github.com wrote:

Hi,

First of all, thank you for this excellent open source codec.

I am using LZHAM to compress a rather large buffer (6140582460 bytes). I compress by chunks of maximum UINT32_MAX (=4294967295) bytes , but it fails (on one particular buffer).

Is UINT32_MAX allowed for lzham_compress_memory() or is it too large ?

In release I have a core dumped. When compiled in Debug mode, I have the following backtrace:

0 0x00007fffda8329c8 in raise () at /lib64/libc.so.6

1 0x00007fffda83465a in abort () at /lib64/libc.so.6

2 0x00007fffda82b187 in __assert_fail_base () at /lib64/libc.so.6

3 0x00007fffda82b232 in () at /lib64/libc.so.6

4 0x00007fffd9f4c238 in lzham_debug_break() () at /net/Tools/lzham_codec/lzhamdecomp/lzham_platform.cpp:67

5 0x00007fffd9f2ff7a in lzhamassert(char const, char const_, unsigned int) (pExp=0x7fffd9f538c8 "new_capacity && (new_capacity > m_capacity)", pFile=0x7fffd9f53840 "/net/Tools/lzham_codec/lzhamdecomp/lzham_vector.cpp", line=28) at /net/Tools/lzham_codec/lzhamdecomp/lzham_assert.cpp:24

6 0x00007fffd9f525d1 in lzham::elemental_vector::increasecapacity(unsigned int, bool, unsigned int, void ()(void, void, unsigned int), bool) (this=0x7ffe3e6155f8, min_new_capacity=2147603813, grow_hint=true, element_size=1, pMover=0x0, nofail=true)

at /net/Tools/lzham_codec/lzhamdecomp/lzham_vector.cpp:28

7 0x00007fffda16909c in lzham::vector::increase_capacity(unsigned int, bool, bool) (this=0x7ffe3e6155f8, min_new_capacity=2147603813, grow_hint=true, nofail=true) at /net/Tools/lzham_codec/lzhamcomp/../lzhamdecomp/lzham_vector.h:573

8 0x00007fffda168877 in lzham::vector::try_resize(unsigned int, bool) (this=0x7ffe3e6155f8, new_size=2147603813, grow_hint=true)

at /net/Tools/lzham_codec/lzhamcomp/../lzhamdecomp/lzham_vector.h:183

9 0x00007fffda170238 in lzham::vector::insert(unsigned int, unsigned char const_, unsigned int) (this=0x7ffe3e6155f8, index=2147331299, p=0xa6e3e0 "A9\222<υ\266\241\354\301\256\r\346\022\277\271hQ\374F\"V\226\035\356['3\255\346\t\357\217\n\324\304\344n\034\325G\320\300\031\231\223\375\061$\213\314\321;}IY\271\335x\201\r\202\071bf\315~\210D/\aF^\347\256k\315\037fj\243'AߪJ\032\372.\307\317\061y\227ԋ`\352\307U\271\201\067n=ޚ\377\231\227?_Gf\347\270\363\275\271\372ؾ\205\061\002t\246\321\061\026\021+\347\b\221d\263F\275\373\354\376\331mޣ\233t\342\070m\025\354\220-|6\t/\177\037ߖΪN^I\030<LQ\307\066\aD\377\300ޝ\027\312\301\262\017\027\325\365A\245\004\222I\244z\231\377pyi"..., n=272514)

at /net/Tools/lzham_codec/lzhamcomp/../lzhamdecomp/lzham_vector.h:251

10 0x00007fffda16f8e4 in lzham::vector::append(lzham::vector const&) (this=0x7ffe3e6155f8, other=...)

at /net/Tools/lzham_codec/lzhamcomp/../lzhamdecomp/lzham_vector.h:297

11 0x00007fffda16e93d in lzham::lzcompressor::compress_blockinternal(void const, unsigned int) (this=0x7ffe3e612010, pBuf=0x7fff5226f010, buf_len=524288) at /net/Tools/lzham_codec/lzhamcomp/lzham_lzcomp_internal.cpp:1951

12 0x00007fffda16d31c in lzham::lzcompressor::compressblock(void const, unsigned int) (this=0x7ffe3e612010, pBuf=0x7fff5226f010, buf_len=524288)

at /net/Tools/lzham_codec/lzhamcomp/lzham_lzcomp_internal.cpp:1459

13 0x00007fffda16a85a in lzham::lzcompressor::putbytes(void const, unsigned int) (this=0x7ffe3e612010, pBuf=0x7ffe5a46f010, buf_len=4294967295)

at /net/Tools/lzham_codec/lzhamcomp/lzham_lzcomp_internal.cpp:515

14 0x00007fffda167e3f in lzham::lzham_lib_compress_memory(lzham_compressparams const, unsigned char, unsigned long, unsigned char const, unsigned long, unsigned int) (pParams=0x7fffffffdbb0, pDst_buf=0x7ffce0992010 "", pDst_len=0x7fffffffdb98, pSrc_buf=0x7ffe5a46f010 "?", src_len=4294967295, pAdler32=0x0) at /net/Tools/lzham_codec/lzhamcomp/lzham_lzcomp.cpp:353

15 0x00007ffff6b2ee06 in lzham_compress_memory(lzham_compressparams const, lzhamuint8, sizet, lzhamuint8 const, size_t, lzhamuint32) (pParams=0x7fffffffdbb0, pDst_buf=0x7ffce0992010 "", pDst_len=0x7fffffffdb98, pSrc_buf=0x7ffe5a46f010 "?", src_len=4294967295, pAdler32=0x0)

at /net/Tools/lzham_codec/lzhamdll/lzham_api.cpp:81

It seems I have a vector of capacity 2147487728, that size is to be increased to at least 2147603813, but next_pow2(2147603813)=4294967296 > UINT32_MAX=4294967295.

Is it a bug or should I use smaller chunks ?

This is on Linux x86_64 (Fedora 22, gcc 5.3.1), on an Intel Xeon CPU E5-2695 v3.

Thank you for you help.

Boris.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/richgel999/lzham_codec/issues/19, or mute the thread https://github.com/notifications/unsubscribe-auth/AGnRGc1Vd1udgJdsRoNK1rYAsFSCOKdwks5q62r8gaJpZM4Kpy2t .

BorisMansencal commented 7 years ago

Hi, There may be a bug in lzham_compress_memory(), I don't know.

However, I am wondering if elemental_vector should be changed to better handle large sizes. Currently, you have the following code:

bool elemental_vector::increase_capacity(uint min_new_capacity, bool grow_hint, uint element_size, object_mover pMover, bool nofail)
{
  [...]
      // new_capacity must be 64-bit when compiling on x64.
      size_t new_capacity = (size_t)min_new_capacity;
      if ((grow_hint) && (!math::is_power_of_2(static_cast<uint64>(new_capacity))))
         new_capacity = static_cast<uint>(math::next_pow2(static_cast<uint64>(new_capacity)));
      LZHAM_ASSERT(new_capacity && (new_capacity > m_capacity));
 [...]
}

I think there should be a check before the assert (when grow_hint==true):

if (new_capacity < min_new_capacity)
    return false;

The code could be changed to something like this:

bool elemental_vector::increase_capacity(uint min_new_capacity, bool grow_hint, uint element_size, object_mover pMover, bool nofail)
{
  [...]
      // new_capacity must be 64-bit when compiling on x64.
      size_t new_capacity = (size_t)min_new_capacity;
      if ((grow_hint) && (!math::is_power_of_2(static_cast<uint64>(new_capacity)))) {
         new_capacity = static_cast<uint>(math::next_pow2(static_cast<uint64>(new_capacity)));
         if (new_capacity < min_new_capacity) {
            new_capacity = UINT32_MAX;
            if (new_capacity < min_new_capacity)
           return false;
         }
      }
      LZHAM_ASSERT(new_capacity && (new_capacity > m_capacity));
 [...]
}

Here, if we have the grow_hint set to true, and next power of two is greater than UINT32_MAX, we set the new_capacity to UINT32_MAX (so not a power of two...). With this code, the previous case (m_capacity=2147487728, min_new_capacity=2147603813, grow_hint=true) would not fail.

I have not tried to compress my buffer with this modification yet. I will ASAP.

BorisMansencal commented 7 years ago

Actually, setting new_capacity to UINT32_MAX does not work. Because when we call realloc, further in increase_capacity function, returned actual_size may be greater than desired_size=element_size*new_capacity. Then actual_size/element_size may not fit in an unsigned int.

I don't know how much bigger actual_size can be (it seems very implementation dependent according to malloc_usable_size man page). On my platform, if I set new_capacity to maximum UINT32_MAX-4096, I am able to compress my buffer.

So my code for elemental_vector looks like this:

bool elemental_vector::increase_capacity(uint min_new_capacity, bool grow_hint, uint element_size, object_mover pMover, bool nofail)
{
  [...]
      // new_capacity must be 64-bit when compiling on x64.
      size_t new_capacity = (size_t)min_new_capacity;
      if ((grow_hint) && (!math::is_power_of_2(static_cast<uint64>(new_capacity)))) {
         new_capacity = static_cast<uint>(math::next_pow2(static_cast<uint64>(new_capacity)));
         if (new_capacity < min_new_capacity) {
            new_capacity = UINT32_MAX-4096;
            if (new_capacity < min_new_capacity)
           return false;
         }
      }
      LZHAM_ASSERT(new_capacity && (new_capacity > m_capacity));
 [...]
}

Besides, I think the two following asserts could be added at the end of elemental_vector::increase_capacity(), just before the return true:

      LZHAM_ASSERT(min_new_capacity <= m_capacity );
      LZHAM_ASSERT(m_size <= m_capacity);