richgel999 / lzham_codec

Lossless data compression codec with LZMA-like ratios but 1.5x-8x faster decompression speed, C/C++
Other
693 stars 71 forks source link

decompress_memory fails on data compressed with compress_memory #3

Closed nemequ closed 9 years ago

nemequ commented 9 years ago

Maybe I'm doing something wrong, but I feel like this should get me LZHAM_DECOMP_STATUS_SUCCESS not LZHAM_DECOMP_STATUS_FAILED_BAD_CODE:

#define LOREM_IPSUM                                                     \
  "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed vulputate " \
  "lectus nisl, vitae ultricies justo dictum nec. Vestibulum ante ipsum " \
  "primis in faucibus orci luctus et ultrices posuere cubilia Curae; "  \
  "Suspendisse suscipit quam a lectus adipiscing, sed tempor purus "    \
  "cursus. Vivamus id nulla eget elit eleifend molestie. Integer "      \
  "sollicitudin lorem enim, eu eleifend orci facilisis sed. Pellentesque " \
  "sodales luctus enim vel viverra. Cras interdum vel nisl in "         \
  "facilisis. Curabitur sollicitudin tortor vel congue "                \
  "auctor. Suspendisse egestas orci vitae neque placerat blandit.\n"    \
  "\n"                                                                  \
  "Aenean sed nisl ultricies, vulputate lorem a, suscipit nulla. Donec " \
  "egestas volutpat neque a eleifend. Nullam porta semper "             \
  "nunc. Pellentesque adipiscing molestie magna, quis pulvinar metus "  \
  "gravida sit amet. Vestibulum mollis et sapien eu posuere. Quisque "  \
  "tristique dignissim ante et aliquet. Phasellus vulputate condimentum " \
  "nulla in vulputate.\n"                                               \
  "\n"                                                                  \
  "Nullam volutpat tellus at nisi auctor, vitae mattis nibh viverra. Nunc " \
  "vitae lectus tristique, ultrices nibh quis, lobortis elit. Curabitur " \
  "at vestibulum nisi, nec facilisis ante. Nulla pharetra blandit lacus, " \
  "at sodales nulla placerat eget. Nulla congue varius tortor, sit amet " \
  "tempor est mattis nec. Praesent vitae tristique ipsum, rhoncus "     \
  "tristique lorem. Sed et erat tristique ligula accumsan fringilla eu in " \
  "urna. Donec dapibus hendrerit neque nec venenatis. In euismod sapien " \
  "ipsum, auctor consectetur mi dapibus hendrerit.\n"                   \
  "\n"                                                                  \
  "Phasellus sagittis rutrum velit, in sodales nibh imperdiet a. Integer " \
  "vitae arcu blandit nibh laoreet scelerisque eu sit amet eros. Aenean " \
  "odio felis, aliquam in eros at, ornare luctus magna. In semper "     \
  "tincidunt nunc, sollicitudin gravida nunc laoreet eu. Cras eu tempor " \
  "sapien, ut dignissim elit. Proin eleifend arcu tempus, semper erat et, " \
  "accumsan erat. Praesent vulputate diam mi, eget mollis leo "         \
  "pellentesque eget. Aliquam eu tortor posuere, posuere velit sed, "   \
  "suscipit eros. Nam eu leo vitae mauris condimentum lobortis non quis " \
  "mauris. Nulla venenatis fringilla urna nec venenatis. Nam eget velit " \
  "nulla. Proin ut malesuada felis. Suspendisse vitae nunc neque. Donec " \
  "faucibus tempor lacinia. Vivamus ac vulputate sapien, eget lacinia " \
  "nisl.\n"                                                             \
  "\n"                                                                  \
  "Curabitur eu dolor molestie, ullamcorper lorem quis, egestas "       \
  "urna. Suspendisse in arcu sed justo blandit condimentum. Ut auctor, " \
  "sem quis condimentum mattis, est purus pulvinar elit, quis viverra " \
  "nibh metus ac diam. Etiam aliquet est eu dui fermentum consequat. Cras " \
  "auctor diam eget bibendum sagittis. Aenean elementum purus sit amet " \
  "sem euismod, non varius felis dictum. Aliquam tempus pharetra ante a " \
  "sagittis. Curabitur ut urna felis. Etiam sed vulputate nisi. Praesent " \
  "at libero eleifend, sagittis quam a, varius sapien."
#define LOREM_IPSUM_LENGTH (sizeof(LOREM_IPSUM) - 1)

#include <stdio.h>
#include <lzham.h>
#include <stdint.h>
#include <assert.h>

int main (int argc, char** argv) {
  uint8_t compressed[4096];
  size_t compressed_length;
  uint8_t decompressed[1024 * 1024];
  size_t decompressed_length = sizeof(decompressed);

  lzham_compress_status_t comp_res;
  lzham_compress_params comp_params = {
    sizeof(lzham_compress_params),
    LZHAM_MAX_DICT_SIZE_LOG2_X86,
    LZHAM_COMP_LEVEL_DEFAULT,
    LZHAM_DEFAULT_TABLE_UPDATE_RATE,
    -1,
    0,
    0,
    NULL,
    64,
    64
  };

  lzham_decompress_status_t decomp_res;
  lzham_decompress_params decomp_params = {
    sizeof(lzham_decompress_params),
    LZHAM_MAX_DICT_SIZE_LOG2_X86,
    LZHAM_DEFAULT_TABLE_UPDATE_RATE,
    0,
    0,
    NULL,
    64,
    16
  };

  comp_res = lzham_compress_memory (&comp_params,
                    compressed, &compressed_length,
                    (const lzham_uint8*) LOREM_IPSUM, LOREM_IPSUM_LENGTH,
                    NULL);
  assert (comp_res == LZHAM_COMP_STATUS_SUCCESS);

  fprintf (stdout, "Compressed from %zu bytes to %zu bytes.\n", LOREM_IPSUM_LENGTH, compressed_length);

  decomp_res = lzham_decompress_memory (&decomp_params,
                    decompressed, &decompressed_length,
                    compressed, compressed_length,
                    NULL);
  fprintf (stdout, "Decompressed status = %d (LZHAM_DECOMP_STATUS_FAILED_BAD_CODE = %d)\n",
       decomp_res, LZHAM_DECOMP_STATUS_FAILED_BAD_CODE);

  return 0;
}
richgel999 commented 9 years ago

Thanks - investigating this now.

richgel999 commented 9 years ago

Ok, the problem is caused by some key decompression parameters not exactly matching the compression parameters. Apologies, this needs to be documented better. Like LZMA, I don't store the compression params in the stream itself to save a few bits. (Admittedly, the # of bits saved is tiny.)

Specifically, the m_table_max_update_interval and m_table_update_interval_slow_rate parameters in the comp/decomp parameters structure must exactly match during compression/decompression. In your example, the comp. struct uses 64,64, and the decomp struct uses 64,16. They must both match (64,64). Otherwise, the decompressor will eventually fall horribly out of sync with the compressor and it'll eventually run off the rails - that's why you get the bad code error.

For most uses, you should just set both values to 0,0 and instead set the m_table_update_rate param to a value between [LZHAM_SLOWEST_TABLE_UPDATE_RATE, LZHAM_FASTEST_TABLE_UPDATE_RATE]. This single value internally controls both m_table_max_update_interval/m_table_update_interval_slow_rate. These 2 parameters are there to allow the user to experiment to find the best settings for their specific data.

I changed your example so both are either 0,0 or 64,64 and decompression works OK (decomp_res is 3).

richgel999 commented 9 years ago

If I hadn't locked the bitstream for v1.0 I could have stored a small hash of the compression parameters, so the decompressor could immediately bail if it detected a mismatch. I was guessing most users wouldn't bother tweaking the defaults except for the dictionary size.

richgel999 commented 9 years ago

A bit more detail: if the codec sees that either m_table_max_update_interval or m_table_update_interval_slow_rate are non-zero, it uses the values you specify to control the Huffman table update rates. Otherwise, it uses the single m_table_update_rate value (which defaults to LZHAM_DEFAULT_TABLE_UPDATE_RATE).

Most users should set both m_table_max_update_interval/m_table_update_interval_slow_rate to 0 and not worry about them.

nemequ commented 9 years ago

Okay, that makes sense. For what it's worth, the reason I chose those parameters is because the documentation mentioned them being the default values (see lines 158 and 278). Perhaps you could update them to both be the same? If I were choosing non-default values I would have been much more suspicious of that when things went wrong.

richgel999 commented 9 years ago

Got it, I'll fix that.

richgel999 commented 9 years ago

I've changed the comments in lzham.h to be more clear.