richgel999 / lzham_codec

Lossless data compression codec with LZMA-like ratios but 1.5x-8x faster decompression speed, C/C++

Choose a standard dict size #14


nemequ commented 8 years ago

AFAICT there is no guidance in LZHAM as to what the default dictionary size should be. Passing 0 results in an error, instead of the codec choosing a sensible default as happens with most other parameters.

The test program uses a dict_size_log2 of 28 on x86_64 and LZHAM_MAX_DICT_SIZE_LOG2_X86 (aka 26) on x86, but according to the comment at lzham.h:144, "The values of m_dict_size_log2, m_table_update_rate, m_table_max_update_interval, and m_table_update_interval_slow_rate MUST match during compression and decompression." Since 28 > LZHAM_MAX_DICT_SIZE_LOG2_X86, AFAICT it isn't possible to decompress on x86 something that was compressed with the default parameters on x86_64.

lzham_lzcomp_internal.h seems to like a default value of 22.

I think lzham.h should define an LZHAM_DEFAULT_DICT_SIZE_LOG2, which should be <= LZHAM_MAX_DICT_SIZE_LOG2_X86 (somewhere around 22-24 seems reasonable, IMHO). lzhamtest should then use it regardless of architecture, unless a different value is specified on the command line.
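Something along these lines is what I have in mind. Just a sketch: the macro name and value are suggestions only, and I'm assuming the usual lzham_compress_params setup from lzham.h (m_struct_size, m_dict_size_log2):

```c
#include <string.h>
#include "lzham.h"

/* Proposed; nothing like this exists in lzham.h today. */
#define LZHAM_DEFAULT_DICT_SIZE_LOG2 23 /* 8MB window; <= LZHAM_MAX_DICT_SIZE_LOG2_X86 (26) */

/* lzhamtest could then do this on every architecture, unless the user
   overrides the dictionary size on the command line: */
static void init_default_compress_params(lzham_compress_params *params)
{
    memset(params, 0, sizeof(*params));
    params->m_struct_size = sizeof(*params);  /* required by the API */
    params->m_dict_size_log2 = LZHAM_DEFAULT_DICT_SIZE_LOG2;
}
```

Since 2^23 fits under the x86 limit, the same default would be decodable everywhere.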

richgel999 commented 8 years ago

Yes, this is definitely reasonable and I'll make this change in lzham_codec_devel (which will be v1.1).

jspohr commented 6 years ago

Hi, I have a question regarding the dict size which I didn't think warranted a separate issue. Under "Usage", the readme says:

Always try to use the smallest dictionary size that makes sense for the file or block you are compressing, i.e. don't use a 128MB dictionary for a 15KB file. The codec doesn't automatically choose for you because in streaming scenarios it has no idea how large the file or block will be. The larger the dictionary, the more RAM is required during compression and decompression. I would avoid using more than 8-16MB dictionaries on iOS.
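Picking the smallest sensible size per file is easy enough when compressing offline; here's roughly what I do. A sketch: the two constants mirror LZHAM_MIN_DICT_SIZE_LOG2 and LZHAM_MAX_DICT_SIZE_LOG2_X86 from lzham.h, but the clamping policy itself is just my own:

```c
#include <stddef.h>

/* Pick the smallest dict_size_log2 whose window covers the whole
   input, clamped to LZHAM's limits. */
static unsigned pick_dict_size_log2(size_t src_len)
{
    unsigned n = 15;                              /* LZHAM_MIN_DICT_SIZE_LOG2 */
    while (n < 26 && ((size_t)1 << n) < src_len)  /* LZHAM_MAX_DICT_SIZE_LOG2_X86 */
        ++n;
    return n;
}
```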

Could you please clarify whether there are any downsides to a large dictionary, besides memory usage? If I understand correctly, unbuffered decompression does not allocate memory for the dictionary, and a PC with virtual memory doesn't commit the parts of the dictionary that the compressor never writes to. So in my scenario, where I compress individual game assets offline on PC (none of them larger than a few megs) and decompress unbuffered on device, can't I just use a large dictionary regardless of file size and be done with it? Thanks in advance; your answer is highly appreciated!
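For reference, the device-side setup I have in mind looks roughly like this. A sketch only; I'm assuming the lzham_decompress_params fields and the LZHAM_DECOMP_FLAG_OUTPUT_UNBUFFERED flag as declared in lzham.h:

```c
#include <string.h>
#include "lzham.h"

/* With LZHAM_DECOMP_FLAG_OUTPUT_UNBUFFERED the destination buffer must
   hold the entire decompressed asset, and the codec uses it in place of
   a separately allocated dictionary buffer. */
static void init_unbuffered_decomp_params(lzham_decompress_params *params,
                                          unsigned dict_size_log2)
{
    memset(params, 0, sizeof(*params));
    params->m_struct_size = sizeof(*params);
    params->m_dict_size_log2 = dict_size_log2;  /* must match the compressor */
    params->m_decompress_flags = LZHAM_DECOMP_FLAG_OUTPUT_UNBUFFERED;
}
```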