pfalcon / uzlib

Radically unbloated DEFLATE/zlib/gzip compression/decompression library. Can decompress any gzip/zlib data, and offers a simplified compressor that produces gzip-compatible output while requiring far fewer resources (and achieving a lower compression ratio, of course).

Question about memory management in tgzip #35

Closed Christian-Sander closed 3 years ago

Christian-Sander commented 3 years ago

I'm currently looking into compressing some data and have two questions regarding the example implementation in tgzip and its usage in other programs:

  1. It appears as if tgzip.c doesn't free any memory and lets the OS clean up the allocated memory. If so, what is the user meant to free to avoid memory leaks? I'm thinking that I'm meant to free comp.out.outbuf and comp.hash_table, as those appear to be heap-allocated. Is there anything else I'm missing? As far as I can tell there is only one call internally to realloc for the comp.out.outbuf buffer.
  2. Does the dict_size need to be 32 kB, even when compressing smaller data in memory-limited environments? Does it correlate with the hash_bits field? I have a maximum of 1 MB of data that needs to be compressed, usually much less, and in PC-based tests using python gzip with max. compression level it compresses extremely well (to 0.1-1% of the original size).

pfalcon commented 3 years ago

What I can say from my memory:

  1. The library itself doesn't do any memory allocation; it follows the "dependency injection" pattern, where all buffers are allocated by the client and passed in as pointers.
  2. tgzip.c is a sample application which is intended to be simple, and thus may indeed rely on the behavior of a POSIX OS which guarantees that any resources allocated by a process will be freed on the process exit.
  3. dict_size is a DEFLATE/gzip param. hash_bits is a param of uzlib's compression algorithm. So, they're orthogonal params. Making both better configurable is a long-standing TODO task. For now you can patch the source.
  4. The compression quality of uzlib is not comparable to that of gzip. I specifically coded an algorithm that is as simple, and thus as small, as realistically possible. Though if hash_bits approaches infinity, the compression rate also approaches the highest possible for LZ compression ;-).
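
The client-owned buffer pattern described in points 1 to 3 can be sketched roughly as follows. This is a minimal illustration, not uzlib's actual API: `demo_comp`, `demo_comp_init`, `demo_comp_free`, and `demo_hash_table_bytes` are hypothetical names, and the assumption that each hash table entry is one pointer wide is made only for the sake of the example. The point is that the caller allocates the hash table (sized from `hash_bits`) and is therefore also the one who frees it, along with any output buffer attached to the context.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a compressor context (NOT uzlib's real struct). */
typedef struct {
    const uint8_t **hash_table; /* client-allocated, (1 << hash_bits) entries */
    unsigned hash_bits;         /* tuning knob of the match finder */
    unsigned char *outbuf;      /* output buffer owned by the client */
    size_t outlen;              /* bytes used in outbuf */
    size_t outsize;             /* capacity of outbuf */
} demo_comp;

/* Memory needed for the hash table, assuming pointer-sized entries. */
size_t demo_hash_table_bytes(unsigned hash_bits) {
    assert(hash_bits < 31);
    return sizeof(const uint8_t *) << hash_bits;
}

/* The client allocates and zeroes the table. Returns 0 on success, -1 on OOM. */
int demo_comp_init(demo_comp *c, unsigned hash_bits) {
    assert(c != NULL && hash_bits < 31);
    memset(c, 0, sizeof *c);
    c->hash_bits = hash_bits;
    c->hash_table = calloc((size_t)1 << hash_bits, sizeof *c->hash_table);
    return c->hash_table ? 0 : -1;
}

/* Whoever allocated the buffers releases them; free(NULL) is a no-op. */
void demo_comp_free(demo_comp *c) {
    free(c->hash_table);
    free(c->outbuf);
    c->hash_table = NULL;
    c->outbuf = NULL;
    c->outlen = 0;
    c->outsize = 0;
}
```

Under these assumptions, the table for `hash_bits = 12` on a 64-bit target takes 4096 * 8 = 32 KiB, independent of the DEFLATE window size, which is one way to see why the two parameters are orthogonal.
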
Christian-Sander commented 3 years ago

Thank you for your replies. One comment to the first point:

  1. In defl_static.c: out->outbuf = sresize(out->outbuf, out->outsize, unsigned char); (sresize is a macro for realloc). So there is actually one allocation happening, at least when outbuf is not allocated beforehand. This means we're both right!
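
To make the ownership question concrete, here is a small sketch of a realloc-grown output buffer in the spirit of the line quoted above. It is not the code from defl_static.c: `demo_out`, `demo_out_byte`, and `demo_out_free` are hypothetical names, and the doubling growth policy is an assumption chosen for the example. What it shows is that when the emitting code calls `realloc` on a buffer stored in a client-visible struct, the client still has to `free` that pointer once it is done with the output.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical output state (NOT uzlib's real struct). */
typedef struct {
    unsigned char *outbuf; /* may start as NULL; grown on demand */
    size_t outlen;         /* bytes written so far */
    size_t outsize;        /* current capacity */
} demo_out;

/* Append one byte, growing the buffer via realloc when it is full.
 * Returns 0 on success, -1 if realloc fails (old buffer stays valid). */
int demo_out_byte(demo_out *out, unsigned char b) {
    assert(out != NULL);
    if (out->outlen >= out->outsize) {
        /* Assumed policy: start at 64 bytes, then double. */
        size_t newsize = out->outsize ? out->outsize * 2 : 64;
        unsigned char *p = realloc(out->outbuf, newsize);
        if (p == NULL)
            return -1;
        out->outbuf = p;
        out->outsize = newsize;
    }
    out->outbuf[out->outlen++] = b;
    return 0;
}

/* The caller releases the buffer when it no longer needs the output. */
void demo_out_free(demo_out *out) {
    assert(out != NULL);
    free(out->outbuf);
    out->outbuf = NULL;
    out->outlen = 0;
    out->outsize = 0;
}
```

A caller would zero-initialize a `demo_out`, append bytes through `demo_out_byte`, consume `outbuf[0 .. outlen)`, and finally call `demo_out_free`.
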
pfalcon commented 3 years ago

Where do you see that called?

github-actions[bot] commented 3 years ago

Thanks for your submission. However there was no (further) activity on it for some time. Below are possible reasons and/or means to proceed further:

Thanks for your understanding!

github-actions[bot] commented 3 years ago

Closing due to inactivity.

smdjeff commented 2 years ago

I noticed exactly the same thing and didn't see that this one was closed, so I opened another. See #41. I laid out the call chain that is used to sneakily allocate memory.