madler / zlib

A massively spiffy yet delicately unobtrusive compression library.
http://zlib.net/
Other
5.55k stars 2.42k forks source link

Manual issue on "strategy" parameter #910

Closed MasterInQuestion closed 6 months ago

MasterInQuestion commented 7 months ago

    https://www.zlib.net/manual.html#Advanced     The description on "strategy" parameter:     "The effect of Z_FILTERED is to force more Huffman coding and less string matching; it is somewhat intermediate between Z_DEFAULT_STRATEGY and Z_HUFFMAN_ONLY."     .     Potentially misleading. (than what?)     [ May trap permissive parsing: "The effect of Z_HUFFMAN_ONLY is to force more Huffman coding and less string matching ..." ]

    Recommended change: [[

<p id="strategy" style="white-space: pre-wrap; word-wrap: break-word"
>   The <tt>strategy</tt> parameter is used to tune the compression algorithm:
    |0| Use the value <tt>Z_DEFAULT_STRATEGY</tt> for normal data.
    |1| <tt>Z_FILTERED</tt> for data produced by a filter (or predictor).
    |2| <tt>Z_HUFFMAN_ONLY</tt> to force Huffman encoding only (no string match).
    |3| <tt>Z_RLE</tt> to limit match distances to 1 (Run-Length Encoding).
    |4| <tt>Z_FIXED</tt> prevents the use of dynamic Huffman codes. [ Allowing simpler decoder for certain applications. ]

    Filtered data consist mostly of small values with somewhat random distribution. (as produced by PNG filters)
    In this case, the compression algorithm may be tuned to compress them better.

    The effect of <tt>Z_FILTERED</tt> is to force more Huffman coding and less string matching than <tt>Z_DEFAULT_STRATEGY</tt>: somewhat the intermediate between <tt>Z_DEFAULT_STRATEGY</tt> and <tt>Z_HUFFMAN_ONLY</tt>.
    <tt>Z_RLE</tt> is designed to be almost as fast as <tt>Z_HUFFMAN_ONLY</tt>, but generally gives better compression for PNG image data.

    The <tt>strategy</tt> parameter affects only the compression ratio not the correctness of compressed output: even if set inappropriately.
</p>

]]     Per [ https://developer.mozilla.org/en-US/docs/Web/HTML/Attributes ], and some testing: "id" should be used instead of "name" for HTML anchor. ("name" mostly doesn't work)     "id" also doesn't have to be unique: see [ https://github.com/MasterInQuestion/Markup/blob/main/!clock.htm ] for example.     .     The wrapper is derived from [ https://github.com/MasterInQuestion/Markup/blob/main/Verbose.htm ].

    See also:     https://github.com/madler/zlib/blob/36e369e1a54b35a978dc584496af69a07ec2d71a/zlib.h#L588-L601     (plain text identical) [[     The strategy parameter is used to tune the compression algorithm:     |0| Use the value Z_DEFAULT_STRATEGY for normal data.     |1| Z_FILTERED for data produced by a filter (or predictor).     |2| Z_HUFFMAN_ONLY to force Huffman encoding only (no string match).     |3| Z_RLE to limit match distances to 1 (Run-Length Encoding).     |4| Z_FIXED prevents the use of dynamic Huffman codes. [ Allowing simpler decoder for certain applications. ]

    Filtered data consist mostly of small values with somewhat random distribution. (as produced by PNG filters)     In this case, the compression algorithm may be tuned to compress them better.

    The effect of Z_FILTERED is to force more Huffman coding and less string matching than Z_DEFAULT_STRATEGY: somewhat the intermediate between Z_DEFAULT_STRATEGY and Z_HUFFMAN_ONLY.     Z_RLE is designed to be almost as fast as Z_HUFFMAN_ONLY, but generally gives better compression for PNG image data.

    The strategy parameter affects only the compression ratio not the correctness of compressed output: even if set inappropriately. ]]

    Also, though I haven't verified the exact implementation: but empirically Z_FILTERED feels like a superset of Z_HUFFMAN_ONLY?     Output size of Z_HUFFMAN_ONLY was never smaller than Z_DEFAULT_STRATEGY or Z_FILTERED?

    "Z_RLE is designed to be almost as fast as Z_HUFFMAN_ONLY, but gives better compression for PNG image data." <^>    "gives" (guaranteed) or "may give"? or "generally gives"?

    Are Z_HUFFMAN_ONLY, Z_RLE, Z_FIXED all capped versions of Z_FILTERED? (subset of Z_FILTERED?)

madler commented 6 months ago

I have expanded on the description in zlib.h. https://github.com/madler/zlib/commit/96d3e9e3dde39aaab732089495400ba59e30ad5f

In terms of "supersets", all but Z_FIXED are degrees of string matching from most to none at all, in the order: Z_DEFAULT_STRATEGY, Z_FILTERED, Z_RLE, then Z_HUFFMAN. Z_FIXED is orthogonal to those, using the same string matching strategy as Z_DEFAULT_STRATEGY, but only producing fixed blocks, never dynamic blocks.

MasterInQuestion commented 6 months ago

    The diff looks dumbfounding...

    [ "diff.webp" ]

    I have updated some changes in the previous post accordingly.

    So, all them are same algorithm with different tuning: for different degrees of string matching?     With Z_FIXED, Z_DEFAULT_STRATEGY sharing same degree of string matching: but prevented the use of dynamic Huffman codes?

madler commented 6 months ago

That's what it says, except it doesn't say the same algorithm. The Huffman-only is no string matching at all, so the null algorithm, and run-length-encoding is only that, which is it's own code. Z_FILTERED is an adjustment to the normal string-matching algorithm.

MasterInQuestion commented 6 months ago

    So all the same base algorithm essentially?

madler commented 6 months ago

I just said no. The Z_RLE algorithm is different and much simpler, and there is no string-matching algorithm at all for Z_HUFFMAN.

MasterInQuestion commented 6 months ago

    The problem is likely not simple. I shall investigate further before further comment.     Thanks for the assistance.

MasterInQuestion commented 2 months ago

    "So all the same base algorithm essentially?" <^>    Must it be: indeed all Lempel-Ziv based... (or precisely: Deflate)

    These are different tuning of compression strategies that affect the efficiency and compressibility.

madler commented 2 months ago

"So all the same base algorithm essentially?" <^>    Must it be: indeed all Lempel-Ziv based... (or precisely: Deflate)

Deflate is not an algorithm, but rather a definition of a representation of data. There are many algorithms that could be written to generate the same Deflate representation. Just like there are many sorting algorithms, all with the exact same end result.

MasterInQuestion commented 2 months ago

    Somewhat yes.     Deflate is also commonly regarded as a compression algorithm, regardless.

madler commented 2 months ago

Entirely yes. Those regarding it as such are incorrect.

MasterInQuestion commented 2 months ago

    I guess this is where such "strategy" tuning come into place...