sergey-dryabzhinsky / python-zstd

Simple python bindings to Yann Collet ZSTD compression library
BSD 2-Clause "Simplified" License
165 stars 27 forks source link

Improve docs for zstd.compress #27

Closed rwarren closed 6 years ago

rwarren commented 6 years ago

The docstring for zstd.compress is currently:

Compress string, returning the compressed data.
Raises an exception if any error occurs.

This has a few issues:

  1. It does not specify that there are two arguments (string and level)
  2. It does not specify the possible options for level
  3. It doesn't say how illegal level values will be handled

For comparison, here is the zlib.compress docstring:

compress(string[, level]) -- Returned compressed string.

Optional arg level is the compression level, in 0-9. 

Following this lead, I suggest a zstd.compress docstring like this:

compress(string[, level]) -- Returned compressed string.

Optional arg level is the compression level, from 1 (fastest) to 22
(slowest). The default value is 3. Compression values outside the valid
range will be clipped to valid values.

I also want to add a few comments to that suggested docstring...

  1. Looking into your code I see that you seem to allow compression levels down to -5. Why is this?
    • I don't understand the reasoning here, which is primarily why this is an Issue vs a PR
  2. My personal opinion is that incorrect levels should raise a ValueError, rather than silently clipping.
    • Clipping can be confusing (especially with the absence of valid level ranges in the docstring)
    • The Zstandard implementation uses -1 through -22, and I wasn't sure whether your implementation took the negatives literally or not, so I needed to dig into your C code. A ValueError would have immediately clarified when I invoked incorrectly.
    • The docstring could be built to include ZSTD_MAX_CLEVEL and ZSTD_MIN_CLEVEL as "write-less-code-later insurance" against this range every getting expanded
    • If ValueError were raised, the clipping statement in the suggested docstring could just be removed.
sergey-dryabzhinsky commented 6 years ago
  1. Ok, I could update docstring.
  2. Negative values available since Zstd 1.3.4. It is 'extra-fast' levels: https://github.com/facebook/zstd/releases/tag/v1.3.4
  3. I think there Zstd.Error can be used. As do zlib:
    >>> import zlib
    >>> zlib.compress("123", 22)
    Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    zlib.error: Bad compression level

The Zstandard implementation uses -1 through -22

Nope. Zstd uses 1 through 22 levels, and ultra-fast levels -1..-5 (--fast param). But in console tool -1..-22 is the tool param.

sergey-dryabzhinsky commented 6 years ago

@rwarren Can you look into #28 pull? Is it ok to you?

rwarren commented 6 years ago

PR #28 looks great to me. Good API, good docs, and good comments. The clarity on compression levels (fastest and slowest) is great, as is the addition of the ultra-fast levels documentation when compiled in.

I only have two minor suggestions:

  1. Clarify the ultrafast level ordering
    • For example: `Also supports ultra-fast levels from -5 (fastest) to -1 (less fast) since module compiled with ZSTD 1.3.4+."
  2. Change Raises an zstd.Error exception if any error occurs to Raises zstd.Error on any error, or Raises a zstd.Error exception if any error occurs.
    • just a minor English nitpick
sergey-dryabzhinsky commented 6 years ago

@rwarren Got it. Updated pull request.

sergey-dryabzhinsky commented 6 years ago

@rwarren Any other suggestions?

rwarren commented 6 years ago

@sergey-dryabzhinsky Sorry - I thought I'd responded earlier, but apparently not! I did review your changes when you made them and they looked good to me.

i.e. I've got no more suggestions. :)

sergey-dryabzhinsky commented 6 years ago

Ok. Merged.