julik / zip_kit

Compact ZIP file writing/reading for Ruby, for streaming applications
MIT License

Compress level #9

Closed: prtngn closed this issue 7 months ago

prtngn commented 7 months ago

Hi. Can I somehow set the archive compression level?

julik commented 7 months ago

There is no setting for it right now, but I would accept a PR. If I remember correctly, when I built zip_tricks I found that the compression level influenced how often the zlib deflater flushes its internal buffer, and with large files this caused severe memory bloat, but I might be mistaken. I'm open to revisiting this (though the new parameter would then need to be passed through a number of methods: write_deflated_file, write_file and so on).

If you need a short-term fix, you can monkeypatch the initializer of ZipKit::Streamer::DeflatedWriter; this is where the compression level gets set.
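To illustrate where a compression level would plug in, here is a minimal sketch of a deflating writer built on Ruby's standard Zlib. The class name LevelledDeflatedWriter and its methods are hypothetical: this is not ZipKit's actual DeflatedWriter, whose internals are not shown in this thread. It assumes, as ZIP requires, a raw deflate stream (negative window bits, no zlib header) and tracks the CRC32 and uncompressed size that a ZIP entry needs.

```ruby
require "zlib"
require "stringio"

# Hypothetical class, NOT ZipKit::Streamer::DeflatedWriter: it only shows
# where a compression level would be handed over to Zlib.
class LevelledDeflatedWriter
  attr_reader :crc32, :uncompressed_size

  def initialize(io, compression_level: Zlib::DEFAULT_COMPRESSION)
    @io = io
    # Negative window bits give a raw deflate stream (no zlib header or
    # trailer), which is the form a ZIP entry stores.
    @deflater = Zlib::Deflate.new(compression_level, -Zlib::MAX_WBITS)
    @crc32 = Zlib.crc32("")
    @uncompressed_size = 0
  end

  # Compresses a chunk and passes on whatever output zlib has ready;
  # zlib may buffer input and return an empty string for small chunks.
  def <<(data)
    @crc32 = Zlib.crc32(data, @crc32)
    @uncompressed_size += data.bytesize
    @io << @deflater.deflate(data)
    self
  end

  # Flushes the remaining buffered output and ends the deflate stream.
  def finish
    @io << @deflater.finish
    @deflater.close
    @io
  end
end

# Usage: compress a repetitive payload at the fastest level.
payload = "hello " * 1000
out = StringIO.new("".b)
writer = LevelledDeflatedWriter.new(out, compression_level: Zlib::BEST_SPEED)
writer << payload
writer.finish
puts "#{writer.uncompressed_size} bytes in, #{out.string.bytesize} bytes out"
```

Because deflate returns output as soon as zlib has it, the writer never holds the whole compressed entry in memory; how much zlib buffers internally between calls is up to zlib itself.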

julik commented 7 months ago

To expand on that, I've written the following test (and made a corresponding change in the code):

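require "zip_kit"

# Assumption: the original repeating_string was not shown; any highly
# compressible string works here, but the exact sizes will differ.
repeating_string = "abcdefgh" * 8

# compression_level: comes from the local code change mentioned above.
sizes_per_level = (-1..9).map do |level|
  # NullWriter discards the bytes; WriteAndTell counts how many were written.
  sizer = ZipKit::WriteAndTell.new(ZipKit::NullWriter)
  ZipKit::Streamer.open(sizer) do |zip|
    zip.write_deflated_file("comp.txt", compression_level: level) do |io|
      (1024 * 512).times { io << repeating_string }
    end
  end
  sizer.tell
end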

The range -1..9 covers every available compression level. This is what I get (given that the input is "perfectly" compressible):

[59722, 20450526, 119170, 119170, 119170, 59723, 59722, 59722, 59722, 59722, 59722]

So the default compression level (-1, which zlib maps to level 6) already seems to give the best size. The second entry (level 0) is actually NO_COMPRESSION, which just stores the data. Levels 1 to 3 are speed-optimized and produce output roughly twice as large, while levels 4 to 9 end up at essentially the same size as the default (level 4 is one byte larger). Given that deflate compression is very fast (it should even be AVX-accelerated on most modern hardware; zlib is very good and has been around for a long time!), is this really needed? I am not opposed to adding the extra option, but it does not look like it would provide as much value as one might hope.

Or do you have a specific use case or compression level in mind?
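A similar comparison can be approximated without ZipKit, using only Ruby's stdlib Zlib with raw deflate as in ZIP. This is a sketch, not the test from this thread: the input string and iteration count are placeholders, so absolute sizes will not match the numbers above. Two properties should hold regardless of input: level 0 output is slightly larger than the input (stored blocks add headers), and -1 (Zlib::DEFAULT_COMPRESSION) produces exactly the same output size as level 6, since zlib maps the default to 6.

```ruby
require "zlib"

# Placeholder input (an assumption): the original repeating_string is not shown.
REPEATING_STRING = "abcdefgh" * 8
ITERATIONS = 16_384

# Returns the raw-deflate output size for one compression level, streaming
# the input in small chunks and counting bytes instead of keeping them.
def deflated_size(level)
  deflater = Zlib::Deflate.new(level, -Zlib::MAX_WBITS)
  size = 0
  ITERATIONS.times { size += deflater.deflate(REPEATING_STRING).bytesize }
  size += deflater.finish.bytesize
  deflater.close
  size
end

sizes_per_level = (-1..9).map { |level| [level, deflated_size(level)] }.to_h
sizes_per_level.each { |level, size| puts format("level %2d: %d bytes", level, size) }
```

Counting bytes rather than collecting them keeps memory flat even for the uncompressed level 0 run.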

prtngn commented 7 months ago

Thanks for your reply. Yes, I think in this case it is not very relevant.