ludofischer / metalsmith-gzip

Metalsmith plugin for gzipping the output files.
MIT License

Compression levels? #3

Closed by tigt 9 years ago

tigt commented 9 years ago

What gzip compression level does this plugin operate at? I tried checking zlib's page for any information, but I'm not sure I found the right one.

If I'm going to be zipping files only once, I might as well gzip at level 9 for a few extra percentages in savings / decode effort.

ludofischer commented 9 years ago

Excellent question! It is whatever the default level is, but I can’t find that in the Node documentation either. According to zlib documentation (http://www.zlib.net/manual.html) the default level they use is 6, so if whoever wrote the Node interface did not change it, that should be it.
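A quick way to check this empirically, as a sketch: Node exposes zlib's "use the default" sentinel as zlib.constants.Z_DEFAULT_COMPRESSION, and zlib maps that sentinel to level 6. If the Node binding leaves the default alone, compressing the same input with no options and with level: 6 should produce identical bytes. The sample input and variable names are made up for illustration.

```javascript
const zlib = require('zlib');

// Hypothetical sample input; any reasonably sized buffer would do.
const sample = Buffer.from('metalsmith-gzip compression level check\n'.repeat(500));

// zlib's "pick the default level for me" sentinel.
const sentinel = zlib.constants.Z_DEFAULT_COMPRESSION;

// Compress once with no options and once with an explicit level 6.
const withDefault = zlib.gzipSync(sample);
const withLevel6 = zlib.gzipSync(sample, { level: 6 });

console.log('sentinel value:', sentinel);
console.log('default output equals level 6 output:', withDefault.equals(withLevel6));
```

If the second line prints false on some Node build, that build's default is something other than 6.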

Are you suggesting 9 based on personal experience? After ten seconds of research, I have found a benchmark showing that at level 9 the compression time gets multiplied by 6 but the output size changes by just a few percentage points (http://tukaani.org/lzma/benchmarks.html).
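Rather than relying on someone else's benchmark, the trade-off can be measured on your own files. The following is a rough sketch, not a rigorous benchmark: the input is a hypothetical, highly repetitive HTML-like buffer, so real site content will give different ratios, and timings vary from machine to machine.

```javascript
const zlib = require('zlib');

// Hypothetical input: repetitive HTML-like text, about 1.5 MB.
const line = '<li class="post"><a href="/posts/example/">An example post title</a></li>\n';
const input = Buffer.from(line.repeat(20000));

// Compress the input at one level and record output size and elapsed time.
function measure(level) {
  const start = process.hrtime.bigint();
  const output = zlib.gzipSync(input, { level });
  const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;
  return { level, bytes: output.length, elapsedMs };
}

const results = [1, 6, 9].map(measure);
for (const r of results) {
  console.log(`level ${r.level}: ${r.bytes} bytes in ${r.elapsedMs.toFixed(1)} ms`);
}
```

To try it on a real page, replace the synthetic input with the result of fs.readFileSync on one of your build's output files.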

Since somebody else probably wants to tweak compression levels, I would suggest we choose a sensible default and let people override it. The most work is switching to the Gzip class and using streams directly. Would you like to write the patch yourself (and enter the illustrious hall of fame of metalsmith-gzip contributors, enhancing your CV to the eyes of every Fortune 500 HR department out there)?

P.S. I like the goblin wearing pants to fit into society

tigt commented 9 years ago

If I were anywhere close to being a good enough programmer, I'd try it, but I finally understood object-oriented like, yesterday. I could try, but don't hold your breath on me accomplishing anything.

Code could be shared from beatgammit/gzip-js or jstuckey/gulp-gzip maybe? The official Node zlib documentation does have an options object which allegedly takes level, but doesn't seem keen on an example.

Also, yeah; level 9 compared to even level 8 is a big increase in encode time, but I'm not sure if that's a dealbreaker for me if I get to multiply those tiny savings across viewers. (NearlyFreeSpeech.NET & not much budget.)

Also, thanks! I didn't think people would be looking at that URL; I really should finish that theme so it doesn't look broken.

ludofischer commented 9 years ago

Ok, I thought you were a developer (at least you’re technical enough to care about compression levels). The best thing is to make them configurable. Looking at the Node documentation, we need to create a Gzip object with zlib.createGzip(options) and then use that to perform the compression.

The problem is that it seems to work with streams and in metalsmith we have the file contents as a buffer. https://github.com/beatgammit/gzip-js does not help because it reimplements the gzip algorithm from scratch in pure JS. It should not be exceedingly difficult to implement the changes. I can do it myself, but if you are learning and feel tempted to try, I can review your attempt and guide you.

tigt commented 9 years ago

Well, I'm terrified, but maybe that's a good sign. I have a fair understanding of JavaScript in-browser, but I'm pretty unfamiliar with the Node environment.

If I understand your suggestion, you mean something like:

var Gzip = zlib.createGzip(options);

Where options is an object defaulting to:

var options = {
    flush: zlib.Z_NO_FLUSH,
    chunkSize: 16*1024,
    windowBits: 15, /*not sure about this one*/
    level: 6,
    memLevel: 8,
    strategy: zlib.Z_DEFAULT_STRATEGY
};

I looked up streams vs. buffers and it seems like streams are what node.js uses if left to its own devices (some sort of string that doesn't mind having multiple things done to it at once), and buffers are when you specify it's some glob of binary data. The documentation mentions zlib.gzip(buf, callback) where I can only assume buf is the location of said buffer, and callback is some function that handles whatever this method returns. (Like, if zlib.gzip returned true it would log a happy message, if something went wrong it would return an error object?)

I'm currently reading over your existing code and trying to figure out where to integrate this, but this is going to take some cross-referencing before I have an idea (for starters, I had to look up !!whatever just now). It looks like maybe I would just attach some of the configurables to the existing options object? Is that a separate file?

ludofischer commented 9 years ago

Hey! Great that you’re taking up the challenge! I seriously expected you would tell me to do it myself. You’re correct about using var Gzip = zlib.createGzip(options). From the documentation, it looks like we should then obtain a stream representing the file to compress and do

stream.pipe(Gzip).pipe(out)

for a suitable value of out.

This should replace the call to zlib.gzip(data.contents, function(err, buffer)). You’re correct about callback; but what gzip ‘returns’ is the compressed output, which the callback accesses in buffer.
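To make the callback shape concrete, here is a small sketch of the buffer-based call. It follows Node's error-first convention: the first argument is an error or null, the second is the compressed Buffer. Current Node releases document the signature as zlib.gzip(buffer[, options], callback), so a level can also be passed this way; whether that was available in the Node version the plugin targeted at the time is not something this thread settles. The contents variable is a stand-in for data.contents.

```javascript
const zlib = require('zlib');

// Stand-in for the file contents that metalsmith hands to the plugin.
const contents = Buffer.from('Hello from metalsmith\n'.repeat(100));

// Error-first callback: err is null on success, buffer holds the gzip output.
zlib.gzip(contents, { level: 9 }, function (err, buffer) {
  if (err) {
    console.error('compression failed:', err.message);
    return;
  }
  console.log(`compressed ${contents.length} bytes down to ${buffer.length}`);
});
```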

The user defines the contents of the options object; they are not configured inside the plugin. There is no way to enforce at the language level that the options have certain fields defined, so what authors do is run checks before trying to access a certain property. Here we have to

  1. decide a name for the compression level option
  2. check if it is defined
  3. use that if yes
  4. use the defaults if no.
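The four steps above can be sketched as a small helper. The function name resolveGzipOptions and the choice to reject levels outside 0 to 9 are assumptions made for illustration, not taken from the plugin's source; the gzip key matches the option name the plugin eventually shipped with.

```javascript
// Hypothetical helper; not the plugin's actual code.
const DEFAULT_LEVEL = 6; // the default level given in the zlib manual

function resolveGzipOptions(options) {
  // Step 2: check whether the user supplied a gzip section at all.
  const userGzip = (options && options.gzip) || {};
  const level = userGzip.level;

  // Step 4: fall back to the default when no level was given.
  if (level === undefined) {
    return Object.assign({}, userGzip, { level: DEFAULT_LEVEL });
  }

  // Step 3: use the user's level, after a basic sanity check.
  if (!Number.isInteger(level) || level < 0 || level > 9) {
    throw new RangeError('gzip level must be an integer from 0 to 9');
  }
  return Object.assign({}, userGzip);
}

console.log(resolveGzipOptions());                       // { level: 6 }
console.log(resolveGzipOptions({ gzip: { level: 9 } })); // { level: 9 }
```

Copying into a fresh object keeps the plugin from mutating whatever the user passed in.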

Writing a plugin can be confusing because the framework (metalsmith) is doing most of the work. They say ‘you call a library, a framework calls you’. So you have function arguments like options, file, metalsmith, done that seem to come from nowhere. But in fact they have been created by metalsmith, so you need to look at the metalsmith documentation to see what they contain.
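A sketch may help show where those arguments come from. The outer function is what the user calls with options; the inner function is what the framework calls later with its own arguments. This is a simplified, hypothetical plugin shape (it compresses every file synchronously and adds a .gz sibling), and the last few lines stand in for what metalsmith would do during a build.

```javascript
const zlib = require('zlib');

// Hypothetical plugin factory, loosely shaped like a Metalsmith plugin.
function compress(options) {
  const gzipOptions = (options && options.gzip) || {};

  // The framework, not the user, calls this inner function.
  return function (files, metalsmith, done) {
    let failure = null;
    try {
      // Object.keys takes a snapshot, so adding entries in the loop is safe.
      for (const name of Object.keys(files)) {
        const contents = zlib.gzipSync(files[name].contents, gzipOptions);
        files[name + '.gz'] = { contents };
      }
    } catch (err) {
      failure = err;
    }
    done(failure);
  };
}

// Stand-in for the framework: build a files object and invoke the plugin.
const files = { 'index.html': { contents: Buffer.from('<h1>Hello</h1>'.repeat(200)) } };
compress({ gzip: { level: 9 } })(files, null, function (err) {
  if (err) throw err;
  console.log(Object.keys(files)); // [ 'index.html', 'index.html.gz' ]
});
```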

The problem with streams vs buffers is that you cannot pass a buffer to a function requiring a stream, and all that metalsmith is handing down to us is a buffer. So we need to somehow get a stream from the buffer, then get a buffer back from the stream (so that metalsmith and the other plugins down the chain can do something with it). You could look at the code inside zlib (since zlib.gzip is probably doing just that conversion) or use the second answer here: http://stackoverflow.com/questions/16038705/how-to-wrap-a-buffer-as-a-stream2-readable-stream
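One possible shape for that round trip, as a sketch for a reasonably recent Node version: write the buffer into a PassThrough to get a readable stream, pipe it through the Gzip object, and gather the output with async iteration. The helper names bufferToStream and gzipViaStreams are made up for this example, and this is not necessarily how zlib.gzip or the plugin does it internally.

```javascript
const zlib = require('zlib');
const { PassThrough } = require('stream');

// Wrap a Buffer in a readable stream by writing it into a PassThrough.
function bufferToStream(buffer) {
  const source = new PassThrough();
  source.end(buffer);
  return source;
}

// Buffer -> stream -> gzip -> Buffer.
async function gzipViaStreams(buffer, options) {
  const gzip = zlib.createGzip(options);
  bufferToStream(buffer).pipe(gzip);

  // Readable streams are async iterable; collect the compressed chunks.
  const chunks = [];
  for await (const chunk of gzip) {
    chunks.push(chunk);
  }
  return Buffer.concat(chunks);
}

gzipViaStreams(Buffer.from('some file contents\n'.repeat(1000)), { level: 9 })
  .then((compressed) => console.log('compressed size:', compressed.length));
```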

I suggest you start by replacing line 24 in index.js and try to get the compressed contents into the buffer variable by using streams, starting from the data buffer.

Nothing to be terrified about! You can’t break anything. As you probably noticed, writing this sort of code is more a question of philology than pure puzzle-solving ability, as you spend most of your time trawling through the documentation. For non-public code, the documentation is often much more lacking; in that case you spend hours inspecting the existing code to extract its intent.

tigt commented 9 years ago

I had been sending the email alerts to spam by accident. Sorry about that.

ludofischer commented 9 years ago

It turns out most of the suggestions to be found on the web on how to handle streams are probably inaccurate, so I had to rewrite it from scratch. We could have salvaged your original code, but through no fault of yours it was totally the wrong approach.

tigt commented 9 years ago

If I'm seeing it right, the way to convert a stream into a buffer is to have the output stream from the zlib extension write to an array, then when it fires the end event you concatenate everything and "clone" it to the object you return to metalsmith?
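The approach described here can be sketched with plain stream events: push each chunk the Gzip stream emits into an array, then concatenate on end. This is an illustration of the idea, not the code from the branch; gzipToBuffer is a hypothetical name. Because the Gzip object is itself writable, the input buffer can be handed to it directly with end().

```javascript
const zlib = require('zlib');

// Compress a Buffer through a Gzip stream and hand back a single Buffer.
function gzipToBuffer(input, options, callback) {
  const gzip = zlib.createGzip(options);
  const chunks = [];

  // Collect each compressed chunk as it is produced.
  gzip.on('data', (chunk) => chunks.push(chunk));
  // When the stream finishes, join the chunks into one Buffer.
  gzip.on('end', () => callback(null, Buffer.concat(chunks)));
  gzip.on('error', callback);

  // Write the whole input and signal that no more data is coming.
  gzip.end(input);
}

gzipToBuffer(Buffer.from('page body\n'.repeat(2000)), { level: 9 }, (err, compressed) => {
  if (err) throw err;
  console.log('compressed size:', compressed.length);
});
```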

ludofischer commented 9 years ago

Yes. Do you want to try whether it works for you? I think you can use this branch with npm instead of a regular release.

ludofischer commented 9 years ago

Release 0.3 has been pushed to npm. Compression levels can be set via the gzip key in the options object. For example: compress({ gzip: { level: 6 }}).

tigt commented 9 years ago

[Image: Large files can benefit from 17KB or more on gzip level 9]

Sorry about my tardiness.

As you can probably see from the comparison above, this definitely ended up being worth it for larger files. Thank you so much for your patience and time! I'll have to come back after a few months of learning JS better and see if I can help contribute to something else of yours.

ludofischer commented 9 years ago

Excellent! I am glad that it works. I try to test it, but without others confirming with their own setups, it’s hard to know whether things work fine.

You’re welcome to try again, although I am not sure I have that many projects to contribute to.