whatwg / compression

Compression Standard
https://compression.spec.whatwg.org/

Supply parameters for compression algorithms #22

Open yoavweiss opened 4 years ago

yoavweiss commented 4 years ago

Even for the built-in algorithms (gzip and deflate), there are various parameters that users can supply which can unlock some use cases. Examples:

ricea commented 4 years ago

I am expecting to add an options bundle as the second argument to the constructor. So for example we'd have something like

new CompressionStream('deflate', {
    level: 0.1,
    flush: 'always'
});

How the parameters work will be difficult to fix if we get it wrong, which is why they aren't in the initial version of the standard.
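For reference, the constructor as shipped takes no such options yet. A minimal sketch of using the current API with the standard streams machinery (assuming an environment with the Blob, Response, and CompressionStream globals, e.g. modern browsers or Node 18+):

```javascript
// Compress bytes with the built-in 'gzip' algorithm as currently shipped,
// i.e. without any options bundle.
async function gzipBytes(bytes) {
  const stream = new Blob([bytes]).stream().pipeThrough(new CompressionStream('gzip'));
  return new Uint8Array(await new Response(stream).arrayBuffer());
}
```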

yoavweiss commented 4 years ago

That makes perfect sense, thanks! :)

chris-morgan commented 4 years ago

I imagine that will also include the use of a custom dictionary? At present for Fastmail, we use pako in a worker to compress our API request bodies, and use a custom dictionary because it makes the compression much more effective. I had hoped that we could look to switching it to a standard API that would compress faster with less code loaded.

Given the already-niche status of manual compression in JavaScript (for web systems specifically, I personally can’t recall hearing of even one other user, though doubtless some exist), I was a little surprised to hear of this shipping in Chrome without support for varying the compression level or providing a custom dictionary. Manual compression is so rare that I’d guess a fair fraction of those who do use it have tuned things carefully, and so won’t be able to adopt this new API without altering that balance.

My surprise is probably because I believe supporting at least those two parameters (level and dictionary) to be quite straightforward, with a broad approach (an options object to the constructor) being obvious, and the individual option decisions being ones that should just be made, where discussion is unlikely to affect matters. For starters, I take it as given that the available options depend wholly on the compression method selected.

level could reasonably be an enum, an integer, or a float in the range 0–1; given JavaScript and the conventions of extant compression software, probably an integer. Its reasonable range could be 0–3 (matching FLEVEL) or 1–9 (matching most software). The default also varies: for FLEVEL’s 0–3, 2 is defined as the default; for compression tools with levels 1–9, some default to 8 (e.g. the zlib library) and others to 6 (e.g. gzip(1)). These numbers are, of course, fairly arbitrary anyway. You could then either leave the default unspecified, or pick 6 or 8 and run with it.
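As a sketch of one of the shapes described above, an integer level in 1–9 defaulting to 6 (the gzip(1) default), here is hypothetical validation code; nothing about it is spec'd:

```javascript
// Hypothetical option validation for an integer compression level.
// The range 1-9 and the default of 6 are one of several candidates
// discussed in this thread, not a decided design.
function normalizeLevel(level = 6) {
  if (!Number.isInteger(level) || level < 1 || level > 9) {
    throw new RangeError(`level must be an integer in 1-9, got ${level}`);
  }
  return level;
}
```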

For compression, dictionary should probably be a String containing only ASCII, an ArrayBuffer, or a Uint8Array. For decompression, you might wish to provide more than one dictionary, so perhaps dictionary (or dictionaries?) would be an object mapping Adler-32 checksums to dictionaries.
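For context on the Adler-32 keying idea: the zlib format (RFC 1950) identifies a preset dictionary by the Adler-32 checksum of its contents (the DICTID field), so a map keyed by that checksum would let a decompressor select the right dictionary. A minimal Adler-32 implementation:

```javascript
// Adler-32 (RFC 1950): two running sums modulo the largest prime below 2^16.
function adler32(bytes) {
  const MOD = 65521;
  let a = 1, b = 0;
  for (const byte of bytes) {
    a = (a + byte) % MOD;
    b = (b + a) % MOD;
  }
  return (b * 65536 + a) >>> 0;
}

// A decompressor holding several dictionaries could then key them as
// suggested above (the shape is illustrative only):
// const dictionaries = { [adler32(dictBytes)]: dictBytes /* , ... */ };
```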

ricea commented 4 years ago

> I imagine that will also include the use of a custom dictionary?

Yes, that's on the roadmap, although I don't think I've specifically mentioned it here. I filed issue #27 to make it explicit.

> I was a little surprised to hear of this shipping in Chrome without support for varying the compression level or providing a custom dictionary

I believe in shipping the most uncontroversial parts of an API first. We need to assess demand to set the priority for shipping more advanced features.

> individual options decisions that should just be made, where discussion is unlikely to affect matters.

Discussion does affect matters. You yourself gave four different approaches to level. Someone else may have another suggestion which copes well with libdeflate having levels all the way up to 12, or with zopfli providing a higher level of compression that is extraordinarily expensive.

noell commented 4 years ago

The parameter should be a float in [0..1], similar to toDataURL and toBlob, imho.

Mapping to internal compression levels, like libdeflate's level 12 for example, should be an unspecified internal implementation detail [1].

[1] toDataURL, toBlob are spec'd that way. How their parameters are mapped to the internal details of a codec is not in the spec. That was intentional -- it allows browser vendors some wiggle-room to choose what's best for their underlying implementations.
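A sketch of how such a [0..1] quality float might map to an internal zlib-style level; the linear mapping and the 0.75 fallback are illustrative assumptions only, since under this proposal the mapping would be left to implementations:

```javascript
// Hypothetical mapping from a toDataURL-style quality float in [0, 1]
// to a zlib-style integer level in 1-9.
function qualityToZlibLevel(quality) {
  if (typeof quality !== 'number' || Number.isNaN(quality) || quality < 0 || quality > 1) {
    // toDataURL ignores out-of-range quality values and uses a default;
    // 0.75 is an arbitrary choice here, not anything specified.
    quality = 0.75;
  }
  return Math.max(1, Math.round(quality * 9));
}
```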

ricea commented 4 years ago

@noell I didn't know about the quality argument to toDataURL and toBlob. That's a good precedent.

I feel there should be some kind of restriction on implementations. For example, level: 0.1 should use less CPU than level: 0.9, and the difference between 0.8 and 1.0 shouldn't be more than a factor of 2. The reason is that code that performs well in one browser shouldn't perform badly in another.

jasnell commented 2 years ago

In addition to options for the compression algorithm, it would be good to be able to set the queuing strategies for these streams as well, following the same approach as the TransformStream constructor.
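For comparison, the TransformStream constructor already accepts writable and readable queuing strategies as trailing arguments; this suggestion would give CompressionStream the same shape. A sketch, where the extended CompressionStream signature shown in the comment is hypothetical:

```javascript
// TransformStream already takes queuing strategies as its second and third
// arguments; this part is specified, working behaviour.
const writableStrategy = new CountQueuingStrategy({ highWaterMark: 4 });
const readableStrategy = new ByteLengthQueuingStrategy({ highWaterMark: 64 * 1024 });
const ts = new TransformStream({}, writableStrategy, readableStrategy);

// Hypothetical extension (NOT in the spec): an options bundle followed by
// the same trailing strategies.
// const cs = new CompressionStream('gzip', { level: 0.5 },
//                                  writableStrategy, readableStrategy);
```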