whatwg / compression

Compression Standard
https://compression.spec.whatwg.org/

Support custom dictionaries #27

Open ricea opened 4 years ago

ricea commented 4 years ago

The "deflate" format supports preset dictionaries. These allow backreferences from the very start of the data to refer to items in the dictionary, as if the dictionary were prepended to the uncompressed data. This can significantly improve the compression ratio, particularly for small inputs. See FDICT in RFC 1950. Other compression formats commonly offer the same feature.
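To illustrate the effect on a small input, here is a minimal sketch using Node.js's built-in `zlib` module, which already accepts a `dictionary` option for deflate and inflate. It is not the proposed `CompressionStream` API, and the dictionary and payload are made up for the example.

```javascript
// Sketch only: uses Node.js zlib, not the proposed CompressionStream option.
const zlib = require("node:zlib");

// Hypothetical dictionary: the fixed parts of a small JSON response.
const dictionary = Buffer.from(
  '{"id":0,"name":"","email":"","createdAt":"","updatedAt":""}'
);

// Hypothetical payload that shares its keys with the dictionary.
const input = Buffer.from(
  '{"id":42,"name":"Ada","email":"ada@example.com",' +
    '"createdAt":"2020-01-01","updatedAt":"2020-01-02"}'
);

// zlib's "deflate" is the RFC 1950 container, which carries the FDICT flag.
const plain = zlib.deflateSync(input);
const withDictionary = zlib.deflateSync(input, { dictionary });

// The same dictionary has to be supplied again when decompressing.
const roundTripped = zlib.inflateSync(withDictionary, { dictionary });

console.log("without dictionary:", plain.length, "bytes");
console.log("with dictionary:   ", withDictionary.length, "bytes");
console.log("round trip ok:     ", roundTripped.equals(input));
```

The exact byte counts depend on the zlib build, so none are quoted here. The point is only that the keys are encoded as backreferences into the dictionary instead of as literals.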

This should be supported by CompressionStream and DecompressionStream.

For CompressionStream, an obvious API would be

const cs = new CompressionStream("deflate", { dictionary: aBufferSourceObject });

An open question is whether DecompressionStream needs to accept multiple dictionaries (keyed by their Adler-32 checksum), or whether a single dictionary is sufficient. If only a single dictionary can be passed, the calling code must either know out of band which dictionary is in use, or parse the Adler-32 checksum out of the header itself to choose the right one.
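For reference, parsing the checksum out of the header is not much code. The sketch below follows the RFC 1950 layout (CMF, FLG, then a 4-byte big-endian DICTID when FDICT is set). The names `adler32` and `readDictId` and the two dictionaries are hypothetical, and Node.js's `zlib` is used only to produce and check a test stream.

```javascript
const zlib = require("node:zlib");

// Adler-32 as defined in RFC 1950: two running sums modulo 65521.
function adler32(bytes) {
  let a = 1;
  let b = 0;
  for (const byte of bytes) {
    a = (a + byte) % 65521;
    b = (b + a) % 65521;
  }
  return ((b << 16) | a) >>> 0;
}

// Returns the DICTID from a zlib header, or null if FDICT is not set.
function readDictId(stream) {
  if (stream.length < 2) {
    throw new RangeError("a zlib header needs at least 2 bytes");
  }
  const cmf = stream[0];
  const flg = stream[1];
  // CM must be 8 (deflate) and CMF*256 + FLG must be a multiple of 31.
  if ((cmf & 0x0f) !== 8 || ((cmf << 8) | flg) % 31 !== 0) {
    throw new Error("not a zlib stream");
  }
  if ((flg & 0x20) === 0) {
    return null; // FDICT clear: no preset dictionary
  }
  if (stream.length < 6) {
    throw new RangeError("header is truncated before DICTID");
  }
  return (
    ((stream[2] << 24) | (stream[3] << 16) | (stream[4] << 8) | stream[5]) >>> 0
  );
}

// Hypothetical dictionaries, keyed by their Adler-32 checksum.
const dictA = Buffer.from("hypothetical dictionary A");
const dictB = Buffer.from("hypothetical dictionary B");
const dictionaries = new Map([
  [adler32(dictA), dictA],
  [adler32(dictB), dictB],
]);

// Compress with B, then let the header decide which dictionary to use.
const compressed = zlib.deflateSync(Buffer.from("dictionary B payload"), {
  dictionary: dictB,
});
const chosen = dictionaries.get(readDictId(compressed));
const output = zlib.inflateSync(compressed, { dictionary: chosen });
console.log(output.toString());
```

With a streaming API the caller would have to buffer the first six bytes before constructing the DecompressionStream, which is the inconvenience a keyed `dictionaries` option would avoid.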

mormahr commented 3 years ago

API responses with a fixed schema would be an obvious use case here. For example, one could generate a dictionary from a GraphQL schema, and even weight its entries by the number of requests per field key.

ricea commented 3 years ago

@mormahr How do you feel about { dictionary: aBufferSourceObject } vs. { dictionaries: { 0x12345678: aBufferSourceObject } }?

We could of course support both, but that would be ugly.
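To make the difference concrete, here is a rough sketch of how a decompressor might resolve a dictionary under either option shape. `pickDictionary` is a hypothetical helper, and neither `dictionary` nor `dictionaries` is part of any shipped API. `dictId` stands for the DICTID read from the stream header, or null if the stream has none.

```javascript
// Hypothetical helper: neither option name exists in a shipped API.
function pickDictionary(options, dictId) {
  if (options.dictionary !== undefined) {
    // Single-dictionary form: the caller has already decided which one applies.
    return options.dictionary;
  }
  if (options.dictionaries !== undefined && dictId !== null) {
    // Property names are strings, so a key written as 0x12345678
    // is actually stored under "305419896".
    return options.dictionaries[String(dictId)];
  }
  return undefined;
}

const exampleDictionary = new Uint8Array([1, 2, 3]);

const single = pickDictionary({ dictionary: exampleDictionary }, 0x12345678);
const keyed = pickDictionary(
  { dictionaries: { 0x12345678: exampleDictionary } },
  0x12345678
);
const missing = pickDictionary(
  { dictionaries: { 0x12345678: exampleDictionary } },
  0xdeadbeef
);

console.log(single === exampleDictionary, keyed === exampleDictionary, missing);
```

One consequence of the keyed form is that the checksum keys arrive as decimal strings, so an implementation would have to parse or compare them as such.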

mormahr commented 3 years ago

I have no idea how all of that works. Since there isn't anything available in the browser (yet), I haven't looked further into my idea, so I don't really know what the proper API design is.