ubjson / universal-binary-json

Community workspace for the Universal Binary JSON Specification.

Add support for a compressed container #11

ghost closed 11 years ago

ghost commented 11 years ago

This is something I have been thinking about: direct support for a "compressed" container that indicates the payload inside is compressed with GZip (assuming it is the most common and best-supported algorithm).

[C][GZ][i][34][...34 bytes of gzip compressed UBJSON...]

Usage could be at the root of a message such that the entire message is compressed, or around a subsection of the message, say for example a large grouping of string data.
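To make the proposed layout concrete, here is a minimal Python sketch of wrapping and unwrapping such a container. The `C` marker and `GZ` tag come from the example above; the helper names (`encode_length`, `wrap_compressed`, `unwrap_compressed`) are made up for illustration, and the length is assumed to use UBJSON's big-endian integer markers (`i` for int8, `l` for int32). Nothing here is part of the spec.

```python
import gzip
import struct


def encode_length(n: int) -> bytes:
    """Encode a length as int8 ('i') when it fits, otherwise as int32 ('l')."""
    if 0 <= n <= 127:
        return b"i" + struct.pack(">b", n)
    return b"l" + struct.pack(">i", n)


def wrap_compressed(ubjson_bytes: bytes) -> bytes:
    """Build the hypothetical [C][GZ][length][gzip bytes] container."""
    compressed = gzip.compress(ubjson_bytes)
    return b"C" + b"GZ" + encode_length(len(compressed)) + compressed


def unwrap_compressed(container: bytes) -> bytes:
    """Reverse wrap_compressed and return the original UBJSON bytes."""
    if container[:3] != b"CGZ":
        raise ValueError("not a GZ compressed container")
    marker = container[3:4]
    if marker == b"i":
        (length,) = struct.unpack(">b", container[4:5])
        start = 5
    elif marker == b"l":
        (length,) = struct.unpack(">i", container[4:8])
        start = 8
    else:
        raise ValueError("unsupported length marker")
    return gzip.decompress(container[start:start + length])
```

Note that gzip adds a fixed header and trailer, so wrapping a very small subsection this way can make the message larger rather than smaller.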

A few notes and shortcomings...

This would require stream-based processing to be potentially recursive, since the parser would need to run again over the output of each compressed block it found and decoded.

This could be a pain.
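A rough sketch of what that recursion looks like, assuming a toy parser that only understands int8 values (`i`), arrays (`[` ... `]`), and the hypothetical `C`/`GZ` container from the example above. The function name `read_value` is invented for illustration; a real parser would cover every UBJSON type.

```python
import gzip
import io
import struct


def read_value(stream, marker=None):
    """Read one value; a compressed container triggers a nested parse."""
    if marker is None:
        marker = stream.read(1)
    if marker == b"i":
        # int8 payload, one signed byte
        return struct.unpack(">b", stream.read(1))[0]
    if marker == b"[":
        items = []
        while True:
            nxt = stream.read(1)
            if nxt == b"]":
                return items
            items.append(read_value(stream, nxt))
    if marker == b"C":
        # Hypothetical container: algorithm tag, length, compressed bytes
        if stream.read(2) != b"GZ":
            raise ValueError("unsupported compression algorithm")
        length = read_value(stream)
        inner = gzip.decompress(stream.read(length))
        # The parser has to start over on the decompressed bytes
        return read_value(io.BytesIO(inner))
    raise ValueError(f"unexpected marker {marker!r}")


# An array holding 1 followed by a compressed sub-array [2, 3]
inner = gzip.compress(b"[i\x02i\x03]")
message = b"[i\x01" + b"CGZ" + b"i" + bytes([len(inner)]) + inner + b"]"
print(read_value(io.BytesIO(message)))  # [1, [2, 3]]
```

The whole compressed block has to be buffered and inflated before the inner parse can start, which is where the streaming model gets awkward.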

Also, allowing more than a single compression algorithm introduces compatibility issues. Using Snappy would be nice, but what about the 20 languages that don't support it?

Just wanted to externalize these ideas and get feedback.

kxepal commented 11 years ago

This feature is very dangerous because it creates a point of deviation and invites incompatible implementations. Why?

  1. GZip is not the optimal choice: there is lightweight and fast Snappy, heavy LZMA, parallel BZip, universal ZIP, and plenty of other good old algorithms, and more will come, since time and progress don't stand still. Because we can't support all of them, this may lead to incompatibility between endpoints and the data they generate.
  2. Compression is out of UBJSON's scope. Just recall what the tar.gz extension means: streamed data (tar) within an archive (gz) container. By providing local compression at the format level, we reduce the effectiveness of external compressors.
  3. The next feature request for such containers will be encryption, I'm sure. Combined with compression, this will be a hellish mix of technologies.

I'm mostly against this feature because there are many other nice and clear ways to specify compression/encryption information. For example, take a look at HTTP headers:

For gzipped UBJSON I'll send the following headers:

Content-Type: application/x-ubjson
Content-Encoding: gzip

For Snappy-compressed UBJSON I'll follow the Snappy format specification:

Content-Type: application/x-ubjson
Content-Encoding: x-snappy-framed

And similar headers, together with Transfer-Encoding: chunked, when sending streamed UBJSON data.
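As a small illustration of the gzip case, the Python sketch below compresses an already-encoded UBJSON body and builds the matching headers, leaving the UBJSON bytes themselves untouched. The function names are hypothetical, and the headers dict and body are meant to be handed to whatever HTTP client is in use.

```python
import gzip


def gzip_ubjson_request(ubjson_bytes: bytes):
    """Return (headers, body) for sending gzipped UBJSON over HTTP."""
    body = gzip.compress(ubjson_bytes)
    headers = {
        "Content-Type": "application/x-ubjson",
        "Content-Encoding": "gzip",
        "Content-Length": str(len(body)),
    }
    return headers, body


def decode_ubjson_body(headers: dict, body: bytes) -> bytes:
    """Undo the transport-level compression before UBJSON parsing."""
    if headers.get("Content-Encoding") == "gzip":
        return gzip.decompress(body)
    return body
```

The UBJSON parser never sees compressed bytes here; swapping gzip for another algorithm only changes the header value and the compress/decompress calls.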

If we're talking about compressed UBJSON data on disk, the solution is similar to the tar.gz case: db.ubjson.gz, db.ubjson.xz, db.ubjson.sz. The compressor is explicitly defined by the extension, so the MIME type is guessed correctly, which prevents any additional compression passes.
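A quick Python sketch of the on-disk case, using only the standard library. Registering `.ubjson` with `mimetypes` is an assumption made for the demo (that extension is not known by default), and the helper names are invented.

```python
import gzip
import mimetypes

# Assumption for this demo: map the .ubjson extension to the MIME type
# used in the HTTP examples above.
mimetypes.add_type("application/x-ubjson", ".ubjson")


def save_compressed(path: str, ubjson_bytes: bytes) -> None:
    """Write UBJSON bytes through gzip, e.g. to db.ubjson.gz."""
    with gzip.open(path, "wb") as fh:
        fh.write(ubjson_bytes)


def load_compressed(path: str) -> bytes:
    """Read the file back and return the plain UBJSON bytes."""
    with gzip.open(path, "rb") as fh:
        return fh.read()


# The double extension carries both the content type and the encoding.
print(mimetypes.guess_type("db.ubjson.gz"))
# ('application/x-ubjson', 'gzip')
```

A tool that sees the `gzip` encoding in that result can skip compressing the file a second time.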

IMHO, this problem should be solved at the protocol level, not the data level.

ghost commented 11 years ago

Excellent; agreed on all counts, this sounds like a horrible idea. Whoever suggested it should have their head examined! :)