Support for compression

A key goal with this format is to minimize reads. If compression were supported, it would have to be fairly localized (e.g. compressing individual columns) so that it doesn't increase the number of reads.

Header compression is possible, but it would be a bit problematic. The ID would have to reflect the fact that the header was compressed, and the length information that precedes each document couldn't be included in the compression (again, because of the impact on reads).
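To make the header constraint concrete, here is a minimal sketch of what such framing might look like. The 4-byte IDs (`HDR0`, `HDRZ`), the 4-byte big-endian length field, and the function names are all hypothetical and chosen only for illustration; they are not the format's actual layout. The point is that the ID and length stay uncompressed, so a reader still needs only one small fixed-size read followed by one sized read.

```python
import io
import struct
import zlib

# Hypothetical IDs: one for a plain header, one for a zlib-compressed header.
ID_PLAIN = b"HDR0"
ID_ZLIB = b"HDRZ"


def write_header(payload: bytes, compress: bool) -> bytes:
    """Frame a header as: ID (4 bytes) + length (4 bytes) + body."""
    body = zlib.compress(payload) if compress else payload
    ident = ID_ZLIB if compress else ID_PLAIN
    # The ID and the length are written uncompressed so the reader
    # knows how many bytes to fetch before decompressing anything.
    return ident + struct.pack(">I", len(body)) + body


def read_header(stream) -> bytes:
    """Read a framed header using two reads: the prefix, then the body."""
    prefix = stream.read(8)
    ident = prefix[:4]
    (length,) = struct.unpack(">I", prefix[4:])
    body = stream.read(length)
    if ident == ID_ZLIB:
        return zlib.decompress(body)
    if ident == ID_PLAIN:
        return body
    raise ValueError("unknown header ID: %r" % ident)
```

With this layout, the compressed and uncompressed variants cost the same number of reads; only the ID tells the reader whether to decompress the body.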

Compression of columns is probably more likely to have a significant impact on storage space than compression of the header (which probably won't include a lot of repetitive data).
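As a rough illustration of column-level compression, the sketch below compresses each column independently and records where each compressed block lands. The function names, the in-memory index of `(offset, length)` pairs, and the use of zlib are assumptions made for the example, not part of the format. Because each column is its own compressed block, fetching one column is still a single seek and a single read.

```python
import io
import zlib
from typing import Dict, Tuple


def pack_columns(columns: Dict[str, bytes]) -> Tuple[Dict[str, Tuple[int, int]], bytes]:
    """Compress each column separately and return (index, blob).

    The index maps a column name to the (offset, length) of its
    compressed bytes within the blob.
    """
    index = {}
    out = io.BytesIO()
    for name, raw in columns.items():
        packed = zlib.compress(raw)
        index[name] = (out.tell(), len(packed))
        out.write(packed)
    return index, out.getvalue()


def read_column(stream, index: Dict[str, Tuple[int, int]], name: str) -> bytes:
    """Fetch one column with one seek and one read, then decompress it."""
    offset, length = index[name]
    stream.seek(offset)
    return zlib.decompress(stream.read(length))
```

For example, after `index, blob = pack_columns({"time": ..., "x": ...})`, calling `read_column(io.BytesIO(blob), index, "x")` touches only the bytes belonging to `x`.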

An open question would be: what type of compression? We'd want to use something that is typically available as part of standard libraries. For Python, zlib and bz2 seem to be easily accessible. But what about the Java and C platforms?
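One way to explore the choice on the Python side is to try both standard-library codecs on representative data. The sketch below is only a starting point: the `CODECS` table and `compressed_sizes` helper are hypothetical names, and the sample data is made up. For what it's worth, zlib is itself a C library, and Java's `java.util.zip` package handles the same format, which may bear on the portability question; bz2 is not in the Java standard library.

```python
import bz2
import zlib

# Candidate codecs, each as a (compress, decompress) pair from the
# Python standard library.
CODECS = {
    "zlib": (zlib.compress, zlib.decompress),
    "bz2": (bz2.compress, bz2.decompress),
}


def compressed_sizes(data: bytes) -> dict:
    """Return the compressed size in bytes for each candidate codec."""
    return {name: len(compress(data)) for name, (compress, _) in CODECS.items()}


if __name__ == "__main__":
    # Made-up sample standing in for a repetitive column of values.
    sample = b"0.0,1.0,2.0\n" * 1000
    print("original:", len(sample))
    for name, size in compressed_sizes(sample).items():
        print(name, size)
```

Running this against real column data, rather than the synthetic sample, would give a better sense of whether the extra dependency is worth it.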
