kiyo-masui / bitshuffle

Filter for improving compression of typed binary data.

Document on-disk representation of bitshuffled data #148

Open graeme-winter opened 11 months ago

graeme-winter commented 11 months ago

I got some way into reverse-engineering the format so that I can do the bitshuffle independently of LZ4 in my application, but kept stubbing my toes. Some clear documentation of how the format is laid out would be very useful for non-canonical implementations.

For example, it would appear that the on-disk representation takes the form of

BE uint32_t compressed_block_size <compressed block>
BE uint32_t compressed_block_size <compressed block>
BE uint32_t compressed_block_size <compressed block>
...

where each <compressed block> is the result of compressing 8192 bytes of input. These are followed by a smaller partial block and finally by what looks like a tiny verbatim (uncompressed) residual at the end. I could compress and then unpack arbitrary bit patterns to resolve this, but a canonical definition of the on-disk format (beyond, of course, reading the source code) would be a useful addition to this library.
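
To make the layout I am guessing at concrete, here is a minimal Python sketch that splits a buffer into length-prefixed blocks plus a trailing residual. It only reflects the structure I observed above, not a confirmed specification. The function name `split_blocks` and its `n_blocks` and `offset` parameters are hypothetical, and it assumes the caller already knows how many prefixed blocks to expect. It does not decompress or unshuffle anything.

```python
import struct


def split_blocks(buf: bytes, n_blocks: int, offset: int = 0):
    """Split buf into n_blocks length-prefixed payloads and a trailing residual.

    Assumed layout (from observation, not from a spec): each block is a
    big-endian uint32 byte count followed by that many payload bytes.
    Whatever remains after the last block is returned untouched.
    """
    blocks = []
    pos = offset
    for _ in range(n_blocks):
        if pos + 4 > len(buf):
            raise ValueError("truncated size prefix")
        # ">I" is a big-endian unsigned 32-bit integer.
        (size,) = struct.unpack_from(">I", buf, pos)
        pos += 4
        if pos + size > len(buf):
            raise ValueError("truncated block payload")
        blocks.append(buf[pos:pos + size])
        pos += size
    return blocks, buf[pos:]


# Synthetic stream: two prefixed payloads followed by two residual bytes.
stream = struct.pack(">I", 3) + b"abc" + struct.pack(">I", 2) + b"de" + b"\x01\x02"
blocks, residual = split_blocks(stream, 2)
```

Each returned payload would then still need to be decompressed and un-bitshuffled, and how the block count and residual length are determined is exactly the part that needs documenting.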

graeme-winter commented 11 months ago

I found a non-canonical implementation here

https://github.com/dectris/compression