seung-lab / fpzip

Cython bindings for fpzip, a floating point image compression algorithm.
BSD 3-Clause "New" or "Revised" License
34 stars 5 forks source link

fix: add order to compress parameters #12

Closed william-silversmith closed 5 years ago

william-silversmith commented 5 years ago

This PR attempts to address critiques raised in #11 and elsewhere in the zarr project. Multiple parties have found that transposition of an input array is required to achieve maximum compression. This is unfortunate as it requires a copy, additional compute time, and is not an obvious thing to do.

@lindstro noted that the default C order of numpy arrays means that the (depth, row, col) indexing of a 3D numpy is rendered such that the col is the fastest changing index in the underlying buffer. fpzip expects an XYZ orientation in the array, namely that X changes most rapidly. Since in this case, col is the most rapidly changing index, the inputs to fpzip should be X=col, Y=row, Z=depth rather than (as they are now) X=depth, Y=row, Z=col. This should result in fpzip processing longer stretches of correlated data.

However, we must also support the F orientation. In F order, the of the internal buffer is inverted. Therefore, depth will be the most rapidly changing index and we should set X=depth, Y=row, Z=col.

This becomes more interesting with the fourth index as fpzip treats it specially. x, y, and z are treated as a single compressible object, but the channel index nf describes how many compressible units there are. In C order, oddly, the channel parameter is the most frequently changing index so the buffer looks like RGB RGB RGB (for example). In F order, we'd treat the array more naturally as X=col, Y=row, Z=depth, W=channel. In C, we have X=channel, Y=depth, Z=row, W=col.

william-silversmith commented 5 years ago

We ran into an issue where our "kempression" protocol was not effective without this fix. Kempression consists of the following steps applied to data anisotropic in Z:

img = ...
data = 2.0 + np.swapaxes(subvol, 2,3)
return fpzip.compress(data) # fpzip.compress(data, order='F') after this patch

Unfortunately, np.swapaxes was creating a non-contiguous array which was being converted to C order on submission to fpzip.compress in version 1.1.2. This resulted in a far lower compression ration than was warranted. Applying this patch changed the compression ratio from 0.72x to 0.57x.