Closed william-silversmith closed 5 years ago
We ran into an issue where our "kempression" protocol was not effective without this fix. Kempression consists of the following steps applied to data anisotropic in Z:
img = ...
data = 2.0 + np.swapaxes(subvol, 2,3)
return fpzip.compress(data) # fpzip.compress(data, order='F') after this patch
Unfortunately, np.swapaxes was creating a non-contiguous array which was being converted to C order on submission to fpzip.compress in version 1.1.2. This resulted in a far lower compression ration than was warranted. Applying this patch changed the compression ratio from 0.72x to 0.57x.
This PR attempts to address critiques raised in #11 and elsewhere in the zarr project. Multiple parties have found that transposition of an input array is required to achieve maximum compression. This is unfortunate as it requires a copy, additional compute time, and is not an obvious thing to do.
@lindstro noted that the default C order of numpy arrays means that the (depth, row, col) indexing of a 3D numpy is rendered such that the col is the fastest changing index in the underlying buffer. fpzip expects an XYZ orientation in the array, namely that X changes most rapidly. Since in this case, col is the most rapidly changing index, the inputs to fpzip should be X=col, Y=row, Z=depth rather than (as they are now) X=depth, Y=row, Z=col. This should result in fpzip processing longer stretches of correlated data.
However, we must also support the F orientation. In F order, the of the internal buffer is inverted. Therefore, depth will be the most rapidly changing index and we should set X=depth, Y=row, Z=col.
This becomes more interesting with the fourth index as fpzip treats it specially. x, y, and z are treated as a single compressible object, but the channel index
nf
describes how many compressible units there are. In C order, oddly, the channel parameter is the most frequently changing index so the buffer looks like RGB RGB RGB (for example). In F order, we'd treat the array more naturally as X=col, Y=row, Z=depth, W=channel. In C, we have X=channel, Y=depth, Z=row, W=col.