zarr-developers / numcodecs

A Python package providing buffer compression and transformation codecs for use in data storage and communication applications.
http://numcodecs.readthedocs.io
MIT License

ZFP Compression #117

Open rabernat opened 5 years ago

rabernat commented 5 years ago

I just learned about a new compression library called ZFP: https://github.com/LLNL/zfp

zfp is an open source C/C++ library for compressed numerical arrays that support high throughput read and write random access. zfp also supports streaming compression of integer and floating-point data, e.g., for applications that read and write large data sets to and from disk.

zfp was developed at Lawrence Livermore National Laboratory and is loosely based on the algorithm described in the following paper:

Peter Lindstrom
"Fixed-Rate Compressed Floating-Point Arrays"
IEEE Transactions on Visualization and Computer Graphics
20(12):2674-2683, December 2014
doi:10.1109/TVCG.2014.2346458

zfp was originally designed for floating-point arrays only, but has been extended to also support integer data, and could for instance be used to compress images and quantized volumetric data. To achieve high compression ratios, zfp uses lossy but optionally error-bounded compression. Although bit-for-bit lossless compression of floating-point data is not always possible, zfp is usually accurate to within machine epsilon in near-lossless mode.

zfp works best for 2D and 3D arrays that exhibit spatial correlation, such as continuous fields from physics simulations, images, regularly sampled terrain surfaces, etc. Although zfp also provides a 1D array class that can be used for 1D signals such as audio, or even unstructured floating-point streams, the compression scheme has not been well optimized for this use case, and rate and quality may not be competitive with floating-point compressors designed specifically for 1D streams.

zfp is freely available as open source under a BSD license, as outlined in the file 'LICENSE'. For more information on zfp and comparisons with other compressors, please see the zfp website. For questions, comments, requests, and bug reports, please contact Peter Lindstrom.

It would be excellent to add ZFP compression to Zarr! What would be the best path towards this? Could it be added to numcodecs?

jhamman commented 5 years ago

It looks like there are already some python/numpy bindings: https://github.com/seung-lab/fpzip

alimanfoo commented 5 years ago

Hi Ryan, to implement a new codec you just need to implement the numcodecs.abc.Codec interface:

https://numcodecs.readthedocs.io/en/latest/abc.html

...then register your new codec class with a call to register_codec():

https://numcodecs.readthedocs.io/en/latest/registry.html#numcodecs.registry.register_codec

If you want to try this out as an experiment, you could knock up a codec implementation in a notebook or wherever; using the Python bindings for zfp, the codec implementation would be very simple. I'd suggest getting some benchmark results showing useful speed and/or compression ratio, then considering adding it to numcodecs if it looks promising.

If this did look like a useful addition to numcodecs, there are a couple of things I noticed from a quick glance at the source for the Python bindings. First, it expects arrays of at least 3 and at most 4 dimensions; I don't know if that is a constraint that might be good to relax. Also, it converts data to Fortran order before compression, which will mean an extra data copy for most users, where data is usually in C order. This may be a hard requirement of the zfp library, and so unavoidable, but it's just something to note.
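
For concreteness, a minimal sketch of the interface and registration step might look like this (the codec here is just a do-nothing placeholder, not a zfp implementation):

from numcodecs.abc import Codec
from numcodecs.registry import register_codec, get_codec

class IdentitySketch(Codec):
    """Do-nothing codec, just to show the shape of the interface."""

    codec_id = 'identity-sketch'

    def encode(self, buf):
        # a real zfp/fpzip codec would call into the compression library here
        return bytes(memoryview(buf))

    def decode(self, buf, out=None):
        data = bytes(memoryview(buf))
        if out is not None:
            memoryview(out).cast('B')[:] = data
            return out
        return data

register_codec(IdentitySketch)
codec = get_codec({'id': 'identity-sketch'})  # reconstructed from its stored config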


rabernat commented 5 years ago

Digging deeper, it looks like fpzip does not support missing data: https://zfp.readthedocs.io/en/release0.5.4/directions.html#directions

This is a dealbreaker for now (at least for me), but it looks like it might be addressed soon.

jakirkham commented 5 years ago

How do you handle missing data with Zarr currently?

william-silversmith commented 5 years ago

@rabernat I'm the author of the fpzip python bindings. I think you may be conflating zfp and fpzip. fpzip is a lossless codec while zfp isn't. Try the following:

import fpzip
import numpy as np

# a single NaN survives the (lossless) fpzip round trip
x = np.array([[[ np.nan ]]], dtype=np.float32)
y = fpzip.compress(x)
z = fpzip.decompress(y)
print(z)

Worth noting that fpzip is for 3D and 4D data, though 1D and 2D data compression is supported by adding dimensions of size 1.
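
For example (illustrative only -- the exact shape returned by decompress may vary between fpzip versions):

import numpy as np
import fpzip

img = np.random.rand(512, 512).astype(np.float32)
# pad the 2D image up to 3D with a singleton leading axis before compressing
compressed = fpzip.compress(img[np.newaxis, :, :])
restored = fpzip.decompress(compressed)
print(len(compressed), restored.shape)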

william-silversmith commented 5 years ago

Also, if you guys find fpzip interesting, I'd be happy to add more flexible support for Fortran vs. C order. I don't think it's a hard constraint; it's just what I got working for my use case.

UPDATE: fpzip 1.1.0 respects C order arrays.

rabernat commented 5 years ago

I think you may be conflating zfp and fpzip. fpzip is a lossless codec while zfp isn't.

@william-silversmith, thanks for clearing this up. I'm going to blame the mix-up on @jhamman with this comment.

Thanks also for writing fpzip and sharing it with the community!

I did some benchmarking of fpzip vs. zstd on real ocean data. You can find my full analysis in this notebook. The results are summarized by these two figures, which compare the compression ratio vs. encoding / decoding time of the two codecs on two different ocean datasets, one with a land mask (i.e. NaNs) over about 43% of the domain and one without.

encoding: [figure: compression ratio vs. encoding time for fpzip and zstd on both datasets]

decoding: [figure: compression ratio vs. decoding time for fpzip and zstd on both datasets]

It's interesting to note how zstd picks up on the missing data (encoded as NaNs) immediately; the compression ratios on the data with the mask are much higher. Since fpzip doesn't allow NaNs, I just filled with zeros. With fpzip, there are only very minor compression ratio differences between the masked and the unmasked arrays.

Based on this analysis, in terms of decoding time (which is my main interest), fpzip is nowhere close to zstd. To get fpzip to speed up encoding or decoding, I have to go out to precisions of < 20, which results in acceptable losses:

[figure: fpzip compression ratio vs. encode/decode time at reduced precision]

The caveat is that this is all done on my laptop, so might not be very robust or representative. Please have a look at my notebook and see if I have done anything obvious wrong. (It should be fully runnable from anywhere.)

rabernat commented 5 years ago

Also, in case this was not clear, I am basically not getting any compression of either dataset with fpzip using the default precision:

>>> len(fpzip.compress(data_sst))/data_sst.nbytes
1.0000054012345678
>>> len(fpzip.compress(data_ssh))/data_ssh.nbytes
1.0000054012345678

Am I doing something wrong?

william-silversmith commented 5 years ago

Hi Ryan!

Thanks for the info. I'm glad you found it useful to evaluate fpzip! I think Dr. Lindstrom, whom I've corresponded with, would be happy that more people are looking at it.

I have a few comments that I'll just put into bullets:

I did a small test on some of the Pangeo data using the following script, and I think I must have injected a bug. The reason there's so little compression seems to be that the tail of the compressed data is all zeros. Let me investigate this... Apologies, this Python library is pretty new and we haven't put it into production use yet, so it's not 100% battle-tested.

import fpzip
import numpy as np
import gcsfs
import pandas as pd
import xarray as xr

gcs = gcsfs.GCSFileSystem(project='pangeo-181919', token='anon')
# ds_ssh = xr.open_zarr(gcsfs.GCSMap('pangeo-data/dataset-duacs-rep-global-merged-allsat-phy-l4-v3-alt',
#                                gcs=gcs))

ds_llc_sst = xr.open_zarr(gcsfs.GCSMap('pangeo-data/llc4320_surface/SST',
                               gcs=gcs), auto_chunk=False)
# pull a small space/time subset of the SST field into memory
data = ds_llc_sst.SST[:5, 1, 800:800+720, 1350:1350+1440].values

x = fpzip.compress(data)
print(x)

william-silversmith commented 5 years ago

I just updated fpzip to version 1.1.1, which should have the trailing-zeros problem fixed. Give it another try!

EDIT: I did a quick test and I saw the following results for the above script:

20736000 bytes raw 
13081082 bytes fpzip (63%)
17396243 bytes gzip -9 (84%)
lindstro commented 5 years ago

@rabernat, I'm the developer of fpzip (and zfp). Having worked a lot with climate scientists, I would certainly expect high-resolution ocean data like yours to compress quite well (definitely to less than 100%!). @william-silversmith, who has done a great job adding a Python interface to fpzip, reports a reduction to 63% after his latest fix, which seems more plausible. Perhaps you can give fpzip another try?

If you're willing to sacrifice some loss to within some acceptable error tolerance, then I would suggest checking zfp out. It's a far faster and more advanced compressor than fpzip.

rabernat commented 5 years ago

Hi @lindstro -- thanks for dropping in here. And most of all, thanks for your important work on compression! As a computational oceanographer, I recognize that this is really crucial for the future sustainability of high-resolution modeling.

I updated my analysis with the latest version of fpzip. Indeed it actually compresses the data now! πŸ˜‰

Here is a figure which summarizes the important results. In contrast to my previous figure, I have eliminated the precision argument for fpzip and am using it as intended: lossless mode (directly comparable to zstd).

[figure: compression ratio vs. encode/decode time, lossless fpzip vs. zstd]

In terms of compression ratio, fpzip beats zstd when there is no land mask (holds regardless of the zstd level). With a land mask, zstd is able to do a bit better. zstd is a lot faster, particularly on decoding, but that is not a dealbreaker for us. (50 MB/s is still fast compared to network transfer times.)

Based on this analysis, I think we definitely want to add fpzip support to numcodecs.

If you're willing to sacrifice some loss to within some acceptable error tolerance, then I would suggest checking zfp out. It's a far faster and more advanced compressor than fpzip.

Yes we would love to try zfp. Is there a python wrapper for it yet?

rabernat commented 5 years ago

Another point about fpzip: compression is considerably better (a compressed/original size ratio of 0.4 vs. 0.55) if I transpose the arrays from their native python / C order to Fortran order when I feed them in. @william-silversmith -- I notice you have added the order keyword to fpzip.decompress but not to fpzip.compress. Would it be possible to add support for compression of C-order arrays? This would allow us to avoid the manual transpose step at the numcodecs level.

rabernat commented 5 years ago

@alimanfoo - presuming we want to move forward with adding fpzip and zfp to numcodecs, what would be the best path? Would you want to use @william-silversmith's python package, or would you want to re-wrap the C code within numcodecs, as currently done for c-blosc? The latter approach would also allow us to use zfp without an independent python implementation. But it is a lot more work. Certainly not something I can volunteer for.

alimanfoo commented 5 years ago

If there are existing python wrappers then I'd suggest to use those, at least as a first pass - can always optimise later if there is room for improvement. PRs to numcodecs for fpzip and zfp would be welcome.


lindstro commented 5 years ago

@rabernat, thanks for rerunning your compression study. fpzip was developed back in 2005-2006, when 5 MB/s was fast. I've been meaning to rewrite and parallelize it, but I've got my hands full with zfp development.

Regarding transposition, clearly we'd want to avoid making a copy. fpzip and zfp both use the convention that x varies faster than y, which varies faster than z. So an array of size nx × ny × nz should have the C layout

float array[nz][ny][nx];

@william-silversmith's Python wrappers should ideally do nothing but pass the array dimensions to fpzip in the "correct" order and not physically move any data. zfp (but not fpzip) also supports strided access to handle arrays of structs without having to make a copy.
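
In numpy terms, the point is simply that a C-ordered array of shape (nz, ny, nx) already has x as the fastest-varying axis, so the dimensions can be reported to the library in reverse order with no transpose or copy (a quick illustration, not fpzip's actual API):

import numpy as np

a = np.zeros((2, 3, 4), dtype=np.float32)  # C order, shape (nz, ny, nx)
assert a.flags.c_contiguous
nx, ny, nz = a.shape[::-1]                 # dimensions in the order fpzip/zfp expect
print(nx, ny, nz, a.strides)               # the x axis has the smallest stride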

As far as Python wrappers for zfp, we're just now starting on that and have an end-of-the-year milestone to deliver such a capability. The developer working on this has suggested using cffi. As I'm virtually Python illiterate, I'm not sure how that would interact with numcodecs. Any suggestions are welcome.

alimanfoo commented 5 years ago

For numcodecs it doesn't matter how you wrap zfp, as long as the zfp python module provides a compress() function that accepts any python object implementing the buffer protocol and returns a python object implementing the buffer protocol (e.g., bytes), and similarly for decompress(). In both cases it should minimise memory copies, i.e., use the buffer protocol to read data directly from the buffers exposed by the python objects.
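
As a stand-in illustration of that contract (using zlib in place of zfp), something with this shape is all numcodecs needs:

import zlib
import numpy as np

def compress(buf):
    # accept any object exposing the buffer protocol, return bytes
    return zlib.compress(memoryview(buf).cast('B'))

def decompress(buf):
    return zlib.decompress(memoryview(buf).cast('B'))

a = np.arange(1000, dtype='f8')
restored = np.frombuffer(decompress(compress(a)), dtype='f8')
assert np.array_equal(a, restored)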


rabernat commented 5 years ago

Ok, so we are clearly in a "help wanted" situation.

Implementing the necessary wrappers / documentation / tests for fpzip and zfp in numcodecs would make an excellent student / intern project. It is a clearly defined task with plenty of examples to draw from. And it will have a big impact by bringing these codecs to a broader community via zarr. So if anyone knows of any students or other suitable candidates looking to get their feet wet in open source, please send them to this issue!

Also, I'm transferring this to the numcodecs issue tracker, where it clearly belongs.

alimanfoo commented 5 years ago

Just to elaborate a little, what we would like to do is implement a codec class for ZFP. The Codec interface is defined here. Once there is a Python package binding ZFP, the codec implementation is very simple: basically, the Codec.encode(buf) method would just pass through to a zfp.compress(buf) function, and similarly the Codec.decode(buf, out=None) method would ideally just pass through to a zfp.decompress(buf, out) function.

There is a detail on the decode method in that numcodecs supports an out argument which can be used by compressors that have the ability to decompress directly into an existing buffer. This potentially means that decompressing involves zero memory copies. So if a zfp package offered this ability to decompress directly into a buffer exposed by a Python object via the buffer interface, this could add an additional optimisation. However, most compression libraries don't offer this, so it is not essential. I.e., if zfp does not offer this, then in numcodecs if an out argument is provided, we just do a memory copy into out. For an example, see the zlib codec implementation.
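
To make that concrete, here is a rough sketch of what such a codec could look like, assuming a zfpy-style binding with compress_numpy/decompress_numpy functions (the names, keywords, and out handling below are illustrative, not a final API):

import numpy as np
import zfpy  # assumed zfp Python binding
from numcodecs.abc import Codec

class ZFPSketch(Codec):

    codec_id = 'zfp-sketch'

    def __init__(self, tolerance=-1):
        self.tolerance = tolerance  # accuracy-mode error bound; negative = unset

    def encode(self, buf):
        arr = np.ascontiguousarray(buf)
        return zfpy.compress_numpy(arr, tolerance=self.tolerance)

    def decode(self, buf, out=None):
        dec = zfpy.decompress_numpy(bytes(buf))
        if out is not None:
            # zfp decompressed into its own array above, so fall back to one copy
            np.frombuffer(out, dtype=dec.dtype)[:] = dec.ravel()
            return out
        return dec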

One other detail, ideally a zfp Python binding would be able to accept any object exposing the Python buffer interface. We currently also require codecs to be able to handle array.array which in Python 2 needs a special case because it doesn't implement the buffer interface. But we can work around that inside numcodecs. I.e., it is fine if a zfp binding just uses the buffer interface, no special case needed for array.array in Python 2. E.g., this is part of the reason why there is some special case code for Python 2 within the zlib codec implementation.

Hth.

jakirkham commented 5 years ago

We currently also require codecs to be able to handle array.array which in Python 2 needs a special case because it doesn't implement the buffer interface.

We can smooth over this a bit. Opened PR ( https://github.com/zarr-developers/numcodecs/pull/119 ) as a proof-of-concept.

lindstro commented 5 years ago

@alimanfoo, zfp does indeed decompress directly into a user-allocated buffer. Similarly, compression writes to a user-allocated buffer whose size zfp conservatively estimates for the user. Alternatively, the user can specify how much memory to use for the compressed data, and zfp will ensure that the compressed data fits within the buffer (with quality commensurate with buffer size).

I know very little about zarr, but my understanding is that it partitions arrays into equal-shape chunks whose compressed storage is of variable length. I'm guessing chunks have to be large enough to amortize the overhead of compression metadata (I see 1 MB or so recommended). zfp provides similar functionality but uses very small chunks (4^d values for d-dimensional arrays) that are each (lossily) compressed to a fixed number of bits per chunk (the user specifies how many bits; for 3D arrays, 1024 bits is common). I wonder if this capability could be exploited in zarr without having to rely on zfp's streaming compression interface.
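
(For a rough sense of scale: a 3D block holds 4^3 = 64 values, so 1024 bits per block works out to 16 bits per value, i.e. roughly 2:1 for float32 and 4:1 for float64 at that fixed rate.)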

jakirkham commented 5 years ago

We've revamped how codecs handle data internally. Outlined in this comment. This should make it much easier to contribute new codecs. Would be great if those interested took a look and provided any feedback.

alimanfoo commented 5 years ago

@lindstro thanks for the information, very helpful.

Apologies if I have not fully understood the details in your comment, but you are right to say that zarr partitions an array into equal-shape chunks, and then passes the data for each chunk to a compression codec (in fact this can be a pipeline of codecs configured by the user, but typically it is just a compressor). The result of encoding the chunk is then stored, and this will be of variable length depending on the data in each chunk.

In the zarr architecture, ZFP would be wrapped as a codec, which means it could be used as the compressor for an array. So zarr would pass each chunk in the array to ZFP, and then store whatever ZFP gives it back. Zarr passes all of the data for a chunk to a codec in a single API call, so in general there is no need to use streaming compression; you can just do a one-shot encoding. Typically I find that chunks should be at least 1 MB uncompressed, and usually upwards of 16 MB is better, depending somewhat on factors like compression ratio and the type of storage being used.

A compressor (or any other type of codec) is a black box as far as zarr is concerned. If a compressor like ZFP chooses to break the chunk down further into smaller pieces, that is an internal implementation detail. E.g., the Blosc compressor does something similar, it breaks down whatever it is given into smaller blocks, so it can then use multiple threads and compress blocks in parallel.

If it is possible to vary the size of chunks that ZFP is using internally, then this is an option you'd probably want to expose to the user when they instantiate the ZFP codec, so they could tune ZFP for a particular dataset.

Hth.

lindstro commented 5 years ago

@alimanfoo, let me try to clarify.

Whereas zarr prefers uncompressed chunks as large as 16 MB, zfp in its fixed-rate mode uses compressed chunks on the order of 8-128 bytes (a cache line or two), which provides for far finer granularity of access. Think of zfp as a compressed floating-point format for SSE vectors and similar. I was just thinking out loud whether a capability like that could be exploited by zarr, for example, if traversing a 3D array a 2D or 1D slice at a time, when only a very small subset of a 16 MB chunk is needed.

My reference to zfp streaming compression was meant in the sense of sequentially (de)compressing an entire array (or large portions thereof) in contrast to zfp's inline (de)compression capability, where tiny blocks (4^d scalars in d dimensions) are (de)compressed on demand in response to random access reads or writes to individual array elements.

Of course, one could use zfp as a black box codec with zarr to (de)compress large chunks, but my question had to do with whether zarr could benefit from fine-grained access to individual scalars or a few handfuls of scalars, as provided by zfp's own compressed arrays.

If not, then the best approach to adding zfp support to zarr would most likely be to use the Python bindings we're currently developing to zfp's high-level C interface, which is designed for infrequent (de)compression of the megabyte-sized chunks that zarr prefers.

alimanfoo commented 5 years ago

Many thanks @lindstro, very nice clarification.

Of course, one could use zfp as a black box codec with zarr to (de)compress large chunks, but my question had to do with whether zarr could benefit from fine-grained access to individual scalars or a few handfuls of scalars, as provided by zfp's own compressed arrays.

We had a similar discussion a while back, as the blosc compressor offers something analogous, which is the ability to decompress specific items within a compressed buffer, via a function called blosc_getitem. The discussion is here: https://github.com/zarr-developers/zarr/issues/40, see in particular comments from here: https://github.com/zarr-developers/zarr/issues/40#issuecomment-236844228.

The bottom line is that I think it could, in principle, be possible to modify numcodecs and zarr to leverage this kind of feature. However, it would require some reworking of the API layer between zarr and numcodecs and some non-trivial implementation work. I don't have bandwidth to do work in that direction at the moment, but if someone had the time and motivation then they'd be welcome AFAIC.

As I understand it, the type of use case that would make this feature valuable are where you are doing a lot of random access operations into small regions of an array, i.e., pulling out individual values or small clusters of nearby values. I don't have any use cases like that personally, but maybe others do, and if so I'd be interested to know.

All my use cases involve running parallel computations over large regions of an array. For those use cases, it is fine to decompress entire chunks, as it is generally only a few chunks around the edge of an array or region where you don't need to decompress everything, and the overhead of doing a little extra decompression than strictly needed is negligible. Occasionally I do need to pull out single values or a small sub-region, but that is generally a one-off, so again the extra overhead of decompressing entire chunks is easy to live with.

If not, then the best approach to adding zfp support to zarr would most likely be to use the Python bindings we're currently developing to zfp's high-level C interface, which is designed for infrequent (de)compression of the megabyte-sized chunks that zarr prefers.

This sounds like a good place to start.

lindstro commented 5 years ago

Regarding use cases, you'll have to excuse my ignorance of how zarr works and the use cases it was designed for, but here are a few that motivated the design of zfp and its small blocks:

I do agree, however, that it is often possible to structure the traversal and computation in a streaming manner to amortize the latency associated with (de)compression. That typically requires refactoring existing code, though.

alimanfoo commented 5 years ago

Thanks @lindstro, very interesting.

FWIW Zarr was designed primarily as a storage partner to the Dask array module, which implements a subset of the Numpy interface but as chunked parallel computations. It's also designed to work well either on a single machine (either multi-threaded or multi-process) or on distributed clusters (e.g., Pangeo-like clusters reading from object storage). With Dask, the sweet spot tends to be towards larger chunks anyway, because there is some scheduling overhead associated with each task. So there has not (yet) been any pressure to optimise access to smaller array regions.

But as I mentioned above it is conceivable that Zarr could be modified to leverage capabilities of ZFP or Blosc to decompress chunk sub-regions. So if someone has a compelling use case and wants to explore this then please feel free.

rabernat commented 5 years ago

Very interesting discussion. Thanks @lindstro for taking the time to explain ZFP in such detail!

I think it is important to distinguish between what ZFP calls "chunks" and what zarr calls "chunks". ZFP chunks are clearly intended to be small. Zarr chunks tend to be much larger. An important point is that zarr is focused on serialization: zarr chunks correspond to individual files. It makes no sense to store millions of tiny files, due to the overhead of opening a file. This overhead becomes even more severe when the "files" are actually objects in cloud storage (a primary use case for zarr). So we would never want a 1:1 correspondence between ZFP chunks and zarr chunks.

Instead, what we are talking about here is using ZFP as a compression layer for zarr chunks. There will probably be a huge number of ZFP chunks in one zarr chunk (i.e. file). All that really matters here is:

Although zarr in its current form will clearly not be able to leverage all of the cool features of ZFP like random access to individual elements, the bottom line is that ZFP provides high compression ratios for multidimensional floating point array data. This alone is justification for exposing it via numcodecs.

At this point, it would be great to move beyond speculation and see some actual implementation! πŸ˜‰

kmpaul commented 5 years ago

Hey! I just wanted to announce that we (NCAR) might be able to help out with this, if @lindstro approves.

My colleague, Haiying Xu (@halehawk), already has some Python bindings for fpzip and zfp. They are currently in a private NCAR Github repository until we can iron out licensing issues, so I'll let @lindstro comment on whether he wants zfp/fpzip bindings in NCAR repositories and what license he is comfortable with. Regardless, I think that with a small bit of code refactoring (in addition to the licensing issues, if there are any), we are close to a zfp (and a second fpzip) Python package that can be used with some new codecs.

kmpaul commented 5 years ago

Oh, and I also wanted to comment that @william-silversmith mentioned that fpzip was lossless...which is true, but kind of misleading. The fpzip library can do both lossy and lossless compression, while the zfp library can only (?) do lossy compression. A user-supplied switch to the fpzip library can enable lossless compression, or lossy compression at whatever level you want.

rabernat commented 5 years ago

Regarding licensing, ZFP has already been released open source with a BSD-style license: https://github.com/LLNL/zfp/blob/master/LICENSE

Not sure exactly what you need to iron out, but the terms of this license are pretty self-evident. If you're vendoring ZFP within the python library, you just need to include the original license. (This is what the fpzip python library does: https://github.com/seung-lab/fpzip/tree/master/fpzip-1.2.0) If you're not vendoring ZFP but just linking against it, you don't have to do anything special in terms of license.

kmpaul commented 5 years ago

Perfect! I'll have to check with @halehawk to find out what we need to do specifically.

lindstro commented 5 years ago

@alimanfoo, thanks for clarifying. If zarr primarily targets I/O and secondary storage, then I think it would make sense to use one of zfp's variable-rate modes to compress the large chunks that zarr prefers.

Skimming the zarr tutorial, it seems that accessing the array is typically done via slicing, i.e., where an uncompressed copy of a subset of the array is made for subsequent, repeated access. Is that fair? zfp also supports slicing but creates compressed copies. I imagine some optimizations could be made if one accesses a 2D slice of a large 3D zarr chunk compressed by zfp, since then the whole chunk would not have to be decompressed; only those zfp blocks touched by the slice require decompression.

lindstro commented 5 years ago

@rabernat, I agree that chunks serve an entirely different purpose in zfp and in zarr. As I mentioned to @alimanfoo, it might make sense to exploit the fact that each zarr chunk is many independent zfp blocks, and to decompress only those zfp blocks needed when accessing a subset of the chunk. But I imagine supporting this would likely require major surgery on zarr.

rabernat commented 5 years ago

it might make sense to exploit the fact that each zarr chunk is many independent zfp blocks, and to decompress only those zfp blocks needed when accessing a subset of the chunk.

Personally I would love to see this capability in zarr! And probably so would the netCDF developers who are currently developing a zarr backend for netCDF. It would be particularly valuable for the common use case where the zarr array is chunked in the time dimension and the user wants to compute a timeseries at one single point in space. Today this basically requires reading / decompressing the entire array. Highly inefficient.

Once we get some basic zarr / ZFP integration going, it could set the stage for such future developments. Right now, my goal is to reduce our cloud storage bill as much as possible without losing scientific value from the data.

lindstro commented 5 years ago

@kmpaul, I forgot that Haiying has already done some work on wrapping zfp, and I appreciate your offer to help out with this. I will say, however, that we have a milestone due in the next few weeks to provide Python wrappers in zfp that will become part of the next official release slated for March 2019. I have a developer currently working on this. We'll likely make these wrappers available on a separate branch before the release if people want to play with them.

As far as integration with zarr, I think either approach would be fine. Using the official zfpy bindings has the benefit of long-term support and the addition of new zfp features as they become available, while Haiying's wrappers have already been written and could likely be used to experiment with zfp compression within zarr today.

kmpaul commented 5 years ago

@lindstro Yeah, that's the problem with private repos. :smile: I'm fine going down both paths in parallel until zfpy is available. It should aid with more immediate testing.

rabernat commented 5 years ago

If anyone has any existing ZFP python code they want to try now, you can just follow the example I made with fpzip: https://gist.github.com/rabernat/5199d80af701b442f4bec6092c8e4a70

Creating the basic codec is pretty trivial:

import fpzip
from numcodecs.abc import Codec
from numcodecs import register_codec, Zstd

class FPZip(Codec):

    codec_id = 'fpzip'

    def __init__(self, precision=0):
        # precision=0 means fully lossless fpzip compression
        self.precision = precision

    def encode(self, buf):
        return fpzip.compress(buf, precision=self.precision)

    def decode(self, buf):
        return fpzip.decompress(buf)

    def __repr__(self):
        return '%s(precision=%r)' % (type(self).__name__, self.precision)

# register the codec so it can be looked up by its codec_id
register_codec(FPZip)

jakirkham commented 5 years ago

@lindstro, one important point is that Zarr supports on-"disk" storage (though this could be an object store or actual files in directories) as well as in-memory storage. It's possible a smaller chunk size could make sense for some in-memory Zarr Array use cases. The in-memory format is a dictionary with each chunk stored as a separate key-value pair. Fortunately, whatever work goes into adding a codec for Numcodecs can easily be leveraged by either use case.
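
A small dict-backed illustration of the in-memory case (store layout details per my understanding of zarr, so treat as approximate):

import numpy as np
import zarr

store = {}
z = zarr.array(np.random.rand(100, 100), chunks=(50, 50), store=store)
# metadata plus one key-value pair per chunk, e.g. '.zarray', '0.0', '0.1', ...
print(sorted(store))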

alimanfoo commented 5 years ago

@rabernat, I agree that chunks serve an entirely different purpose in zfp and in zarr. As I mentioned to @alimanfoo, it might make sense to exploit the fact that each zarr chunk is many independent zfp blocks, and to decompress only those zfp blocks needed when accessing a subset of the chunk. But I imagine supporting this would likely require major surgery on zarr.

Let's just say it would be an interesting challenge :smile:. If anyone does decide they want to explore this at some point, I'd recommend having a specific, concrete use case in mind to target and benchmark, including specific computations, specific data, and a specific target platform, including what type of storage, what type of computational nodes, and what (if any) type of parallelism. I'd also recommend making sure it's an important use case, i.e., optimisation would save somebody a significant amount of time and/or money; and doing enough thinking ahead of time to be fairly convinced that the desired performance improvement should in theory be achievable. These suggestions are obvious to everyone I'm sure, but just wouldn't want anyone to sink loads of time for little reward.

halehawk commented 5 years ago

I'm almost done with the zfp codec. I just want to check: zfp has four different modes (accuracy, precision, rate, and a combination/expert mode), and the combination mode can accept four parameters. Do I just pass them all in the API as compress(buf, mode, accuracy, precision, rate, minbits, maxbits, maxprec, minexp)?

Thanks,

Haiying


jakirkham commented 5 years ago

Normally those sorts of things go in the constructor and are set as members of the instance. Take a look at Zlib for a simple example.
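
For example, something along these lines (a sketch only, not the actual PR; the parameter names just mirror zfp's modes) keeps the settings on the instance so they travel with the codec config:

from numcodecs.abc import Codec

class ZFPModes(Codec):

    codec_id = 'zfp-modes-sketch'

    def __init__(self, mode='accuracy', tolerance=0.0, precision=0, rate=0,
                 minbits=0, maxbits=0, maxprec=0, minexp=0):
        # store everything on the instance, Zlib-style, instead of passing a long
        # argument list to compress() at encode time
        self.mode = mode
        self.tolerance = tolerance
        self.precision = precision
        self.rate = rate
        self.minbits = minbits
        self.maxbits = maxbits
        self.maxprec = maxprec
        self.minexp = minexp

    def encode(self, buf):
        # a real implementation would dispatch on self.mode and call the zfp binding
        raise NotImplementedError

    def decode(self, buf, out=None):
        raise NotImplementedError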

rabernat commented 5 years ago

@halehawk - a great way to move forward with this would be to just submit a pull request with your code as is. Then we can discuss the details and iterate before merging.

halehawk commented 5 years ago

I did a pull request yesterday.


rabernat commented 5 years ago

Ah fantastic!

Perhaps you could edit your PR description to include the text "Closes #117" in the description? That way it will be linked to this discussion.

halehawk commented 5 years ago

Do you mean I should submit a new pull request? By the way, what is 'tox' in the to-do list?


halehawk commented 5 years ago

I just added "Closes #117" to the description.


lindstro commented 5 years ago

zfp 0.5.5 has been released, which adds Python bindings to the C API and the ability to (de)compress NumPy arrays. The current release supports compression of arbitrarily strided arrays (including non-contiguous arrays), but does not yet support decompression with arbitrary strides. This release also adds lossless compression to zfp.

We are planning on expanding zfp's Python API over the next year or so, for instance, to include Python bindings to zfp's compressed arrays.
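
For anyone wanting to try it, usage looks roughly like this (based on my reading of the zfpy docs, so double-check the keyword names and defaults):

import numpy as np
import zfpy

a = np.random.rand(64, 64, 64)

lossy = zfpy.compress_numpy(a, tolerance=1e-3)   # fixed-accuracy (lossy) mode
b = zfpy.decompress_numpy(lossy)
print(len(lossy), np.max(np.abs(a - b)))

lossless = zfpy.compress_numpy(a)                # no mode arguments: reversible mode
assert np.array_equal(zfpy.decompress_numpy(lossless), a)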

kmpaul commented 4 years ago

Update: zfpy version 0.5.5 has now been released on conda-forge for Linux and OSX.

lindstro commented 4 years ago

Another update: zfpy version 0.5.5 is now available as a pip package for Linux, macOS, and Windows.