szcompressor / SZ

Error-bounded Lossy Data Compressor (for floating-point/integer datasets)
http://szcompressor.org

SZ2: Error-bounded Lossy Compressor for HPC Data

(C) 2016-2022 by Mathematics and Computer Science (MCS), Argonne National Laboratory. See COPYRIGHT in top-level directory.

Citations

Kindly note: This site contains the implementation of SZ2.x. If you mention SZ in your paper, the most appropriate citation is these three references (**ICDE2021, HPDC2020, and BigData2018**), because together they cover the whole design and implementation of the latest version of SZ.

Note: SZ3 has been released. In many cases SZ3 achieves much higher compression ratios than SZ2, with comparable (slightly lower) throughput. Details can be found in our ICDE21 paper.

This document briefly describes how to install and use the SZ compressor. More details can be found in doc/userguide.pdf.

An OpenCL version is included in the package, but this GPU code is deprecated. The optimized GPU code in CUDA can be found at https://github.com/szcompressor/cuSZ.

Installation

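Installation way 1 (autotools):

./configure --prefix=[INSTALL_DIR]
make
make install

Installation way 2 (CMake):

mkdir build
cd build
cmake .. -DCMAKE_INSTALL_PREFIX:PATH=[INSTALL_DIR]
make
make install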

Then, you'll find all the executables in [INSTALL_DIR]/bin and the .a and .so libraries in [INSTALL_DIR]/lib.

Testing Examples


Examples can be found in [SZ_PACKAGE]/example.

You can use the executable 'sz' command to do the compression/decompression. Please see the user guide or run 'sz --help' for details.

Alternatively, you can call our API to do the compression/decompression. Here are two examples: testfloat_compress.c and testfloat_decompress.c.

Compression


Description:

testdouble_8_8_128.dat and testdouble_8_8_8_128.dat are two binary test files (little-endian format), which contain a 3D array (128X8X8) and a 4D array (128X8X8X8), respectively. Their data values are shown in the two plain-text files testdouble_8_8_128.txt and testdouble_8_8_8_128.txt. These two data files come from FLASH_Blast2 and FLASH_MacLaurin, respectively (both at time step 100). The compressed data files are testdouble_8_8_128.dat.sz and testdouble_8_8_8_128.dat.sz, respectively.

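sz.config is the configuration file. Its key settings are errorBoundMode, absErrBound, and relBoundRatio: absErrBound is an absolute error bound, relBoundRatio is an error bound relative to the data's value range, and errorBoundMode selects which of these bounds apply. See the user guide for the full description.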

Decompression

The output files are testdouble_8_8_128.dat.sz.out and testdouble_8_8_8_128.dat.sz.out, respectively. You can compare each .txt file with the corresponding .out file to check the compression error at each data point. For instance, compare testdouble_8_8_8_128.txt with testdouble_8_8_8_128.dat.sz.out.

Application Programming Interface (API)

Programming interfaces are provided in two programming languages: C and Fortran.

The interfaces are listed below. More details can be found in the user guide.

C Interface

Fortran Interface

Please see doc/userguide.pdf for details.

Python Interface

NOTE: THESE BINDINGS ARE DEPRECATED

The following information is provided for historical purposes only. Please consider switching to the Python bindings for SZ provided with LibPressio instead, which are more efficient and are updated with new features in SZ as they are developed.

The Python bindings require some additional dependencies:

Autotools builds do not support the Python interface; compile with CMake instead, as follows:

# if you are working in a Python virtual environment, activate it here;
# otherwise CMake will detect the wrong numpy version, which can cause
# segmentation faults and other bizarre errors
source bin/activate

mkdir build
cd build
# you can specify other CMake arguments such as CMAKE_INSTALL_PREFIX here
cmake .. -DBUILD_PYTHON_WRAPPER=ON
cmake --build .

# for a system-wide installation, sudo is required
sudo cmake --install .

Please note that building static libraries is incompatible with building the Python wrappers; building the Python wrappers is disabled if -DBUILD_SHARED_LIBS=OFF.

An example usage file can be found in example/test.py

Additional documentation is available through Python's built-in help() function.

Limitation of this version

SZ is not suitable for compressing tiny datasets (e.g., datasets smaller than 10 KB).

GPU Version

Please refer to https://github.com/szcompressor/cuSZ for our GPU/CUDA version of SZ (called cuSZ). Please create an issue ticket there if you have any questions or issues regarding cuSZ.

Performance Portable Version

Please refer to this repository for our performance portable version of SZ (called kSZ), which uses the Kokkos programming model. Please create an issue ticket there if you have any questions or issues regarding kSZ.

Version history


Versions 0.x were all written in Java, with C/Fortran interfaces provided through JNI and C/Fortran wrappers. SZ 1.0 is written purely in C.