A framework for predicting compression ratios for lossy compressors and other key metrics in a generic way using LibPressio
You can use the tool pressio_predict_bench to evaluate your prediction method on a dataset of your choice. Here is an example of running the tool on the 13 fields of the Hurricane dataset from SDRBench with the method average_sampled:
./build/pressio_predict_bench \
-a dataset \
-a settings \
-a run \
-a score \
-L pressio:loader=folder \
-l folder:base_dir=$HOME/git/datasets/hurricane/100x500x500/ \
-l io_loader:dims=500 \
-l io_loader:dims=500 \
-l io_loader:dims=100 \
-l io_loader:dtype=float \
-l io_loader:use_template=true \
-l folder:regex='.+/([A-Z]+)f(\d+).bin.f32' \
-l folder:groups=field \
-l folder:groups=timestep \
-b pressio:compressor=sz3 \
-b pressio:metric=composite \
-b composite:plugins=size \
-b composite:plugins=time \
-b composite:plugins=error_stat \
-o pressio:abs=1e-5
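The folder:regex option above selects the input files. Assuming that folder:groups names the regex capture groups in order, each file is tagged with a field (group 1) and a timestep (group 2). The hypothetical helper below applies the same pattern with std::regex to show what those groups capture. It is an illustration only, not the folder loader's implementation.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical key extracted from a Hurricane file path.
// Assumption: folder:groups=field and folder:groups=timestep map to
// capture groups 1 and 2 of folder:regex, in that order.
struct FileKey {
    std::string field;
    std::string timestep;
};

// Returns true and fills `key` if `path` matches the folder:regex pattern
// used in the pressio_predict_bench example; returns false otherwise.
bool parse_hurricane_path(const std::string& path, FileKey& key) {
    static const std::regex pattern(R"(.+/([A-Z]+)f(\d+).bin.f32)");
    std::smatch match;
    if (!std::regex_match(path, match, pattern)) {
        return false;  // file is not part of the dataset
    }
    key.field = match[1].str();     // e.g. "CLOUD"
    key.timestep = match[2].str();  // e.g. "48"
    return true;
}
```

For example, parsing ".../100x500x500/CLOUDf48.bin.f32" yields field "CLOUD" and timestep "48". The unescaped dots in .bin.f32 match any character; writing \.bin\.f32 would be stricter.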
The benchmark computes the median absolute percentage error (MdAPE) of the predictions. If built with MPI support, pressio_predict_bench runs in parallel on a cluster. If your metrics are subject to interference (for example, timing measurements), run in isolation mode with the -Z isolate flag.
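For reference, MdAPE is the median, over all samples, of |predicted - actual| / |actual| * 100. The minimal C++ sketch below computes it from paired predictions and measurements (for example, predicted and observed compression ratios). The function name is hypothetical and this is not pressio_predict_bench's own code.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Median absolute percentage error between predicted and observed values.
// Per-sample error: |predicted - actual| / |actual| * 100.
double median_absolute_percentage_error(const std::vector<double>& predicted,
                                        const std::vector<double>& actual) {
    assert(predicted.size() == actual.size());
    assert(!actual.empty());
    std::vector<double> errors;
    errors.reserve(actual.size());
    for (std::size_t i = 0; i < actual.size(); ++i) {
        errors.push_back(std::abs(predicted[i] - actual[i]) / std::abs(actual[i]) * 100.0);
    }
    std::sort(errors.begin(), errors.end());
    const std::size_t n = errors.size();
    // Odd count: middle element; even count: mean of the two middle elements.
    return (n % 2 == 1) ? errors[n / 2] : (errors[n / 2 - 1] + errors[n / 2]) / 2.0;
}
```

The median is used rather than the mean so that a few badly mispredicted fields do not dominate the score.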
You can use the prediction schemes from C/C++ using the following API pattern
TODO EXAMPLE
To add a new predictor, build the code, then run the pressio_new tool from LibPressioTools to generate the predictor from a metrics template:
pressio_new metric $name > ./src/plugins/predictors/$name.cc
After the code is generated, edit src/plugins/predictors/$name.cc to add your prediction method. You most likely only need to edit the begin_compress_impl function to gather the metrics you need. See src/plugins/predictors/tiled_samples.cc for an example that invokes the compressor to produce an estimate.
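To illustrate the general idea of invoking a compressor on samples of the input to produce an estimate, here is a self-contained sketch. It does not use LibPressio's plugin API and is not the algorithm in tiled_samples.cc: the compressor is modeled as a callback that returns a compressed size in bytes, and CompressFn, extract_tile, and estimate_ratio_from_tiles are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-in for a compressor: returns the compressed size in bytes.
using CompressFn = std::function<std::size_t(const std::vector<float>&)>;

// Copy an nx-by-ny tile starting at (x0, y0) from a row-major 2D field.
std::vector<float> extract_tile(const std::vector<float>& field, std::size_t width,
                                std::size_t x0, std::size_t y0,
                                std::size_t nx, std::size_t ny) {
    std::vector<float> tile;
    tile.reserve(nx * ny);
    for (std::size_t y = y0; y < y0 + ny; ++y)
        for (std::size_t x = x0; x < x0 + nx; ++x)
            tile.push_back(field[y * width + x]);
    return tile;
}

// Estimate the compression ratio by compressing a regular grid of square tiles
// (every `stride` elements) and dividing total raw bytes by total compressed bytes.
double estimate_ratio_from_tiles(const std::vector<float>& field, std::size_t width,
                                 std::size_t height, std::size_t tile,
                                 std::size_t stride, const CompressFn& compress) {
    assert(field.size() == width * height);
    assert(tile > 0 && tile <= width && tile <= height && stride > 0);
    std::size_t raw_bytes = 0;
    std::size_t compressed_bytes = 0;
    for (std::size_t y0 = 0; y0 + tile <= height; y0 += stride) {
        for (std::size_t x0 = 0; x0 + tile <= width; x0 += stride) {
            std::vector<float> sample = extract_tile(field, width, x0, y0, tile, tile);
            raw_bytes += sample.size() * sizeof(float);
            compressed_bytes += compress(sample);
        }
    }
    assert(compressed_bytes > 0);
    return static_cast<double>(raw_bytes) / static_cast<double>(compressed_bytes);
}
```

In a real predictor the callback would run the configured compressor on each tile. Fewer or smaller tiles typically make the estimate cheaper but noisier.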
Next, in get_configuration, add an entry predictors:invalidate with a std::vector<std::string> containing the list of compressor options that would invalidate this predictor's calculation.
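As a rough illustration of what this list means (not how libpressio-predict implements it), the sketch below treats predictors:invalidate as a plain list of option names and reports whether a cached prediction must be recomputed after some compressor options change. prediction_invalidated is a hypothetical helper, and it ignores the special predictors:* values.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical check: a cached prediction is stale if any changed compressor
// option appears in the predictor's predictors:invalidate list.
bool prediction_invalidated(const std::vector<std::string>& invalidate,
                            const std::vector<std::string>& changed_options) {
    return std::any_of(changed_options.begin(), changed_options.end(),
                       [&invalidate](const std::string& option) {
                           return std::find(invalidate.begin(), invalidate.end(), option) !=
                                  invalidate.end();
                       });
}
```

For a predictor whose list is {"pressio:abs", "sz:quantization_intervals"}, changing pressio:abs would invalidate its result, while changing an option outside the list would not.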
The predictors:invalidate list also accepts a few special values:
predictors:nondeterministic indicates that this metric is non-deterministic (e.g. Randomized SVD) even if the underlying compressor is deterministic.
predictors:runtime indicates that this metric depends on performance-related characteristics (e.g. time:compress).
predictors:error_dependent indicates that this metric is invalidated whenever any setting that affects the quality of the data after compression is changed (e.g. the quantized entropy).
predictors:error_agnostic indicates that this metric is independent of the configuration of any compressor (e.g. the standard deviation of the input data).
If any other option (e.g. pressio:abs or sz:quantization_intervals) is provided in this list, the compressor for which predictions are being made MUST implement that option for this predictor to be used. If the compressor does not implement the option, the predictor MAY return an appropriate error.
After you have the predictors that produce the estimates, run pressio_new estimator to generate an estimator class from a template that combines these estimates into an estimate of the metric of interest.
TODO EXAMPLE
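An estimator turns the outputs of one or more predictors into an estimate of the metric of interest. A linear model is one simple way to do this. The sketch below is hypothetical: the feature names, the weights, and linear_estimate are illustrative, are not part of libpressio-predict, and do not follow the generated estimator template.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical linear estimator:
//   estimate = intercept + sum over weights of weight[name] * features[name]
// `features` holds named predictor outputs; every weighted feature must be present.
double linear_estimate(const std::map<std::string, double>& features,
                       const std::map<std::string, double>& weights,
                       double intercept) {
    double estimate = intercept;
    for (const auto& entry : weights) {
        auto it = features.find(entry.first);
        assert(it != features.end() && "missing predictor output");
        estimate += entry.second * it->second;
    }
    return estimate;
}
```

In practice the weights would be fitted offline, for example against measured compression ratios, and more flexible models can replace the linear combination.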
Edit CMakeLists.txt to add any dependencies your new predictor or estimator needs.
Build your new predictor and/or estimator modules, and ensure that the automated unit tests pass.
You can test a metric as follows (in this case tao2019 with sz3):
pressio -i ~/git/datasets/hurricane/100x500x500/CLOUDf48.bin.f32 -d 500 -d 500 -d 100 -t float pressio -m time -b time:metric=tao2019 -M all -D path/to/liblibpressio_predict.so -b pressio:compressor=sz3
libpressio-predict is best installed via Spack:
git clone https://github.com/spack/spack
git clone https://github.com/robertu94/spack_packages robertu94_packages
source ./spack/share/spack/setup-env.sh
spack repo add ./robertu94_packages
spack install libpressio-predict+bin
The easiest way to do a development build of libpressio-predict is to use Spack environments.
# one time setup: create an environment
spack env create -d mydevenvironment
spack env activate mydevenvironment
# one time setup: tell spack to set LD_LIBRARY_PATH with the spack environment's library paths
spack config add modules:prefix_inspections:lib64:[LD_LIBRARY_PATH]
spack config add modules:prefix_inspections:lib:[LD_LIBRARY_PATH]
# one time setup: add libpressio-predict and check it out
# for development
spack add libpressio-predict+bin
spack develop libpressio-predict@git.master
# compile and install (repeat as needed)
spack install
Libpressio-Predict unconditionally requires:
cmake
pkg-config
libpressio
Dependency versions and optional dependencies are documented in the spack package.
Please refer to docs/stability.md.
Please refer to CONTRIBUTORS.md for a list of contributors, sponsors, and contribution guidelines.
Please file bugs on the GitHub Issues page of the robertu94 libpressio-predict repository.
Please read this post on how to file a good bug report. After reading it, please provide the following information specific to libpressio-predict:
Your OS version and distribution information, usually found in /etc/os-release
The output of cmake -L $BUILD_DIR
We hope to publish a paper on LibPressioPredict soon. Until then, please cite LibPressio and the underlying prediction method, whose citation should appear in its pressio:description:
@inproceedings{underwood2021productive,
title={Productive and Performant Generic Lossy Data Compression with LibPressio},
author={Underwood, Robert and Malvoso, Victoriana and Calhoun, Jon C and Di, Sheng and Cappello, Franck},
booktitle={2021 7th International Workshop on Data Analysis and Reduction for Big Scientific Data (DRBSD-7)},
pages={1--10},
year={2021},
organization={IEEE}
}
We have archived reproducibility materials for our prior efforts in the reproduceability folder, organized by first author name and year.