Closed — robertu94 closed this issue 3 years ago
We talked about regular expressions as a way to get the data out of the subprocess.run output, and some ways to organize the spreadsheet.
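A minimal sketch of that extraction idea: the sample text below is an illustrative fragment of `/usr/bin/time -v` style output (in a real run it would come from `subprocess.run([...], capture_output=True, text=True).stderr`), and the regular expressions pull out the two numbers we care about.

```python
import re

# Illustrative sample of `/usr/bin/time -v` output; in practice this string
# would be the stderr captured from subprocess.run.
sample = """\
	Elapsed (wall clock) time (h:mm:ss or m:ss): 0:01.42
	Maximum resident set size (kbytes): 184320
"""

# Extract wall-clock time (m:ss form) and peak memory with regexes.
wall = re.search(r"wall clock.*:\s*(\d+):(\d+\.\d+)", sample)
rss = re.search(r"Maximum resident set size \(kbytes\):\s*(\d+)", sample)

minutes, seconds = float(wall.group(1)), float(wall.group(2))
elapsed_s = 60 * minutes + seconds   # 1.42 for the sample above
max_rss_kb = int(rss.group(1))

print(elapsed_s, max_rss_kb)
```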
We also talked about running 30 replicates for short-running tasks.
In the long run, statistical methods are a better way to choose the number of replicates; http://pages.stat.wisc.edu/~st571-1/10-power-2.pdf discusses them (the sample-size formula is on page 12).
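For reference, the standard normal-approximation sample-size formula those notes present is n = ((z_{α/2} + z_β) σ / δ)², where σ is the run-to-run standard deviation and δ is the smallest difference you want to detect. A small sketch (the σ and δ values below are hypothetical, not measured):

```python
from math import ceil
from statistics import NormalDist

def replicates_needed(sigma, delta, alpha=0.05, power=0.8):
    """Normal-approximation sample size: how many replicates are needed to
    detect a mean difference of `delta` given standard deviation `sigma`."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)  # two-sided test
    z_beta = z(power)
    return ceil(((z_alpha + z_beta) * sigma / delta) ** 2)

# e.g. detecting a 0.5 s difference when runtimes vary with sigma = 1 s
print(replicates_needed(1.0, 0.5))  # 32
```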
The goal of this task is to estimate the runtime and memory overhead of libpressio for several different datasets and configurations. The output would be most useful as a CSV file.
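A sketch of one possible CSV layout using the standard library `csv` module; the column names here are my assumption, not a required schema:

```python
import csv
import io

# Hypothetical column layout; adjust to whatever fields you actually collect.
fields = ["dataset", "compressor", "bound", "replicate",
          "compress_time_s", "decompress_time_s", "max_rss_kb"]

buf = io.StringIO()  # swap for open("results.csv", "w", newline="") in practice
writer = csv.DictWriter(buf, fieldnames=fields)
writer.writeheader()
writer.writerow({"dataset": "CLOUDf48", "compressor": "sz", "bound": 1e-6,
                 "replicate": 0, "compress_time_s": 1.42,
                 "decompress_time_s": 0.97, "max_rss_kb": 184320})
print(buf.getvalue())
```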
Some datasets to consider:
Some configurations to consider:
For each, pick a few different error-bound levels near 0.01%, 1%, 2%, 5%, and 30% of the value range of the dataset.
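Converting those percent-of-range levels into the absolute bounds the compressors take could be sketched like this (the helper name and defaults are mine):

```python
def abs_bounds(data_min, data_max, percents=(0.01, 1, 2, 5, 30)):
    """Convert percent-of-value-range levels into absolute error bounds,
    suitable for e.g. sz's -A flag or zfp's -a flag."""
    value_range = data_max - data_min
    return [p / 100 * value_range for p in percents]

# e.g. for a field whose values span 0.0 .. 100.0
print(abs_bounds(0.0, 100.0))
```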
Some timing/measurement tools:
- /usr/bin/time
- getrusage -- used by the memory metric in libpressio
- posix_gettime -- used by the time metric in libpressio

Please ensure that the robertu94_packages spack repo is up to date (git pull). It has a new version of libpressio and a new version of zfp that enable the utilities to be built as shown below.

sz -- you can install this via
spack install sz
sz -z -i CLOUDf48 -M ABS -A 1e-6 -f -3 500 500 100
this does compression.
sz -x -f -M ABS -A 1e-6 -3 500 500 100 -s CLOUDf48.sz
this does decompression.
zfp -- you can install this via
spack install zfp@develop+utilities
zfp -i CLOUDf48 -z CLOUDf48.zfp -3 100 500 500 -f -a 1e-6
this compresses the data.
zfp -z CLOUDf48.zfp -o CLOUDf48.dec -3 100 500 500 -f -a 1e-6
this decompresses the data. Note that the dimension ordering flips when using zfp. Use the ordering I used for sz when using libpressio.
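Since the ordering flip is an easy mistake to make when scripting the sweep, a tiny helper that reverses the sz-style dimensions for zfp may be worth having (this helper is my own, not part of any of the tools):

```python
def zfp_dims(sz_dims):
    """zfp's -3 flag takes dimensions in the reverse order of sz's -3 flag,
    e.g. sz's `500 500 100` becomes zfp's `100 500 500`."""
    return tuple(reversed(sz_dims))

print(zfp_dims((500, 500, 100)))  # (100, 500, 500)
```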
You might also find
spack install libpressio-tools ^libpressio+sz+zfp
useful.

pressio -i CLOUDf48 -m time -M all -d 500 -d 500 -d 100 -t float sz -o sz:abs_err_bound=1e-6 -o sz:error_bound_mode_str=abs
pressio -i CLOUDf48 -m time -M all -d 500 -d 500 -d 100 -t float zfp -o zfp:accuracy=1e-6
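To sweep many bounds and datasets, the pressio invocations above could be assembled programmatically and handed to `subprocess.run`. A sketch that only builds the argv list (the `pressio_cmd` helper is my own; the flags mirror the commands above):

```python
def pressio_cmd(infile, dims, compressor, options, dtype="float"):
    """Build the argv list for a `pressio` invocation like the ones above."""
    cmd = ["pressio", "-i", infile, "-m", "time", "-M", "all"]
    for d in dims:
        cmd += ["-d", str(d)]
    cmd += ["-t", dtype, compressor]
    for key, value in options.items():
        cmd += ["-o", f"{key}={value}"]
    return cmd

cmd = pressio_cmd("CLOUDf48", (500, 500, 100), "zfp", {"zfp:accuracy": 1e-6})
print(" ".join(cmd))
# then, per replicate: result = subprocess.run(cmd, capture_output=True, text=True)
```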