Closed JasonRuonanWang closed 2 years ago
Are you using the latest master branch version? I think the most important change is that the latest version supports thread safety, but I don't think that would affect the correctness of the compression results. Could you send us one or a few test cases (say, some example datasets that lead to the failures)? What kinds of tests failed? Thanks.
Best, Sheng
On Tue, Nov 2, 2021 at 10:14 PM Jason Wang @.***> wrote:
Just wondering if there have been any API or fundamental algorithm changes in v2.1.12? We have a bunch of tests written in ADIOS for SZ compression, which have been working for quite a long time across a number of versions, but 2.1.12 broke almost all of our tests. I haven't had a chance to look into the details yet, but it seems to me that something in 2.1.12 is very different from previous versions. Any hints would be very much appreciated. Thanks.
I have the following test code extracted from one of our ADIOS2 tests. With any version prior to 2.1.12, it passes with the following output, which is correct.
0 : 4.88013e-07, 0.001 : 0.00100049, 0.002 : 0.00200049, 0.003 : 0.00300049, 0.004 : 0.00400049, 0.005 : 0.00500049, 0.006 : 0.00600049, 0.007 : 0.00700049, 0.008 : 0.00800049, 0.009 : 0.00900049,
0.01 : 0.0100005, 0.011 : 0.0110005, 0.012 : 0.0120005, 0.013 : 0.0130005, 0.014 : 0.0140005, 0.015 : 0.0150005, 0.016 : 0.0160005, 0.017 : 0.0170005, 0.018 : 0.0180005, 0.019 : 0.0190005,
0.02 : 0.0200005, 0.021 : 0.0210005, 0.022 : 0.0220005, 0.023 : 0.0230005, 0.024 : 0.0240005, 0.025 : 0.0250005, 0.026 : 0.0260005, 0.027 : 0.0270005, 0.028 : 0.0280005, 0.029 : 0.0290005,
0.03 : 0.0300005, 0.031 : 0.0310005, 0.032 : 0.0320005, 0.033 : 0.0330005, 0.034 : 0.0340005, 0.035 : 0.0350005, 0.036 : 0.0360005, 0.037 : 0.0370005, 0.038 : 0.0380005, 0.039 : 0.0390005,
0.04 : 0.0400005, 0.041 : 0.0410005, 0.042 : 0.0420005, 0.043 : 0.0430005, 0.044 : 0.0440005, 0.045 : 0.0450005, 0.046 : 0.0460005, 0.047 : 0.0470005, 0.048 : 0.0480005, 0.049 : 0.0490005,
0.05 : 0.0500005, 0.051 : 0.0510005, 0.052 : 0.0520005, 0.053 : 0.0530005, 0.054 : 0.0540005, 0.055 : 0.0550005, 0.056 : 0.0560005, 0.057 : 0.0570005, 0.058 : 0.0580005, 0.059 : 0.0590005,
0.06 : 0.0600005, 0.061 : 0.0610005, 0.062 : 0.0620005, 0.063 : 0.0630005, 0.064 : 0.0640005, 0.065 : 0.0650005, 0.066 : 0.0660005, 0.067 : 0.0670005, 0.068 : 0.0680005, 0.069 : 0.0690005,
0.07 : 0.0700005, 0.071 : 0.0710005, 0.072 : 0.0720005, 0.073 : 0.0730005, 0.074 : 0.0740005, 0.075 : 0.0750005, 0.076 : 0.0760005, 0.077 : 0.0770005, 0.078 : 0.0780005, 0.079 : 0.0790005,
0.08 : 0.0800005, 0.081 : 0.0810005, 0.082 : 0.0820005, 0.083 : 0.0830005, 0.084 : 0.0840005, 0.085 : 0.0850005, 0.086 : 0.0860005, 0.087 : 0.0870005, 0.088 : 0.0880005, 0.089 : 0.0890005,
0.09 : 0.0900005, 0.091 : 0.0910005, 0.092 : 0.0920005, 0.093 : 0.0930005, 0.094 : 0.0940005, 0.095 : 0.0950005, 0.096 : 0.0960005, 0.097 : 0.0970005, 0.098 : 0.0980005, 0.099 : 0.0990005,
But with version 2.1.12 or anything newer, including the latest master branch, it produces wrong data, as follows:
0 : 4.88013e-07, 0.001 : -6.5526, 0.002 : -19.6588, 0.003 : -39.3186, 0.004 : -65.532, 0.005 : -98.299, 0.006 : -137.62, 0.007 : -183.494, 0.008 : -235.922, 0.009 : -294.903,
0.01 : -6.5436, 0.011 : -19.6498, 0.012 : -39.3096, 0.013 : -65.523, 0.014 : -98.29, 0.015 : -137.611, 0.016 : -183.485, 0.017 : -235.913, 0.018 : -294.894, 0.019 : -360.429,
0.02 : -13.0872, 0.021 : -32.747, 0.022 : -58.9604, 0.023 : -91.7274, 0.024 : -131.048, 0.025 : -176.922, 0.026 : -229.35, 0.027 : -288.331, 0.028 : -353.866, 0.029 : -425.955,
0.03 : -19.6308, 0.031 : -45.8442, 0.032 : -78.6112, 0.033 : -117.932, 0.034 : -163.806, 0.035 : -216.234, 0.036 : -275.215, 0.037 : -340.75, 0.038 : -412.839, 0.039 : -491.481,
0.04 : -26.1744, 0.041 : -58.9414, 0.042 : -98.262, 0.043 : -144.136, 0.044 : -196.564, 0.045 : -255.545, 0.046 : -321.08, 0.047 : -393.169, 0.048 : -471.811, 0.049 : -557.007,
0.05 : -32.718, 0.051 : -72.0386, 0.052 : -117.913, 0.053 : -170.341, 0.054 : -229.322, 0.055 : -294.857, 0.056 : -366.946, 0.057 : -445.588, 0.058 : -530.783, 0.059 : -622.533,
0.06 : -39.2616, 0.061 : -85.1358, 0.062 : -137.564, 0.063 : -196.545, 0.064 : -262.08, 0.065 : -334.169, 0.066 : -412.811, 0.067 : -498.006, 0.068 : -589.756, 0.069 : -688.059,
0.07 : -45.8052, 0.071 : -98.233, 0.072 : -157.214, 0.073 : -222.749, 0.074 : -294.838, 0.075 : -373.48, 0.076 : -458.676, 0.077 : -550.425, 0.078 : -648.728, 0.079 : -753.585,
0.08 : -52.3488, 0.081 : -111.33, 0.082 : -176.865, 0.083 : -248.954, 0.084 : -327.596, 0.085 : -412.792, 0.086 : -504.541, 0.087 : -602.844, 0.088 : -707.701, 0.089 : -819.111,
0.09 : -58.8924, 0.091 : -124.427, 0.092 : -196.516, 0.093 : -275.158, 0.094 : -360.354, 0.095 : -452.103, 0.096 : -550.406, 0.097 : -655.263, 0.098 : -766.673, 0.099 : -884.637,
#include <vector>
#include <cstring>
#include <cstdint>
#include <cstdlib>
#include <functional>
#include <iostream>
#include <numeric>
#include <algorithm>
#include <map>
#include <stdexcept>
#include <string>
#include <cmath>
extern "C" {
#include <sz.h>
}
using Dims = std::vector<size_t>;
using Params = std::map<std::string, std::string>;
size_t GetTotalSize(const Dims &dimensions, const size_t elementSize)
{
    // Product of all extents times the element size, in bytes.
    return std::accumulate(dimensions.begin(), dimensions.end(), elementSize,
                           std::multiplies<size_t>());
}
Dims ConvertDims(const Dims &dimensions, const size_t targetDims,
                 const bool enforceDims = false, const size_t defaultDimSize = 1)
{
    if (targetDims < 1)
    {
        throw(std::invalid_argument(
            "Operator::ConvertDims only accepts targetDims>0"));
    }
    Dims ret = dimensions;
    // Drop every extent equal to 1.
    while (true)
    {
        auto it = std::find(ret.begin(), ret.end(), 1);
        if (it == ret.end())
        {
            break;
        }
        else
        {
            ret.erase(it);
        }
    }
    // Merge leading dimensions until at most targetDims remain.
    while (ret.size() > targetDims)
    {
        ret[1] *= ret[0];
        ret.erase(ret.begin());
    }
    // Optionally left-pad with defaultDimSize up to targetDims entries.
    while (enforceDims && ret.size() < targetDims)
    {
        ret.insert(ret.begin(), defaultDimSize);
    }
    return ret;
}
size_t Compress(const char *dataIn, const Dims &blockCount, char *bufferOut,
                const Params &parameters)
{
    const uint8_t bufferVersion = 1;
    size_t bufferOutOffset = 0;
    const size_t ndims = blockCount.size();
    Dims convertedDims = ConvertDims(blockCount, 4);
    sz_params sz;
    memset(&sz, 0, sizeof(sz_params));
    sz.max_quant_intervals = 65536;
    sz.quantization_intervals = 0;
    // sz.dataEndianType = LITTLE_ENDIAN_DATA;
    // sz.sysEndianType = LITTLE_ENDIAN_DATA;
    sz.sol_ID = SZ;
    // sz.layers = 1;
    sz.sampleDistance = 100;
    sz.predThreshold = 0.99;
    // sz.offset = 0;
    sz.szMode = SZ_BEST_COMPRESSION; // SZ_BEST_SPEED; //SZ_BEST_COMPRESSION;
    sz.gzipMode = 1;
    sz.errorBoundMode = ABS;
    sz.absErrBound = 1E-4;
    sz.relBoundRatio = 1E-3;
    sz.psnr = 80.0;
    sz.pw_relBoundRatio = 1E-5;
    sz.segment_size = static_cast<int>(std::pow(5., static_cast<double>(ndims)));
    sz.pwr_type = SZ_PWR_MIN_TYPE;
    // Pads the missing leading dimensions with 1.
    convertedDims = ConvertDims(blockCount, 4, true, 1);
    /* SZ parameters */
    int use_configfile = 0;
    std::string sz_configfile = "sz.config";
    Params::const_iterator it;
    for (it = parameters.begin(); it != parameters.end(); it++)
    {
        if (it->first == "init")
        {
            use_configfile = 1;
            sz_configfile = std::string(it->second);
        }
        else if (it->first == "max_quant_intervals")
        {
            sz.max_quant_intervals = std::stoi(it->second);
        }
        else if (it->first == "quantization_intervals")
        {
            sz.quantization_intervals = std::stoi(it->second);
        }
        else if (it->first == "sol_ID")
        {
            sz.sol_ID = std::stoi(it->second);
        }
        else if (it->first == "sampleDistance")
        {
            sz.sampleDistance = std::stoi(it->second);
        }
        else if (it->first == "predThreshold")
        {
            sz.predThreshold = std::stof(it->second);
        }
        else if (it->first == "szMode")
        {
            int szMode = SZ_BEST_SPEED;
            if (it->second == "SZ_BEST_SPEED")
            {
                szMode = SZ_BEST_SPEED;
            }
            else if (it->second == "SZ_BEST_COMPRESSION")
            {
                szMode = SZ_BEST_COMPRESSION;
            }
            else if (it->second == "SZ_DEFAULT_COMPRESSION")
            {
                szMode = SZ_DEFAULT_COMPRESSION;
            }
            else
            {
                throw std::invalid_argument(
                    "ERROR: ADIOS2 operator unknown SZ parameter szMode: " +
                    it->second + "\n");
            }
            sz.szMode = szMode;
        }
        else if (it->first == "gzipMode")
        {
            sz.gzipMode = std::stoi(it->second);
        }
        else if (it->first == "errorBoundMode")
        {
            int errorBoundMode = ABS;
            if (it->second == "ABS")
            {
                errorBoundMode = ABS;
            }
            else if (it->second == "REL")
            {
                errorBoundMode = REL;
            }
            else if (it->second == "ABS_AND_REL")
            {
                errorBoundMode = ABS_AND_REL;
            }
            else if (it->second == "ABS_OR_REL")
            {
                errorBoundMode = ABS_OR_REL;
            }
            else if (it->second == "PW_REL")
            {
                errorBoundMode = PW_REL;
            }
            else
            {
                throw std::invalid_argument("ERROR: ADIOS2 operator "
                                            "unknown SZ parameter "
                                            "errorBoundMode: " +
                                            it->second + "\n");
            }
            sz.errorBoundMode = errorBoundMode;
        }
        else if (it->first == "absErrBound")
        {
            sz.absErrBound = std::stof(it->second);
        }
        else if (it->first == "relBoundRatio")
        {
            sz.relBoundRatio = std::stof(it->second);
        }
        else if (it->first == "pw_relBoundRatio")
        {
            sz.pw_relBoundRatio = std::stof(it->second);
        }
        else if (it->first == "segment_size")
        {
            sz.segment_size = std::stoi(it->second);
        }
        else if (it->first == "pwr_type")
        {
            // The type name is the parameter's value, so compare it->second.
            int pwr_type = SZ_PWR_MIN_TYPE;
            if ((it->second == "MIN") || (it->second == "SZ_PWR_MIN_TYPE"))
            {
                pwr_type = SZ_PWR_MIN_TYPE;
            }
            else if ((it->second == "AVG") || (it->second == "SZ_PWR_AVG_TYPE"))
            {
                pwr_type = SZ_PWR_AVG_TYPE;
            }
            else if ((it->second == "MAX") || (it->second == "SZ_PWR_MAX_TYPE"))
            {
                pwr_type = SZ_PWR_MAX_TYPE;
            }
            else
            {
                throw std::invalid_argument("ERROR: ADIOS2 operator "
                                            "unknown SZ parameter "
                                            "pwr_type: " +
                                            it->second + "\n");
            }
            sz.pwr_type = pwr_type;
        }
        else if ((it->first == "abs") || (it->first == "absolute") ||
                 (it->first == "accuracy"))
        {
            sz.errorBoundMode = ABS;
            sz.absErrBound = std::stod(it->second);
        }
        else if ((it->first == "rel") || (it->first == "relative"))
        {
            sz.errorBoundMode = REL;
            sz.relBoundRatio = std::stof(it->second);
        }
        else if ((it->first == "pw") || (it->first == "pwr") ||
                 (it->first == "pwrel") || (it->first == "pwrelative"))
        {
            sz.errorBoundMode = PW_REL;
            sz.pw_relBoundRatio = std::stof(it->second);
        }
    }
    if (use_configfile)
    {
        SZ_Init(sz_configfile.c_str());
    }
    else
    {
        SZ_Init_Params(&sz);
    }
    // Get type info
    int dtype = SZ_FLOAT;
    size_t szBufferSize;
    auto *szBuffer = SZ_compress(dtype, const_cast<char *>(dataIn),
                                 &szBufferSize, 0, convertedDims[0],
                                 convertedDims[1], convertedDims[2],
                                 convertedDims[3]);
    std::memcpy(bufferOut + bufferOutOffset, szBuffer, szBufferSize);
    bufferOutOffset += szBufferSize;
    free(szBuffer);
    szBuffer = nullptr;
    SZ_Finalize();
    return bufferOutOffset;
}
size_t Decompress(const char *bufferIn, const size_t sizeIn, char *dataOut,
                  const Dims &blockCount)
{
    size_t bufferInOffset = 0;
    Dims convertedDims = ConvertDims(blockCount, 4, true, 1);
    int dtype = SZ_FLOAT;
    size_t dataTypeSize = 4;
    const size_t dataSizeBytes = GetTotalSize(convertedDims, dataTypeSize);
    void *result = SZ_decompress(
        dtype,
        reinterpret_cast<unsigned char *>(
            const_cast<char *>(bufferIn + bufferInOffset)),
        sizeIn - bufferInOffset, 0, convertedDims[0], convertedDims[1],
        convertedDims[2], convertedDims[3]);
    if (result == nullptr)
    {
        throw std::runtime_error("ERROR: SZ_decompress failed\n");
    }
    std::memcpy(dataOut, result, dataSizeBytes);
    free(result);
    result = nullptr;
    return dataSizeBytes;
}
int main()
{
    std::vector<float> dataOriginal(100);
    std::vector<float> dataCompressed(100);
    std::vector<float> dataDecompressed(100);
    for (int i = 0; i < 100; ++i)
    {
        dataOriginal[i] = (float)i * 0.001;
    }
    size_t size = Compress(reinterpret_cast<char *>(dataOriginal.data()),
                           {10, 10},
                           reinterpret_cast<char *>(dataCompressed.data()),
                           {{"Accuracy", "0.1"}});
    Decompress(reinterpret_cast<char *>(dataCompressed.data()), size,
               reinterpret_cast<char *>(dataDecompressed.data()), {10, 10});
    for (int i = 0; i < 10; ++i)
    {
        for (int j = 0; j < 10; ++j)
        {
            std::cout << dataOriginal[i * 10 + j] << " : "
                      << dataDecompressed[i * 10 + j] << ", ";
        }
        std::cout << std::endl;
    }
    return 0;
}
Hi Jason, I can reproduce the problem. The cause is that the dimensions are passed in the wrong style. The dataset in your example is a 2D array (10x10), so when you call SZ_compress(), you should set the arguments as follows: dtype, cdata, &szBufferSize, 0, 0, 0, 10, 10
However, the arguments in your code are dtype, cdata, &szBufferSize, 0, 1, 1, 10, 10
Please note that SZ determines the number of dimensions based on the number of leading zeros. That is, {0, 0, 0, 10, 10} means a 2D dataset and thus SZ will call the correct 2D compression function. However, {0, 1, 1, 10, 10} indicates a 4D dataset, and in this situation SZ will call 4D compression instead of 2D compression. If you change the dimension style, I confirm that the result will be correct.
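To make the leading-zeros convention concrete, here is a small sketch. `CountDims` is a hypothetical helper written for this thread, not an SZ function, and SZ's internal dispatch may well be implemented differently; it only mirrors the rule as described above.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of the SZ API): returns how many dimensions
// an (r5, r4, r3, r2, r1) tuple describes under the leading-zeros rule.
// Leading zeros are skipped; every entry from the first non-zero one onward
// counts as a real dimension, even when its extent is 1.
std::size_t CountDims(const std::array<std::size_t, 5> &r)
{
    std::size_t leadingZeros = 0;
    while (leadingZeros < r.size() && r[leadingZeros] == 0)
    {
        ++leadingZeros;
    }
    return r.size() - leadingZeros;
}
```

Under this rule `CountDims({0, 0, 0, 10, 10})` is 2, while `CountDims({0, 1, 1, 10, 10})` is 4, which is why the test ends up in the 4D code path.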
I also tested SZ 2.1.11, and you are right: the decompressed data are still correct even with 0, 1, 1, 10, 10, but the compression ratio is slightly worse. The likely reason is that SZ 2.1.11 calls the 4D compression function instead of the 2D one, and the 4D function might already have handled dimensions like (0, 1, 1, m, n). Maybe this part was changed in SZ 2.1.12 (I didn't double-check it, though). Anyway, such a dimension setting (0, 1, 1, 10, 10) is not a correct style for SZ. Setting the dimensions in the correct style (using 0 instead of 1 to represent missing higher dimensions) is important. Please feel free to let me know if you still have a problem after using the correct dimensions. Please note that you need to revise the dimension style for both SZ_compress() and SZ_decompress() to get correct decompressed results.
Best, Sheng
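One caller-side detail when switching to zero padding: in the test code above, `ConvertDims(blockCount, 4, true, 0)` would turn `{10, 10}` into `{0, 0, 10, 10}`, and `GetTotalSize` as written multiplies every entry, so `Decompress` would compute a size of 0 bytes. A minimal sketch of a size helper that tolerates the zeros follows; `GetTotalSizeSkippingZeros` is a hypothetical name, not part of ADIOS2 or SZ.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Dims = std::vector<std::size_t>;

// Hypothetical variant of the test's GetTotalSize. A 0 entry means
// "dimension not present", so it is skipped instead of multiplied in.
std::size_t GetTotalSizeSkippingZeros(const Dims &dimensions,
                                      const std::size_t elementSize)
{
    assert(elementSize > 0);
    std::size_t total = elementSize;
    for (const std::size_t d : dimensions)
    {
        if (d != 0)
        {
            total *= d;
        }
    }
    return total;
}
```

With this, a 10x10 float array described as `{0, 0, 10, 10}` still comes out as 400 bytes.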
@disheng222 Thanks for the help. I will make modifications and then see if everything works.
But this worries me in some other respects. In principle, I wouldn't say (0,1,1,10,10) is a wrong representation of a 2D array; it could be a 4D array that has 1 as its first two dimensions just by chance. Think of an environment in which I have a lot of 4D datasets: most of them are normal 4D arrays, but occasionally I will have datasets such as (1,128,128,128) or (128, 128, 1, 128). If I want a unified dimension representation for all these 4D arrays, how can I ensure the compression is always good?
Hi Jason, This is a good question. I never tried compressing a dataset with dimensions like 128, 128, 1, 128. In this situation, I always give 3D dimension information to SZ, because from the perspective of a compressor a dimension of size 1 should be ignored. As for 1, 1, 128, 128, the two 1s here mean nothing to the compressor, so SZ will treat it as a 2D dataset. I think I can add a filter in SZ's interface, so that when you give those peculiar parameters, the filter will translate them into parameters SZ recognizes.
Best, Sheng
That would be great. I think such filters were originally there in early versions, and that's why our tests were passing before. It's just in the newest version that they somehow do not take effect. It would be great if consistency can be ensured across different versions.
@JasonRuonanWang two things:
- LibPressio already normalizes these kinds of things for you. Passing extra 1's like you do here causes a 16x reduction in the performance of ZFP. LibPressio's compressor plugins normalize these differences away and call the compressor in the way recommended by the developer of each compressor. I understand keeping the old code for backwards compatibility, but I encourage you to consider invoking SZ from LibPressio going forward.
- It's probably not realistic to assume that different versions of SZ are strictly byte-for-byte forward compatible. I know for a fact that MGARD isn't. This test is probably too strict and should instead test whether the error bound advertised by the compressor is being respected. This is how LibPressio validates things in its test suite.
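As a rough illustration of the second point, a test can compare the decompressed values against the originals under the requested absolute error bound instead of comparing bytes. `WithinAbsErrorBound` is a hypothetical helper written for this thread; it is not taken from LibPressio or ADIOS2.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Returns true when both arrays have the same length and every decompressed
// value lies within absErrBound of the corresponding original value.
bool WithinAbsErrorBound(const std::vector<float> &original,
                         const std::vector<float> &decompressed,
                         const double absErrBound)
{
    assert(absErrBound >= 0.0);
    if (original.size() != decompressed.size())
    {
        return false;
    }
    for (std::size_t i = 0; i < original.size(); ++i)
    {
        const double diff = std::fabs(static_cast<double>(original[i]) -
                                      static_cast<double>(decompressed[i]));
        // Written as !(diff <= bound) so that a NaN difference also fails.
        if (!(diff <= absErrBound))
        {
            return false;
        }
    }
    return true;
}
```

With the default bound of 1E-4 from the test code, the first output listed in this thread (for example 0.001 : 0.00100049) passes such a check, and the second one (0.001 : -6.5526) fails it.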
I am not an application person, but an IO middleware developer. I can only provide reasonable options and let our users decide what to use, i.e. use sz directly or through libpressio. In fact, some of our users' feedback is that they prefer using sz directly with ADIOS2 because they have difficulties deploying libpressio and its dependencies, while sz is relatively easy. I am not here to judge which way is correct. I just have to respect our users' feelings and choices.
I know for a fact that the MGARD people are actively working on strict byte-for-byte forward compatibility. It's absolutely realistic if you have headers storing the version information and keep all the decompression subroutines from the different versions in your code base. You can try decompressing a .tar.gz file from a 20-year-old package and see if it still decompresses. Scientists sometimes do the same thing: they write a file and then come back 20 years later to read it. If this is not guaranteed, people won't use these things in production code.
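The header-plus-old-routines idea can be sketched with a toy format. Everything below is hypothetical and unrelated to the real SZ or MGARD stream layouts: the first byte is a format version, and the decoder keeps one routine per version it has ever written.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Toy format version 1: the payload is stored as-is after the version byte.
Bytes EncodeV1(const Bytes &data)
{
    Bytes out;
    out.push_back(1);
    out.insert(out.end(), data.begin(), data.end());
    return out;
}

// Toy format version 2: the payload is stored as byte-wise deltas (mod 256).
Bytes EncodeV2(const Bytes &data)
{
    Bytes out;
    out.push_back(2);
    std::uint8_t previous = 0;
    for (const std::uint8_t b : data)
    {
        out.push_back(static_cast<std::uint8_t>(b - previous));
        previous = b;
    }
    return out;
}

// The decoder dispatches on the version byte, so streams written by an
// older encoder remain readable after the format changes.
Bytes Decode(const Bytes &stream)
{
    if (stream.empty())
    {
        throw std::invalid_argument("empty stream");
    }
    Bytes out;
    switch (stream[0])
    {
    case 1:
        out.assign(stream.begin() + 1, stream.end());
        break;
    case 2:
    {
        std::uint8_t previous = 0;
        for (std::size_t i = 1; i < stream.size(); ++i)
        {
            previous = static_cast<std::uint8_t>(previous + stream[i]);
            out.push_back(previous);
        }
        break;
    }
    default:
        throw std::invalid_argument("unknown format version");
    }
    return out;
}
```

The two encoders produce different bytes for the same input, yet `Decode` recovers the original from either stream.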
@JasonRuonanWang
I'm happy to help resolve the dependency installation problems, improve documentation, etc. We already have spack packages that work pretty well and cover all the compressors (I even almost have MGARD-GPU working 😄; I'm working on getting nvcomp to play nicely right now, which is the last roadblock to MGARD GPU 1.X in LibPressio). If there is some other venue that needs packaging too, please let me know. I also have the plugin designed to normalize against all the previous versions of MGARD since it was released.
I agree they are working on it, and that it is a laudable goal. My point is that most compressors allow decompress_v2(compress_v2(i)) != decompress_v1(compress_v1(i)) as long as decompress_v1(compress_v1(i)) == decompress_v2(compress_v1(i)), since this allows the compressor to improve over time. There is the added point that with tools like ADIOS2 and LibPressio, it can be made easy to download, compile, and use old versions of the compressors if this is required.
Hi Jason, I have added the filter to correct the dimensions if needed. Specifically, if the dimensions are given as 0, 1, 1, 10, 10, the dimensions passed to SZ internally will be 0, 0, 0, 10, 10. Another example: if the dimensions are 0, 10, 1, 10, 10, the actual dimensions become 0, 0, 10, 10, 10. All these changes are transparent to you and your users. That is, if you use 0, 1, 1, 10, 10 to compress a dataset with SZ_compress(), you can use either 0, 1, 1, 10, 10 or 0, 0, 0, 10, 10 to decompress the compressed data with SZ_decompress(). The results are the same. Please let me know if you have any questions or problems.
Best, Sheng
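For readers who want to apply the same normalization on the caller side, here is a sketch reconstructed from the two examples above. `NormalizeDims` is a hypothetical function, not the code that was added to SZ: it keeps the extents greater than 1 in order, right-aligns them, and fills the front with zeros.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using Dims5 = std::array<std::size_t, 5>;

// Hypothetical sketch of the described filter (not SZ's actual code).
// Entries of 0 or 1 carry no extent and are dropped; the remaining extents
// keep their order and are packed toward the last (fastest) position.
Dims5 NormalizeDims(const Dims5 &r)
{
    Dims5 out{}; // value-initialized: all zeros
    std::size_t write = out.size();
    // Walk from the last entry to the first and fill the output from the back.
    for (std::size_t read = r.size(); read-- > 0;)
    {
        if (r[read] > 1)
        {
            out[--write] = r[read];
        }
    }
    return out;
}
```

Note that an input made only of 0s and 1s (a single-element array) normalizes to all zeros in this sketch, so a real filter would need a separate rule for that case.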
@robertu94 Yeah, that's more or less backward compatibility rather than forward. But ensuring that currently available features are not removed is just as important. Even if a feature exists only by chance, once it is there, there are always some users relying on it.
We have user codes where it takes several months on average for a PR to be reviewed and merged. If anything goes wrong, people have to rely on personal branches for months before a fix can be merged into master, which then causes a lot of other issues. It's not as simple as dealing only with ADIOS or LibPressio. If that were the case, I wouldn't even have bothered opening a ticket here.
@disheng222 Thank you so much for all the help! I am still having a few tests failing but I am not sure if it's something in ADIOS2 or SZ. I will keep you updated.
@disheng222 I now have all tests passing. Thanks very much for the help!