unique process of compression

mounamouna commented 9 years ago

Hi, I managed to build the FastPFor package on my machine and compile example.cpp. I then changed the integers in the data vector:

    std::vector<uint32_t> mydata(N);
    mydata[0] = 4294967295;
    mydata[1] = 4294967295;

I display the compressed data and the decompressed data:

    std::cout<<"Compressed data " << compressed_output.data()<<std::endl;
    codec.decodeArray(compressed_output.data(), compressed_output.size(), mydataback.data(), recoveredsize);
    std::cout<<"Decompressed data 1 " <<mydataback.data()[0]<<std::endl;
    std::cout<<"Decompressed data 2 " <<mydataback.data()[1]<<std::endl;

The result of the first execution is:

    Compressed data 0x1b99a80
    You are using 0.109 bits per integer.
    Decompressed data 1 4294967295
    Decompressed data 2 4294967295

and of the second execution:

    Compressed data 0xd0da80
    You are using 0.109 bits per integer.
    Decompressed data 1 4294967295
    Decompressed data 2 4294967295

So I obtain different compressed data each time. Perhaps this is the address of the compressed data, so I added

    std::cout<<"Compressed data " << compressed_output.data()[0]<<std::endl;

but then the compressed data stays the same when I change mydata[0]. How can I distinguish between two compression runs? Is the compressed data unique for each input? What information makes a compression result unique and unchangeable?

Thanks in advance.

lemire commented 9 years ago

Can you provide a test case? (source code)

I do not understand the issue you are reporting.

mounamouna commented 9 years ago

I want to compress two IPv4 addresses, so I define a vector containing two 32-bit integers. I then need to know whether the compressed data fits in at most 32 bits, so I need to display the compressed data in decimal form. The result will be treated as another address (written in at most 32 bits). Is this possible? Can this compression process turn two addresses into one? I can also represent the original addresses differently (a vector containing eight 8-bit integers); my goal is to compress the information into a single word of at most 32 bits.

lemire commented 9 years ago

@mounamouna

This library is ill-suited for the purpose you describe. Though you can certainly encode an array containing two 32-bit integers, it is unlikely that the result will be a single 32-bit integer in general.

This library is meant for computing arrays containing many integers. Please see the example:

https://github.com/lemire/FastPFor/blob/master/example.cpp

I am closing this issue as invalid.

If you do find a bug, please provide a reproducible test case.

mounamouna commented 9 years ago

Please, Sir, just one final question: does "computing arrays containing many integers" mean that we are able to compute each integer in the initial vector from the compressed data? So the compressed data is a vector containing integers smaller than the integers in the initial vector? Is that right?

Thanks in advance Sir.

lemire commented 9 years ago

"that means we are able to compute each integer in the initial vector through the compressed data?"

Of course.

"so the compressed data is a vector contains integers smaller then integers in the initial vector?"

The goal of the library is to have fewer integers in the compressed vector. Yes.

mounamouna commented 9 years ago

It isn't logical to have the same compressed vector for two different initial vectors, is it? I tested with two different initial vectors, (a, b) and (a1, b), but the compressed vector compressed_output.data()[] is the same. Is it a bug?

lemire commented 9 years ago

Yes it is a bug. It is most likely a bug in your code.

mounamouna commented 9 years ago

mouna@ubuntu:~/newtmp/FastPFor$ ./example

Compressed data 19984

Compressed data 23

You are using 0.109 bits per integer.

Decompressed data 1 4294967295

Decompressed data 2 4294967295

mouna@ubuntu:~/newtmp/FastPFor$ make example

[ 85%] Built target FastPFor

Scanning dependencies of target example

[100%] Building CXX object CMakeFiles/example.dir/example.cpp.o

Linking CXX executable example

[100%] Built target example

mouna@ubuntu:~/newtmp/FastPFor$ ./example

Compressed data 19984

Compressed data 23

You are using 0.109 bits per integer.

Decompressed data 1 4967295

Decompressed data 2 4294967295

What is the problem? I used 32-bit integers and changed the first integer, but the compressed vector is still the same (19984, 23).

lemire commented 9 years ago

If you think you have found a bug, please submit a test case.

mounamouna commented 9 years ago

Test case.

First: mydata[0] = 4294967295; mydata[1] = 4294967295;

Second: mydata[0] = 4967295; mydata[1] = 4294967295;

lemire commented 9 years ago

The size of the compressed vector is most certainly more than two words. Probably four words. That is, the "compressed" vector is probably larger than the input vector.

These arrays you provide are not compressible using this library. They are too short.

Please read the papers and carefully study the code and the examples.

Daniel Lemire and Leonid Boytsov, Decoding billions of integers per second through vectorization, Software Practice & Experience 45 (1), 2015. http://arxiv.org/abs/1209.2137 http://onlinelibrary.wiley.com/doi/10.1002/spe.2203/abstract
Daniel Lemire, Leonid Boytsov, Nathan Kurz, SIMD Compression and the Intersection of Sorted Integers, Software Practice & Experience (to appear) http://arxiv.org/abs/1401.6399
Jeff Plaisance, Nathan Kurz, Daniel Lemire, Vectorized VByte Decoding, International Symposium on Web Algorithms 2015, 2015. http://arxiv.org/abs/1503.07387
Wayne Xin Zhao, Xudong Zhang, Daniel Lemire, Dongdong Shan, Jian-Yun Nie, Hongfei Yan, Ji-Rong Wen, A General SIMD-based Approach to Accelerating Compression Algorithms, ACM Transactions on Information Systems 33 (3), 2015. http://arxiv.org/abs/1502.01916

I am not going to be able to help you further.

mounamouna commented 9 years ago

Thank you Sir.

lemire / FastPFor

unique process of compression #20