lemire / simdcomp

A simple C library for compressing lists of integers using binary packing
BSD 3-Clause "New" or "Revised" License
488 stars 53 forks source link

Unable to compress array of random numbers #6

Closed Bhuvanamitra closed 9 years ago

Bhuvanamitra commented 9 years ago

The code is able to effectively compress array of values with fixed difference among them. I tried initializing the datain array with rand() function and compressing that. The result was, there was no compression in size and size of compressed data was originalsize+N. Is the code designed to compress only data with fixed difference??

lemire commented 9 years ago

Thanks for your feedback...

No software is able to reliably compress an array of random numbers.

"I tried initializing the datain array with rand() function and compressing that."

On a typical machine, rand() will generate pseudo-random 31-bit integers. Assuming that the pseudo-random number generator is sane, this is, at best, compressible reliably from 32-bit per integer down to 31-bit per integer using any software.

That is, the problem you are encountering is one of basic principles and unrelated to this library. Randomness is not compressible. If it were, it would not be random!

" Is the code designed to compress only data with fixed difference??"

I have edited the README to clarify the applicability:

              The assumption is either that you have a list of 32-bit integers where most 
              of them are small, or a list of 32-bit integers where differences between 
              successive integers are small.

I have also changed the example.c file so that the gaps are random (but small).