velvia / compressed-vec

SIMD Floating point and integer compressed vector library
Apache License 2.0

XorAppender is not providing the expected compression #7

Open msk-apk opened 1 month ago

msk-apk commented 1 month ago

I used the code below to compress an f32 vector, but it doesn't achieve even 50% compression.

Here are the results for a vector of dimension 256:
original size of the f32 vector in bytes: 1048
compressed vector size in bytes: 1011
uncompressed (decoded) size of the f32 vector in bytes: 1048

For a vector of dimension 1000:
original size of the f32 vector in bytes: 4120
compressed vector size in bytes: 3854
uncompressed (decoded) size of the f32 vector in bytes: 4120

In both cases the compressed output is still more than 93% of the original size.

Is there any way I can get a better compression ratio?

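// Module paths assumed; adjust them to your compressed-vec version.
use compressed_vec::sink::VecSink;
use compressed_vec::vector::{VectorF32XorAppender, VectorReader};
use get_size::GetSize; // provides .get_size()
use rand::Rng; // provides .gen()

fn main() {
    let mut appender = VectorF32XorAppender::try_new(2048).unwrap();
    let dimension = 1000;
    let mut data: Vec<f32> = vec![];
    let mut rng = rand::thread_rng();
    // Fill the vector with `dimension` random values in [0, 1)
    for _ in 0..dimension {
        let value: f32 = rng.gen();
        data.push(value);
    }
    println!("original size in bytes of f32 vector {}", data.get_size());
    let finished_vec = appender.encode_all(data).unwrap();
    println!("compressed vector size in bytes {}", finished_vec.get_size());
    // Decode the compressed bytes back into a Vec<f32>
    let reader = VectorReader::<f32>::try_new(&finished_vec[..]).unwrap();
    let mut sink = VecSink::<f32>::new();
    reader.decode_to_sink(&mut sink).unwrap();
    let uncompressed_data: Vec<f32> = sink.vec;
    println!("uncompressed size of f32 vector in bytes {}", uncompressed_data.get_size());
}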

velvia commented 1 month ago

The XOR compressor relies on similarities between successive values, so it works best when the data changes slowly: XORing two similar floats yields mostly zero bits, which compress well, while XORing two unrelated floats yields mostly random bits. Your example generates random values, which is pretty much the worst possible case for this kind of compression. The library is instead designed for real-life floating-point time series, which change slowly most of the time.
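To make the point above concrete, here is a minimal, std-only Rust sketch. It is not compressed-vec's actual encoder, whose bit layout may differ. It XORs the bit patterns of successive f32 values in the general style of Gorilla-type XOR encoding and counts the "meaningful" bits left between the leading and trailing zeros of each XOR, which is roughly what such an encoder has to store. The helpers `smooth_series`, `random_series`, and the `xorshift32` PRNG are hypothetical, for illustration only.

```rust
/// Number of bits that survive after trimming leading and trailing zeros
/// from the XOR of each pair of successive values. Identical neighbours
/// XOR to zero and contribute nothing here (an XOR encoder would still
/// spend a small control code on them).
fn meaningful_xor_bits(values: &[f32]) -> u32 {
    values
        .windows(2)
        .map(|pair| {
            let x = pair[0].to_bits() ^ pair[1].to_bits();
            if x == 0 {
                0
            } else {
                32 - x.leading_zeros() - x.trailing_zeros()
            }
        })
        .sum()
}

/// Tiny xorshift32 PRNG so the sketch needs no external crates.
fn xorshift32(state: &mut u32) -> u32 {
    let mut x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    x
}

/// Hypothetical slowly changing series: a reading that holds steady for
/// 50 samples at a time and then steps up by 0.5.
fn smooth_series(n: usize) -> Vec<f32> {
    (0..n).map(|i| 20.0 + (i / 50) as f32 * 0.5).collect()
}

/// Uniform random values in [0, 1), like the `rng.gen()` data in the issue.
fn random_series(n: usize, seed: u32) -> Vec<f32> {
    let mut state = seed; // must be non-zero, or xorshift stays at zero
    (0..n)
        .map(|_| (xorshift32(&mut state) >> 8) as f32 / (1u32 << 24) as f32)
        .collect()
}

fn main() {
    let n = 1000;
    let smooth = smooth_series(n);
    let random = random_series(n, 0x1234_5678);
    println!("smooth: {} meaningful bits", meaningful_xor_bits(&smooth));
    println!("random: {} meaningful bits", meaningful_xor_bits(&random));
    println!("raw:    {} bits", (n - 1) * 32);
}
```

On the slowly changing series, almost every XOR is zero and only the 19 step changes leave any meaningful bits. On the random series, nearly every XOR keeps most of its 32 bits, so there is very little redundancy for an XOR encoder to remove.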

msk-apk commented 1 month ago

Got it, thanks.