green-coder / cdc

A library for performing Content-Defined Chunking (CDC) on data streams.
MIT License
23 stars 5 forks source link

Performance improvements #2

Closed aawsome closed 2 years ago

aawsome commented 2 years ago

I used this crate to chunk a huge amount of data and encountered worse perfomance compared to https://github.com/restic/chunker (this is a Go implementation).

When profiling the results, I see some hot paths within this crate, namely

1) in rolling_hash.rs::slide

 let mod_index = (self.hash >> self.polynom_shift) & 255;

=> maybe this is not optimized by the compiler as no bound of polynom_shift is known. However polynom_shift is always smaller than 64. A trick like in https://github.com/restic/chunker/blob/master/chunker.go#L230 might help to get more optimized code.

2) polynom.rs::degree Seems like the for loop is very inefficient here. The Go implementation uses https://github.com/restic/chunker/blob/master/chunker.go#L230

I would suggest to change this into

    fn degree(&self) -> i32 {
        num_bits::<Self>() as i32 - self.leading_zeros() as i32 - 1
        }

or maybe even change the result type to u8 (which should be sufficient here).

green-coder commented 2 years ago

Please provide the specs of the CPU on which you run your tests. Was it 0x86 based?

I cannot give a time frame for the fix because maintaining this library is no longer on my priority list.

Also worth mentioning, there are alternative Rust libraries with an emphasis on performance:

aawsome commented 2 years ago

Please provide the specs of the CPU on which you run your tests. Was it 0x86 based?

Yes, it was on x86, .e.g.

vendor_id : GenuineIntel cpu family : 6 model : 42 model name : Intel(R) Core(TM) i5-2520M CPU @ 2.50GHz stepping : 7 microcode : 0x25 cpu MHz : 1485.765 cache size : 3072 KB

I cannot give a time frame for the fix because maintaining this library is no longer on my priority list.

I already implemented some benchmarks and tested the changes. If it's ok, I can make a PR in the next days.

Also worth mentioning, there are alternative Rust libraries with an emphasis on performance:

* https://github.com/dpc/rdedup

* https://github.com/nlfiedler/fastcdc-rs

* https://github.com/jrobhoward/quickcdc

Thanks - I already knew about some of those. However, I'm currently working on a rust client for a restic backup repository (https://github.com/restic/restic) and there the CDC is given to use Rabin fingerprints, hence I was looking for such a crate and was very thankful finding yours ;-)

aawsome commented 2 years ago

I actually figured out that the shifting seems to be already optimal compiled - I don't get any performance improvement asserting a shift lower than 64.

The enhancement about the polynomial degree however leads to:

slide 1000x             time:   [107.62 us 108.43 us 109.41 us]                        
                        change: [-89.783% -89.651% -89.511%] (p = 0.00 < 0.05)
                        Performance has improved.

slide 10000x            time:   [153.83 us 154.36 us 155.03 us]                         
                        change: [-85.824% -85.697% -85.592%] (p = 0.00 < 0.05)
                        Performance has improved.

slide 100000x           time:   [627.94 us 633.68 us 639.98 us]                          
                        change: [-60.342% -59.710% -59.218%] (p = 0.00 < 0.05)
                        Performance has improved.

where the benchmark was just calling

let mut rabin = Rabin64::new(5);
for _ in 0..i {
   rabin.slide(&data)
}

PR is coming soon...

green-coder commented 2 years ago

Thanks - I already knew about some of those. However, I'm currently working on a rust client for a restic backup repository (https://github.com/restic/restic) and there the CDC is given to use Rabin fingerprints, hence I was looking for such a crate and was very thankful finding yours ;-)

I am glad it helps :-)

A PR would be very welcome, as over the course of the last 5 years my Rust skills ... kind of rusted.

green-coder commented 2 years ago

I published the version 0.1.1 of the crate with the improvements. Again, thanks for your contribution.