green-coder / cdc

A library for performing Content-Defined Chunking (CDC) on data streams.
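As a rough illustration of what content-defined chunking means, here is a minimal sketch. It is not this crate's implementation. It uses a simple polynomial rolling hash (Rabin-Karp style, arithmetic modulo 2^64) instead of a Rabin fingerprint over GF(2), and it omits the minimum and maximum chunk sizes that real CDC implementations usually enforce. `RollingHash`, `slide`, `chunk_boundaries`, `WINDOW`, `BASE` and `MASK` are hypothetical names chosen for the example; `slide` only loosely mirrors the operation measured by the "slide" benchmarks in this PR.

```rust
use std::collections::VecDeque;

/// Number of bytes in the sliding window (hypothetical value).
const WINDOW: usize = 16;
/// Odd multiplier for the polynomial hash (the 64-bit FNV prime, chosen arbitrarily).
const BASE: u64 = 0x100_0000_01b3;
/// A cut is made when the low 8 bits of the hash are zero,
/// giving an expected chunk size of about 256 bytes on random input.
const MASK: u64 = (1 << 8) - 1;

/// Polynomial rolling hash over the last `WINDOW` bytes, computed modulo 2^64.
struct RollingHash {
    window: VecDeque<u8>,
    hash: u64,
    /// BASE^(WINDOW - 1): the weight of the oldest byte in a full window.
    out_factor: u64,
}

impl RollingHash {
    fn new() -> Self {
        let mut out_factor = 1u64;
        for _ in 0..WINDOW - 1 {
            out_factor = out_factor.wrapping_mul(BASE);
        }
        RollingHash { window: VecDeque::with_capacity(WINDOW), hash: 0, out_factor }
    }

    /// Push one byte into the window, evicting the oldest byte once the window is full.
    fn slide(&mut self, byte: u8) {
        if self.window.len() == WINDOW {
            let old = self.window.pop_front().unwrap();
            // Remove the outgoing byte's contribution before shifting.
            self.hash = self.hash.wrapping_sub((old as u64).wrapping_mul(self.out_factor));
        }
        self.hash = self.hash.wrapping_mul(BASE).wrapping_add(byte as u64);
        self.window.push_back(byte);
    }

    fn hash(&self) -> u64 {
        self.hash
    }
}

/// Hash of a window computed from scratch (used to check `slide`).
fn hash_from_scratch(bytes: &[u8]) -> u64 {
    bytes.iter().fold(0u64, |h, &b| h.wrapping_mul(BASE).wrapping_add(b as u64))
}

/// Return the end offsets of content-defined chunks in `data`.
/// A cut is made after byte `i` when the window is full and `hash & MASK == 0`;
/// the last offset is always `data.len()` so the chunks cover the whole input.
fn chunk_boundaries(data: &[u8]) -> Vec<usize> {
    let mut rh = RollingHash::new();
    let mut cuts = Vec::new();
    for (i, &b) in data.iter().enumerate() {
        rh.slide(b);
        if rh.window.len() == WINDOW && rh.hash() & MASK == 0 {
            cuts.push(i + 1);
        }
    }
    if cuts.last() != Some(&data.len()) {
        cuts.push(data.len());
    }
    cuts
}
```

Because a cut depends only on the bytes inside the window, inserting a byte near the start of a stream shifts later cut points by one instead of changing them all, which is the property that makes CDC useful for deduplication.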
MIT License

performance optimization & code modernization #3

Closed: aawsome closed this 2 years ago

aawsome commented 2 years ago

The first two commits optimize the computation of Polynom.degree() and add a criterion benchmark for it. The benchmark results for this change are:

slide 1000x             time:   [107.62 us 108.43 us 109.41 us]                        
                        change: [-89.783% -89.651% -89.511%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  1 (1.00%) high mild
  8 (8.00%) high severe

slide 10000x            time:   [153.83 us 154.36 us 155.03 us]                         
                        change: [-85.824% -85.697% -85.592%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  3 (3.00%) high mild
  7 (7.00%) high severe

slide 100000x           time:   [627.94 us 633.68 us 639.98 us]                          
                        change: [-60.342% -59.710% -59.218%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 17 outliers among 100 measurements (17.00%)
  6 (6.00%) high mild
  11 (11.00%) high severe
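Criterion's `change` line gives the relative change Δ of the estimated time against the saved baseline, so the middle estimates above translate into approximate speedup factors (rough arithmetic on the point estimates only):

```latex
\text{speedup} \approx \frac{1}{1 + \Delta}
\qquad
\begin{aligned}
\text{slide 1000x:}   &\quad \tfrac{1}{1 - 0.89651} = \tfrac{1}{0.10349} \approx 9.66 \\
\text{slide 10000x:}  &\quad \tfrac{1}{1 - 0.85697} = \tfrac{1}{0.14303} \approx 6.99 \\
\text{slide 100000x:} &\quad \tfrac{1}{1 - 0.59710} = \tfrac{1}{0.40290} \approx 2.48
\end{aligned}
```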

closes #2
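The PR text does not show the new Polynom.degree() code. As a minimal sketch, assuming the polynomial is stored as a `u64` bit vector over GF(2) (bit i is the coefficient of x^i) and that the zero polynomial has degree -1, a bit-by-bit scan from the top can be replaced by a single `leading_zeros()` call, which typically compiles to one count-leading-zeros instruction on common targets. Both function names are hypothetical, and the crate's actual signature may differ.

```rust
/// Degree of a GF(2) polynomial stored as bits of a u64 (bit i = coefficient of x^i).
/// Returns -1 for the zero polynomial. Naive version: test each bit from the top down.
fn degree_scan(p: u64) -> i32 {
    for i in (0..64).rev() {
        if p & (1u64 << i) != 0 {
            return i;
        }
    }
    -1
}

/// Same result without a loop: the highest set bit sits at index 63 - leading_zeros.
/// For p == 0, leading_zeros() returns 64, so this also yields -1.
fn degree_fast(p: u64) -> i32 {
    63 - p.leading_zeros() as i32
}
```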

The other three commits modernize the code: they fix compiler warnings, apply code formatting, and address most clippy lints. Feel free to omit them. I'm not sure whether the clippy-suggested changes in rolling_hash.rs give a small additional performance gain; the differences may also be within statistical noise. The results are:

slide 1000x             time:   [106.70 us 107.83 us 109.07 us]                        
                        change: [-2.5015% -1.2324% +0.1788%] (p = 0.08 > 0.05)
                        No change in performance detected.
Found 17 outliers among 100 measurements (17.00%)
  5 (5.00%) high mild
  12 (12.00%) high severe

slide 10000x            time:   [151.65 us 152.65 us 153.87 us]                         
                        change: [-4.5617% -2.9682% -1.7269%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  2 (2.00%) high mild
  6 (6.00%) high severe

slide 100000x           time:   [607.56 us 608.77 us 610.50 us]                          
                        change: [-3.0764% -2.1333% -0.9672%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 14 outliers among 100 measurements (14.00%)
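Criterion's verdicts above come from two checks, which explains why slide 100000x reports "Change within noise threshold" even though p = 0.00. A minimal sketch, assuming criterion's default significance level of 0.05 and default noise threshold of 1% (both configurable); `Verdict` and `verdict` are hypothetical names, not criterion API:

```rust
/// Messages criterion prints after comparing a run against its baseline.
#[derive(Debug, PartialEq)]
enum Verdict {
    NoChange,    // "No change in performance detected."
    Improved,    // "Performance has improved."
    Regressed,   // "Performance has regressed."
    WithinNoise, // "Change within noise threshold."
}

/// `p` is the p-value; `lo` and `hi` bound the confidence interval of the relative
/// change, as fractions (e.g. -0.030764 for -3.0764%).
fn verdict(p: f64, lo: f64, hi: f64, significance: f64, noise: f64) -> Verdict {
    if p >= significance {
        // Not statistically significant: the size of the change is not considered.
        Verdict::NoChange
    } else if lo < -noise && hi < -noise {
        // The whole interval lies below the noise band.
        Verdict::Improved
    } else if lo > noise && hi > noise {
        // The whole interval lies above the noise band.
        Verdict::Regressed
    } else {
        // Significant, but the interval overlaps [-noise, +noise].
        Verdict::WithinNoise
    }
}
```

With the second run's numbers, slide 10000x's interval [-4.5617%, -1.7269%] lies entirely below -1%, so it counts as an improvement, while slide 100000x's upper bound of -0.9672% falls inside the ±1% band.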

green-coder commented 2 years ago

Thank you for the contribution!