segmentio / asm

Go library providing algorithms optimized to leverage the characteristics of modern CPUs
MIT No Attribution
869 stars, 36 forks

Simplify Dedupe() loop #8

Closed: chriso closed this 3 years ago

chriso commented 3 years ago

This PR simplifies the Dedupe() loop from #7. Instead of incrementing the input pointer in a loop epilogue block, we can increment it just after the load.

name                                          old speed       new speed      delta
Dedupe/size_16,_with_0%_chance_of_repeat-4    11.4GB/s ± 1%   17.1GB/s ± 2%  +50.48%  (p=0.008 n=5+5)
Dedupe/size_16,_with_10%_chance_of_repeat-4   8.48GB/s ± 0%  14.56GB/s ± 2%  +71.80%  (p=0.008 n=5+5)
Dedupe/size_16,_with_50%_chance_of_repeat-4   5.12GB/s ± 0%   7.94GB/s ± 2%  +55.08%  (p=0.008 n=5+5)
Dedupe/size_16,_with_100%_chance_of_repeat-4  12.7GB/s ± 0%   12.6GB/s ± 2%     ~     (p=1.000 n=5+5)
Dedupe/size_32,_with_0%_chance_of_repeat-4    17.2GB/s ± 2%   20.5GB/s ± 5%  +19.26%  (p=0.008 n=5+5)
Dedupe/size_32,_with_10%_chance_of_repeat-4   14.4GB/s ± 2%   19.7GB/s ± 0%  +36.77%  (p=0.008 n=5+5)
Dedupe/size_32,_with_50%_chance_of_repeat-4   14.4GB/s ± 2%   18.0GB/s ± 1%  +24.34%  (p=0.008 n=5+5)
Dedupe/size_32,_with_100%_chance_of_repeat-4  16.6GB/s ± 0%   16.7GB/s ± 2%     ~     (p=0.190 n=4+5)
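
For readers who want to picture the loop shape this PR describes, here is a minimal pure-Go sketch. It is not the library's implementation, which is written in assembly. The name `dedupeSketch`, its signature, and the assumption that the input is a sorted run of fixed-size items are all hypothetical and chosen only for illustration. The point is the placement of `r += size`: the read cursor advances immediately after the load, so both the "duplicate" and "unique" paths can jump straight back to the loop head without a shared epilogue.

```go
package main

import (
	"bytes"
	"fmt"
)

// dedupeSketch is a hypothetical stand-in used only for illustration.
// It copies src into a new slice, dropping every size-byte item that
// equals the item immediately before it. It assumes src is sorted so
// that duplicates are adjacent.
func dedupeSketch(src []byte, size int) []byte {
	if size <= 0 || len(src) < size {
		return nil
	}
	out := make([]byte, 0, len(src))
	out = append(out, src[:size]...) // the first item is always kept
	prev := src[:size]
	r := size // read cursor, in bytes
	for r+size <= len(src) {
		item := src[r : r+size] // load the next item
		r += size               // advance right after the load, not in an epilogue
		if bytes.Equal(item, prev) {
			continue // duplicate: nothing left to do before the next iteration
		}
		out = append(out, item...)
		prev = item
	}
	return out
}

func main() {
	// Four 2-byte items: "aa", "bb", "bb", "cc".
	fmt.Printf("%s\n", dedupeSketch([]byte("aabbbbcc"), 2)) // prints "aabbcc"
}
```

In a scalar Go loop the difference is mostly cosmetic. In hand-written assembly, removing the epilogue block takes a jump target and an instruction off each iteration's path, which may be part of why the small-item cases in the table above benefit the most.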