Closed chriso closed 3 years ago
This PR simplifies the Dedupe() loop from #7. Instead of incrementing the input pointer in a loop epilogue block, we can increment it just after the load.
Dedupe()
name                                          old speed       new speed       delta
Dedupe/size_16,_with_0%_chance_of_repeat-4    11.4GB/s ± 1%   17.1GB/s ± 2%   +50.48%  (p=0.008 n=5+5)
Dedupe/size_16,_with_10%_chance_of_repeat-4   8.48GB/s ± 0%   14.56GB/s ± 2%  +71.80%  (p=0.008 n=5+5)
Dedupe/size_16,_with_50%_chance_of_repeat-4   5.12GB/s ± 0%   7.94GB/s ± 2%   +55.08%  (p=0.008 n=5+5)
Dedupe/size_16,_with_100%_chance_of_repeat-4  12.7GB/s ± 0%   12.6GB/s ± 2%   ~        (p=1.000 n=5+5)
Dedupe/size_32,_with_0%_chance_of_repeat-4    17.2GB/s ± 2%   20.5GB/s ± 5%   +19.26%  (p=0.008 n=5+5)
Dedupe/size_32,_with_10%_chance_of_repeat-4   14.4GB/s ± 2%   19.7GB/s ± 0%   +36.77%  (p=0.008 n=5+5)
Dedupe/size_32,_with_50%_chance_of_repeat-4   14.4GB/s ± 2%   18.0GB/s ± 1%   +24.34%  (p=0.008 n=5+5)
Dedupe/size_32,_with_100%_chance_of_repeat-4  16.6GB/s ± 0%   16.7GB/s ± 2%   ~        (p=0.190 n=4+5)