mklarqvist / positional-popcount

Fast C functions for the computing the positional popcount (pospopcnt).
Apache License 2.0
52 stars 5 forks source link

Minor optimization (improving avx512_csa) #9

Closed lemire closed 5 years ago

lemire commented 5 years ago

and putting back the standard popcnt which is important as it is a known lower bound on the cycle count.

We are now running at

pospopcnt_u16_avx512_csa                    cycles per 16-bit word:  0.166 
avx512popcnt                                cycles per 16-bit word:  0.086 

You can do the standard popcnt using about 0.344 cycles per 64-bit word (or 4 x 16 bits). It takes about 0.66 cycles to do the 16-bit pospopcnt of a 64-bit word.

If you try to do the same thing without SIMD instructions, it takes 70 cycles for the same task.

In other words, SIMD instructions make the code run 100 faster.

@WojciechMula