WojciechMula closed this 4 years ago
There's a detailed description in a comment, but I don't know how to link to a specific line in a diff.
Great work @WojciechMula! The URL to the comment section is: https://github.com/mklarqvist/positional-popcount/pull/37/files#diff-8faf5f851dc871bd75d3a606351b9b76R2948
Our first 32-bit pospopcnt procedures are merely naive translations of the 16-bit procedures. While the Harley-Seal algorithm cannot be simplified further, the update of the vector of MSBs (v16) was done in a simple loop. I came up with something a bit faster. (Maybe this approach can also be applied to the 16-bit pospopcnt?)
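For readers new to the problem, here is a minimal scalar sketch of what a 32-bit positional popcount computes. It is not the code from this PR and not the Harley-Seal variant; the function name `pospopcnt_u32_scalar_ref` is hypothetical and chosen only for illustration. It may be useful as a correctness baseline when comparing against the vectorized procedures.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar reference (an assumption for illustration, not the
 * PR's implementation). For every bit position 0..31, count how many of
 * the input words have that bit set. Counts are accumulated into
 * `counts`, so the caller is expected to zero the array first. */
void pospopcnt_u32_scalar_ref(const uint32_t *data, size_t len,
                              uint32_t counts[32])
{
    for (size_t i = 0; i < len; ++i) {
        uint32_t word = data[i];
        for (int bit = 0; bit < 32; ++bit) {
            /* Add 1 to counts[bit] if this bit is set in the word. */
            counts[bit] += (word >> bit) & 1u;
        }
    }
}
```

Calling it on the three words `0x00000001`, `0x80000001` and `0x00000003` with a zeroed `counts` array leaves `counts[0] == 3`, `counts[1] == 1` and `counts[31] == 1`, with every other entry at zero.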
cc @lemire
Below are benchmark results from a Skylake-X machine: