martinus / robin-hood-hashing

Fast & memory efficient hashtable based on robin hood hashing for C++11/14/17/20
https://gitter.im/martinus/robin-hood-hashing
MIT License
1.52k stars 146 forks source link

Suboptimal performance on ZEN/ZEN2 #73

Closed wolfpld closed 4 years ago

wolfpld commented 4 years ago

fastForward() uses _BitScanForward, which compiles to bsf, which is slow on ZEN/ZEN2. tzcnt can be used instead.

More detailed discussion: https://github.com/lz4/lz4/issues/867

Proposed change:

// count leading/trailing bits
#if ( ( defined __i386 || defined __x86_64__ ) && defined __BMI__ ) || defined _M_IX86 || defined _M_X64
#    ifdef _MSC_VER
#        include <intrin.h>
#    else
#        include <x86intrin.h>
#    endif
#    if ROBIN_HOOD(BITNESS) == 32
#        define ROBIN_HOOD_PRIVATE_DEFINITION_CTZ() _tzcnt_u32
#    else
#        define ROBIN_HOOD_PRIVATE_DEFINITION_CTZ() _tzcnt_u64
#    endif
#    if defined __AVX2__ || defined __BMI__
#        define ROBIN_HOOD_COUNT_TRAILING_ZEROES(x) ROBIN_HOOD(CTZ)(x)
#    else
#        define ROBIN_HOOD_COUNT_TRAILING_ZEROES(x) ((x) ? ROBIN_HOOD(CTZ)(x) : ROBIN_HOOD(BITNESS))
#    endif
#elif defined _MSC_VER
#    if ROBIN_HOOD(BITNESS) == 32
#        define ROBIN_HOOD_PRIVATE_DEFINITION_BITSCANFORWARD() _BitScanForward
#    else
#        define ROBIN_HOOD_PRIVATE_DEFINITION_BITSCANFORWARD() _BitScanForward64
#    endif
...

If __AVX2__ (msvc) or __BMI__ (gcc/clang) is defined, we're targetting CPU which supports BMI1 instruction set, so fallback to bsf shouldn't happen, thus there's no need for zero-case check.

martinus commented 4 years ago

Hi, thanks for that! do you want to make a Pull request or should I integrate it?