// count leading/trailing bits
#if ( ( defined __i386 || defined __x86_64__ ) && defined __BMI__ ) || defined _M_IX86 || defined _M_X64
# ifdef _MSC_VER
# include <intrin.h>
# else
# include <x86intrin.h>
# endif
# if ROBIN_HOOD(BITNESS) == 32
# define ROBIN_HOOD_PRIVATE_DEFINITION_CTZ() _tzcnt_u32
# else
# define ROBIN_HOOD_PRIVATE_DEFINITION_CTZ() _tzcnt_u64
# endif
# if defined __AVX2__ || defined __BMI__
# define ROBIN_HOOD_COUNT_TRAILING_ZEROES(x) ROBIN_HOOD(CTZ)(x)
# else
# define ROBIN_HOOD_COUNT_TRAILING_ZEROES(x) ((x) ? ROBIN_HOOD(CTZ)(x) : ROBIN_HOOD(BITNESS))
# endif
#elif defined _MSC_VER
# if ROBIN_HOOD(BITNESS) == 32
# define ROBIN_HOOD_PRIVATE_DEFINITION_BITSCANFORWARD() _BitScanForward
# else
# define ROBIN_HOOD_PRIVATE_DEFINITION_BITSCANFORWARD() _BitScanForward64
# endif
...
fastForward() uses _BitScanForward, which compiles to bsf, which is slow on Zen/Zen 2. tzcnt can be used instead. More detailed discussion: https://github.com/lz4/lz4/issues/867

Proposed change:

If __AVX2__ (MSVC) or __BMI__ (gcc/clang) is defined, we're targeting a CPU that supports the BMI1 instruction set, so the fallback to bsf shouldn't happen, thus there's no need for the zero-case check.