I discover a bug in the to function when parameter data cannot be constant evaluated and -mbmi2 is not enabled.
From this link on godbolt, you can see that the results differ between gcc and clang.
The code with a bug is as follows:
T t;
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Warray-bounds"
__builtin_memcpy(&t, data.data(), sizeof(t));
#pragma GCC diagnostic pop
const auto index = T(data.size() * __CHAR_BIT__);
#ifdef __BMI2__
if constexpr (sizeof(t) <= sizeof(u32)) {
return __builtin_ia32_bzhi_si(t, index);
} else if constexpr (sizeof(t) <= sizeof(u64)) {
return __builtin_ia32_bzhi_di(t, index);
} else
#endif
return t & ((T(1) << index) - T(1));
In the example from the link, data.size() = 4, so index = 32.
According to the C++ standard, u32(1) << 32 is undefined behavior, which is why gcc and clang produce different results here.
Special handling is needed for cases where index >= size.
Additionally, the part with __builtin_memcpy also needs to consider the case where data.size() < sizeof(t). Otherwise, it may exceed the boundary. Also, we should initialize t since it may copy with a size smaller than sizeof(T).
T t{};
__builtin_memcpy(&t, data.data(), data.size() < sizeof(t) ? data.size() : sizeof(t));
const auto index = T(data.size() * __CHAR_BIT__);
#ifdef __BMI2__
if constexpr (sizeof(t) <= sizeof(u32)) {
return __builtin_ia32_bzhi_si(t, index);
} else if constexpr (sizeof(t) <= sizeof(u64)) {
return __builtin_ia32_bzhi_di(t, index);
} else {
#endif
constexpr T size = sizeof(T) * __CHAR_BIT__;
return index >= size ? t : t & ((T(1) << index) - T(1));
#ifdef __BMI2__
}
#endif
I have submitted a pull request to fix both of these issues.
I discover a bug in the
to
function when parameterdata
cannot be constant evaluated and-mbmi2
is not enabled. From this link on godbolt, you can see that the results differ betweengcc
andclang
. The code with a bug is as follows:In the example from the link,
data.size() = 4
, soindex = 32
. According to the C++ standard,u32(1) << 32
is undefined behavior, which is whygcc
andclang
produce different results here. Special handling is needed for cases whereindex >= size
.Additionally, the part with
__builtin_memcpy
also needs to consider the case wheredata.size() < sizeof(t)
. Otherwise, it may exceed the boundary. Also, we should initializet
since it may copy with a size smaller thansizeof(T)
.I have submitted a pull request to fix both of these issues.