Closed cartoonist closed 6 months ago
Nice. I'll wait for the CI/CD to run and merge. Thanks for the fix!
Thank you for merging and also for the library!
Just for the record, I just realised that there are some scenarios that this PR doesn't fix the issue: for example when using GCC, __has_builtin
has been introduced in GCC 10. So for 4.5 <= GCCs < 10, which have __builtin_unreachable
but not __has_builtin
, the macro is replace by noop causing nvcc throws the warning.
I don't know a better way for checking whether __builtin_unreachable
is available or not to cover those scenarios. So, I personally opted for using newer versions of gcc.
To avoid the issue altogether, we could probably have done:
uint32_t TrailingZeros(T x) {
uint32_t res;
PHMAP_IF_CONSTEXPR(sizeof(T) == 8)
res = base_internal::CountTrailingZerosNonZero64(static_cast<uint64_t>(x));
else
res = base_internal::CountTrailingZerosNonZero32(static_cast<uint32_t>(x));
return res;
}
I think that would do the trick.
I'll do this then (maybe tomorrow).
I just checked with gcc 9.4.0 and Cuda 11.4 and it worked.
Thanks again @cartoonist , I just updated the repo with this last change.
The fix is based on the accepted answer here and other answers/comments there. The builtin function
__builtin_unreachable
is supported since CUDA 11.3.