greg7mdp / parallel-hashmap

A family of header-only, very fast and memory-friendly hashmap and btree containers.
https://greg7mdp.github.io/parallel-hashmap/
Apache License 2.0
2.47k stars 234 forks source link

Fix nvcc warning about missing return statement #235

Closed cartoonist closed 6 months ago

cartoonist commented 6 months ago

The fix is based on the accepted answer here and other answers/comments there. The builtin function __builtin_unreachable is supported since CUDA 11.3.

greg7mdp commented 6 months ago

Nice. I'll wait for the CI/CD to run and merge. Thanks for the fix!

cartoonist commented 6 months ago

Thank you for merging and also for the library!

Just for the record, I just realised that there are some scenarios that this PR doesn't fix the issue: for example when using GCC, __has_builtin has been introduced in GCC 10. So for 4.5 <= GCCs < 10, which have __builtin_unreachable but not __has_builtin, the macro is replace by noop causing nvcc throws the warning.

I don't know a better way for checking whether __builtin_unreachable is available or not to cover those scenarios. So, I personally opted for using newer versions of gcc.

greg7mdp commented 6 months ago

To avoid the issue altogether, we could probably have done:

uint32_t TrailingZeros(T x) {
    uint32_t res;
    PHMAP_IF_CONSTEXPR(sizeof(T) == 8)
        res = base_internal::CountTrailingZerosNonZero64(static_cast<uint64_t>(x));
    else
        res = base_internal::CountTrailingZerosNonZero32(static_cast<uint32_t>(x));
    return res;
}
cartoonist commented 6 months ago

I think that would do the trick.

greg7mdp commented 6 months ago

I'll do this then (maybe tomorrow).

cartoonist commented 6 months ago

I just checked with gcc 9.4.0 and Cuda 11.4 and it worked.

greg7mdp commented 6 months ago

Thanks again @cartoonist , I just updated the repo with this last change.