Closed srogatch closed 3 years ago
I posted 2 issues within this bug report. For one, I provided the Stackoverflow link. For the other, I provided the compiler output. Thanks for commenting on Stackoverflow.
The log you've posted does not mention anything about the textures.
The SO appears to be about a different issue, AFAICT. Texture/surface lookup support is indeed not implemented in clang.
However, in this case clang appears to be complaining about a different problem:
error: explicit qualification required to use member 'ProcessFloatMinusZero' from dependent base class return BFE(ProcessFloatMinusZero(key), bit_start, num_bits);
This one does not have anything to do with the textures and it does appear to be an upstream portability issue.
They confirm what is said on Stackoverflow (at the link in my first comment here): that texture reference from device function to a host texture variable is valid in CUDA. But Clang fails to compile that.
I can not access the bug you've referenced. What's their rationale?
NVIDIA in https://developer.nvidia.com/nvidia_bug/3373556 claims it's not their bug either.
It appears to be a cub/thrust portability issue, not an issue with clang.
It would not be the first one -- AFAICT, "Compiles with NVCC" is their largest concern. NVCC is somewhat more lax than clang so this king of errors tends to creep in all the time.
The compilation failure is unfortunate, but there's little we can practically do in clang to help. Relaxing c++ diagnostics in order to be bug-compatible with NVCC is not likely to happen.
This should be fixed upstream in the thrust/cub repos. NVIDIA does accept patches and we did get similar issues fixed in the past. E.g: https://github.com/NVIDIA/thrust/commit/51363575f6d3edffac57d664fc3883cfa0e6d5c7
assigned to @Artem-B
Extended Description
When #include <cub/cub.cuh> or #include <thrust/copy.h> is present in a CUDA source file compiled with clang-12 with CUDA Toolkit 11.4.1, compilation errors appear as in https://stackoverflow.com/questions/69064115/compilation-of-cub-library-with-clang-as-the-cuda-compiler and also the following error for
thrust
:In file included from /usr/local/cuda-11.4/include/thrust/copy.h:512: In file included from /usr/local/cuda-11.4/include/thrust/detail/copy_if.h:74: In file included from /usr/local/cuda-11.4/include/thrust/detail/copy_if.inl:20: In file included from /usr/local/cuda-11.4/include/thrust/system/detail/generic/copy_if.h:63: In file included from /usr/local/cuda-11.4/include/thrust/system/detail/generic/copy_if.inl:31: In file included from /usr/local/cuda-11.4/include/thrust/scan.h:1563: In file included from /usr/local/cuda-11.4/include/thrust/detail/scan.inl:29: In file included from /usr/local/cuda-11.4/include/thrust/system/detail/adl/scan_by_key.h:42: In file included from /usr/local/cuda-11.4/include/thrust/system/cuda/detail/scan_by_key.h:34: In file included from /usr/local/cuda-11.4/include/thrust/system/cuda/execution_policy.h:51: In file included from /usr/local/cuda-11.4/include/thrust/system/cuda/detail/partition.h:42: In file included from /usr/local/cuda-11.4/include/thrust/partition.h:1438: In file included from /usr/local/cuda-11.4/include/thrust/detail/partition.inl:26: In file included from /usr/local/cuda-11.4/include/thrust/system/detail/generic/partition.h:169: In file included from /usr/local/cuda-11.4/include/thrust/system/detail/generic/partition.inl:26: In file included from /usr/local/cuda-11.4/include/thrust/sort.h:1361: In file included from /usr/local/cuda-11.4/include/thrust/detail/sort.inl:27: In file included from /usr/local/cuda-11.4/include/thrust/system/detail/adl/sort.h:42: In file included from /usr/local/cuda-11.4/include/thrust/system/cuda/detail/sort.h:36: In file included from /usr/local/cuda-11.4/include/cub/device/device_radix_sort.cuh:40: In file included from /usr/local/cuda-11.4/include/cub/device/dispatch/dispatch_radix_sort.cuh:40: In file included from /usr/local/cuda-11.4/include/cub/device/dispatch/../../agent/agent_radix_sort_histogram.cuh:38: /usr/local/cuda-11.4/include/cub/block/radix_rank_sort_operations.cuh:124:20: error: explicit qualification required to use member 'ProcessFloatMinusZero' from dependent base class return BFE(ProcessFloatMinusZero(key), bit_start, num_bits); ^ /usr/local/cuda-11.4/include/cub/block/block_radix_rank.cuh:413:50: note: in instantiation of member function 'cub::BFEDigitExtractor::Digit' requested here
unsigned int digit = digit_extractor.Digit(keys[ITEM]);
^
/usr/local/cuda-11.4/include/cub/device/dispatch/../../block/block_radix_sort.cuh:228:72: note: in instantiation of function template specialization 'cub::BlockRadixRank<256, 6, false, true, cub::BLOCK_SCAN_WARP_SCANS, cudaSharedMemBankSizeFourByte, 1, 1, 750>::RankKeys<unsigned int, 19, cub::BFEDigitExtractor>' requested here
AscendingBlockRadixRank(temp_storage.asending_ranking_storage).RankKeys(
^
/usr/local/cuda-11.4/include/cub/device/dispatch/../../block/block_radix_sort.cuh:367:13: note: in instantiation of member function 'cub::BlockRadixSort<unsigned int, 256, 19, unsigned int, 6, true, cub::BLOCK_SCAN_WARP_SCANS, cudaSharedMemBankSizeFourByte, 1, 1, 750>::RankKeys' requested here
RankKeys(unsigned_keys, ranks, digit_extractor, is_descending);
^
/usr/local/cuda-11.4/include/cub/device/dispatch/dispatch_radix_sort.cuh:327:40: note: in instantiation of function template specialization 'cub::BlockRadixSort<unsigned int, 256, 19, unsigned int, 6, true, cub::BLOCK_SCAN_WARP_SCANS, cudaSharedMemBankSizeFourByte, 1, 1, 750>::SortBlockedToStriped<0, 0>' requested here
BlockRadixSortT(temp_storage.sort).SortBlockedToStriped(
^
/usr/local/cuda-11.4/include/cub/device/dispatch/dispatch_radix_sort.cuh:1562:17: note: in instantiation of function template specialization 'cub::DeviceRadixSortSingleTileKernel<cub::DeviceRadixSortPolicy<unsigned int, unsigned int, int>::Policy800, false, unsigned int, unsigned int, int>' requested here
DeviceRadixSortSingleTileKernel<MaxPolicyT, IS_DESCENDING, KeyT, ValueT, OffsetT>);
^
/usr/local/cuda-11.4/include/cub/device/dispatch/../../util_device.cuh:706:28: note: (skipping 12 contexts in backtrace; use -ftemplate-backtrace-limit=0 to see all)
return op.template Invoke();
^
/usr/local/cuda-11.4/include/thrust/system/cuda/detail/sort.h:1672:19: note: in instantiation of function template specialization 'thrust::cuda_cub::__smart_sort::smart_sort<thrust::detail::integral_constant<bool, true>, thrust::detail::integral_constant<bool, false>, thrust::cuda_cub::tag, thrust::detail::normal_iterator<thrust::device_ptr>, thrust::detail::normal_iterator<thrust::device_ptr>, thrust::less>' requested here
smart_sort::smart_sort<thrust::detail::true_type, thrust::detail::false_type>(
^
/usr/local/cuda-11.4/include/thrust/system/cuda/detail/sort.h:1740:13: note: in instantiation of function template specialization 'thrust::cuda_cub::sort_by_key<thrust::cuda_cub::tag, thrust::detail::normal_iterator<thrust::device_ptr>, thrust::detail::normal_iterator<thrust::device_ptr>, thrust::less>' requested here
cuda_cub::sort_by_key(policy, keys_first, keys_last, values, less());
^
/usr/local/cuda-11.4/include/thrust/detail/sort.inl:98:10: note: in instantiation of function template specialization 'thrust::cuda_cub::sort_by_key<thrust::cuda_cub::tag, thrust::detail::normal_iterator<thrust::device_ptr>, thrust::detail::normal_iterator<thrust::device_ptr>>' requested here
return sort_by_key(thrust::detail::derived_cast(thrust::detail::strip_const(exec)), keys_first, keys_last, values_first);
^
/usr/local/cuda-11.4/include/thrust/detail/sort.inl:285:18: note: in instantiation of function template specialization 'thrust::sort_by_key<thrust::cuda_cub::tag, thrust::detail::normal_iterator<thrust::device_ptr>, thrust::detail::normal_iterator<thrust::device_ptr>>' requested here
return thrust::sort_by_key(select_system(system1,system2), keys_first, keys_last, values_first);
^
some_source.cu:2659:3: note: in instantiation of function template specialization 'thrust::sort_by_key<thrust::detail::normal_iterator<thrust::device_ptr>, thrust::detail::normal_iterator<thrust::device_ptr>>' requested here
sort_by_key(key.begin(), key.begin() + size, val.begin());
^
/usr/local/cuda-11.4/include/cub/block/radix_rank_sort_operations.cuh:98:52: note: member is declared here
static device forceinline__ UnsignedBits ProcessFloatMinusZero(UnsignedBits key)