stotko / stdgpu

stdgpu: Efficient STL-like Data Structures on the GPU
https://stotko.github.io/stdgpu/
Apache License 2.0

Build fails on Windows 11 + CUDA 12.5 + MSVC 19.41 + CMake 3.29.4 with "STDGPU_BACKEND=STDGPU_BACKEND_CUDA" #417

Open dengchenlong opened 3 weeks ago

dengchenlong commented 3 weeks ago

Describe the bug

Building the Visual Studio project fails when the CUDA backend (CUDA 12.5) is selected.

Steps to reproduce

  1. Prerequisites:
    1. Windows 11
    2. CUDA 12.5
    3. MSVC 19.41 (VS 2022 Preview)
    4. CMake 3.29.4
    5. stdgpu source code (downloaded)
  2. Configure the CMake cache and generate the VS project files using the CMake GUI, with STDGPU_BACKEND set to STDGPU_BACKEND_CUDA.
  3. Open the VS solution in VS 2022 Preview.
  4. Build the stdgpu project.

Expected behavior

The build succeeds.

Actual behavior

The build fails.

CMake configuration output:

Selecting Windows SDK version 10.0.20348.0 to target Windows 10.0.22610.
Created device flags : $<$<COMPILE_LANGUAGE:CUDA>:-Xcompiler=/W2>
Created test device flags : $<$<COMPILE_LANGUAGE:CUDA>:-Wno-deprecated-declarations>
Detected user-provided CCs : 52
Created host flags : $<$<COMPILE_LANGUAGE:CXX>:/W2>
Created test host flags : $<$<COMPILE_LANGUAGE:CXX>:/wd4996>
Could NOT find Doxygen (missing: DOXYGEN_EXECUTABLE) (Required is at least version "1.9.1")
CMake Deprecation Warning at test/googletest-1.11.0/CMakeLists.txt:4 (cmake_minimum_required):
  Compatibility with CMake < 3.5 will be removed from a future version of
  CMake.

  Update the VERSION argument <min> value or use a ...<max> suffix to tell
  CMake that the project does not need compatibility with older versions.

CMake Deprecation Warning at test/googletest-1.11.0/googletest/CMakeLists.txt:56 (cmake_minimum_required):
  Compatibility with CMake < 3.5 will be removed from a future version of
  CMake.

  Update the VERSION argument <min> value or use a ...<max> suffix to tell
  CMake that the project does not need compatibility with older versions.

************************ stdgpu Configuration Summary *************************

General:
  Version                                   :   1.3.0
  System                                    :   Windows
  Build type                                :   

Build:
  STDGPU_BACKEND                            :   STDGPU_BACKEND_CUDA
  STDGPU_BUILD_SHARED_LIBS                  :   OFF
  STDGPU_SETUP_COMPILER_FLAGS               :   ON
  STDGPU_TREAT_WARNINGS_AS_ERRORS           :   OFF
  STDGPU_ANALYZE_WITH_CLANG_TIDY            :   OFF
  STDGPU_ANALYZE_WITH_CPPCHECK              :   OFF

Configuration:
  STDGPU_ENABLE_CONTRACT_CHECKS             :   ON
  STDGPU_USE_32_BIT_INDEX                   :   ON

Examples:
  STDGPU_BUILD_EXAMPLES                     :   ON

Tests:
  STDGPU_BUILD_TESTS                        :   ON
  STDGPU_BUILD_TEST_COVERAGE                :   OFF

Documentation:
  Doxygen                                   :   NO

*******************************************************************************

Configuring done (4.1s)

Visual Studio build output:

Build started at 21:00...
1>------ Build started: Project: ZERO_CHECK, Configuration: Debug x64 ------
1>1>Checking Build System
2>------ Build started: Project: stdgpu, Configuration: Debug x64 ------
2>Building Custom Rule E:/Repos/open3d/build/stdgpu/src/ext_stdgpu/src/stdgpu/CMakeLists.txt
2>iterator.cpp
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\cub\util_ptx.cuh(90,29): error C2059: syntax error: ':'
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\cub\util_ptx.cuh(101,29): error C2059: syntax error: ':'
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\cub\util_ptx.cuh(115,55): error C2059: syntax error: ':'
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\cub\util_ptx.cuh(130,55): error C2059: syntax error: ':'
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\cub\util_ptx.cuh(208,40): error C2059: syntax error: ':'
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\cub\util_ptx.cuh(218,49): error C2059: syntax error: ':'
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\cub\util_ptx.cuh(252,37): error C2059: syntax error: ':'
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\cub\util_ptx.cuh(263,9): error C2059: syntax error: 'volatile'
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\cub\util_ptx.cuh(271,5): error C3861: '__syncthreads': identifier not found
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\cub\util_ptx.cuh(280,12): error C3861: '__syncthreads_and': identifier not found
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\cub\util_ptx.cuh(289,12): error C3861: '__syncthreads_or': identifier not found
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\cub\util_ptx.cuh(298,5): error C3861: '__syncwarp': identifier not found
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\cub\util_ptx.cuh(307,12): error C3861: '__any_sync': identifier not found
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\cub\util_ptx.cuh(316,12): error C3861: '__all_sync': identifier not found
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\cub\util_ptx.cuh(325,12): error C3861: '__ballot_sync': identifier not found
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\cub\util_ptx.cuh(335,9): error C2059: syntax error: 'volatile'
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\cub\util_ptx.cuh(346,9): error C2059: syntax error: 'volatile'
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\cub\util_ptx.cuh(357,9): error C2059: syntax error: 'volatile'
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\cub\util_ptx.cuh(368,12): error C3861: '__shfl_sync': identifier not found
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\cub\util_ptx.cuh(377,35): error C2059: syntax error: ':'
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\cub\util_ptx.cuh(388,39): error C2059: syntax error: ':'
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\cub\util_ptx.cuh(398,9): error C2059: syntax error: 'volatile'
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\cub\util_ptx.cuh(406,9): error C2059: syntax error: 'volatile'
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\cub\util_ptx.cuh(415,39): error C2065: 'threadIdx': undeclared identifier
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\cub\util_ptx.cuh(416,40): error C2065: 'threadIdx': undeclared identifier
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\cub\util_ptx.cuh(417,13): error C2065: 'threadIdx': undeclared identifier
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\cub\util_ptx.cuh(427,34): error C2059: syntax error: ':'
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\cub\util_ptx.cuh(438,34): error C2059: syntax error: ':'
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\cub\util_ptx.cuh(479,39): error C2059: syntax error: ':'
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\cub\util_ptx.cuh(489,39): error C2059: syntax error: ':'
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\cub\util_ptx.cuh(499,39): error C2059: syntax error: ':'
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\cub\util_ptx.cuh(509,39): error C2059: syntax error: ':'
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\cuda\std\detail\libcxx\include\__cuda\ptx\ptx_helper_functions.h(40,44): error C3861: '__cvta_generic_to_shared': identifier not found
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\cuda\std\detail\libcxx\include\__cuda\ptx\ptx_helper_functions.h(60,44): error C3861: '__cvta_generic_to_global': identifier not found
2>E:\Repos\open3d\build\stdgpu\src\ext_stdgpu\src\stdgpu\impl\iterator_detail.h(450,26): error C3856: 'is_proxy_reference': symbol is not a class template
2>E:\Repos\open3d\build\stdgpu\src\ext_stdgpu\src\stdgpu\impl\iterator_detail.h(450,70): error C2065: 'Container': undeclared identifier
2>E:\Repos\open3d\build\stdgpu\src\ext_stdgpu\src\stdgpu\impl\iterator_detail.h(450,43): error C2923: 'stdgpu::detail::back_insert_iterator_proxy': 'Container' is not a valid template type argument for parameter 'Container'
2>E:\Repos\open3d\build\stdgpu\src\ext_stdgpu\src\stdgpu\impl\iterator_detail.h(450,43): error C2143: syntax error: missing ';' before 'stdgpu::detail::back_insert_iterator_proxy'
2>E:\Repos\open3d\build\stdgpu\src\ext_stdgpu\src\stdgpu\impl\iterator_detail.h(450,79): error C2059: syntax error: '>'
2>E:\Repos\open3d\build\stdgpu\src\ext_stdgpu\src\stdgpu\impl\iterator_detail.h(451,7): error C2059: syntax error: 'public'
2>E:\Repos\open3d\build\stdgpu\src\ext_stdgpu\src\stdgpu\impl\iterator_detail.h(451,22): error C2872: 'detail': ambiguous symbol
2>E:\Repos\open3d\build\stdgpu\src\ext_stdgpu\src\stdgpu\impl\iterator_detail.h(451,30): error C2039: 'true_type': is not a member of 'thrust::detail'
2>E:\Repos\open3d\build\stdgpu\src\ext_stdgpu\src\stdgpu\impl\iterator_detail.h(452,1): error C2143: syntax error: missing ';' before '{'
2>E:\Repos\open3d\build\stdgpu\src\ext_stdgpu\src\stdgpu\impl\iterator_detail.h(452,1): error C2447: '{': missing function header (old-style formal list?)
2>E:\Repos\open3d\build\stdgpu\src\ext_stdgpu\src\stdgpu\impl\iterator_detail.h(459,22): error C2872: 'detail': ambiguous symbol
2>E:\Repos\open3d\build\stdgpu\src\ext_stdgpu\src\stdgpu\impl\iterator_detail.h(467,22): error C2872: 'detail': ambiguous symbol
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\thrust\for_each.h(101,38): error C2872: 'detail': ambiguous symbol
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\thrust\for_each.h(101,46): error C2039: 'execution_policy_base': is not a member of 'thrust::detail'
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\thrust\for_each.h(101,67): error C2988: unrecognizable template declaration/definition
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\thrust\for_each.h(101,67): error C2143: syntax error: missing ',' before '<'
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\thrust\for_each.h(165,40): error C2872: 'detail': ambiguous symbol
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\thrust\for_each.h(165,48): error C2039: 'execution_policy_base': is not a member of 'thrust::detail'
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\thrust\for_each.h(165,69): error C2988: unrecognizable template declaration/definition
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\thrust\for_each.h(165,69): error C2143: syntax error: missing ',' before '<'
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\thrust\detail\for_each.inl(41,40): error C2872: 'detail': ambiguous symbol
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\thrust\detail\for_each.inl(41,48): error C2039: 'execution_policy_base': is not a member of 'thrust::detail'
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\thrust\detail\for_each.inl(41,69): error C2988: unrecognizable template declaration/definition
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\thrust\detail\for_each.inl(41,69): error C2143: syntax error: missing ',' before '<'
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\thrust\detail\for_each.inl(68,42): error C2872: 'detail': ambiguous symbol
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\thrust\detail\for_each.inl(68,50): error C2039: 'execution_policy_base': is not a member of 'thrust::detail'
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\thrust\detail\for_each.inl(68,71): error C2988: unrecognizable template declaration/definition
2>D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\thrust\detail\for_each.inl(68,71): error C2143: syntax error: missing ',' before '<'
2>limits.cpp
2>Generating Code...
2>Done building project "stdgpu.vcxproj" -- FAILED.
========== Build: 1 succeeded, 1 failed, 0 up-to-date, 0 skipped ==========
========== Build completed at 21:00 and took 01.772 seconds ==========


dengchenlong commented 3 weeks ago

With STDGPU_BACKEND=STDGPU_BACKEND_OPENMP, everything builds and works normally.

stotko commented 3 weeks ago

I can reproduce these compilation errors on Ubuntu 22.04 + CUDA 12.5 + the latest commit from the master branch. Only the CUDA backend seems to be affected. More precisely, I suspect the problem lies somewhere in thrust, since several CUDA-only expressions coming from there are incorrectly used during the compilation of a .cpp file (in your case iterator.cpp).
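The failure mode described here, CUDA-only expressions reaching a .cpp file, can be illustrated with the standard nvcc macros `__CUDACC__` (defined whenever nvcc compiles a translation unit) and `__CUDA_ARCH__` (defined only during nvcc's device compilation pass). The sketch below is a hypothetical illustration, not code from stdgpu, thrust, or CUB; the names `EXAMPLE_HOST_DEVICE`, `CompilationPass`, `current_pass`, and `lane_index_or_zero` are made up for it. A host compiler such as MSVC or GCC, which is what compiles iterator.cpp in the logs above, defines neither macro, so it only ever sees the final `#else` branches. A header that uses `threadIdx` or `__syncthreads()` without a guard of this kind produces exactly the "undeclared identifier" errors shown.

```cpp
// Hypothetical helper macro: expands to CUDA execution-space specifiers only
// when nvcc is compiling, and to nothing for a plain host compiler.
#if defined(__CUDACC__)
#define EXAMPLE_HOST_DEVICE __host__ __device__
#else
#define EXAMPLE_HOST_DEVICE
#endif

// Hypothetical classification of the compiler pass that is preprocessing
// this translation unit.
enum class CompilationPass { nvcc_device, nvcc_host, plain_host };

EXAMPLE_HOST_DEVICE constexpr CompilationPass current_pass()
{
#if defined(__CUDA_ARCH__)
    // Defined only during nvcc's device pass: the branch in which
    // device-only intrinsics may be used.
    return CompilationPass::nvcc_device;
#elif defined(__CUDACC__)
    // nvcc host pass: CUDA keywords are understood, but the code runs on the CPU.
    return CompilationPass::nvcc_host;
#else
    // Plain host compiler (MSVC, GCC, or Clang outside CUDA mode):
    // device-only identifiers are not declared here at all.
    return CompilationPass::plain_host;
#endif
}

// Hypothetical example of the guard pattern itself: lane index within a warp,
// assuming a 1D block and 32-thread warps, with a host fallback of 0.
EXAMPLE_HOST_DEVICE inline unsigned int lane_index_or_zero()
{
#if defined(__CUDA_ARCH__)
    return threadIdx.x % 32u;  // seen only by nvcc's device pass
#else
    return 0u;                 // what MSVC/GCC compile instead
#endif
}
```

Compiled as part of a .cpp file by MSVC or GCC, `current_pass()` yields `CompilationPass::plain_host` and `lane_index_or_zero()` returns 0; the `threadIdx` line is removed by the preprocessor and never parsed.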

A very similar error has also been reported in Open3D, albeit in a different part of the code: https://github.com/isl-org/Open3D/issues/6813

xzhao99 commented 2 days ago

I saw the same issue when building Open3D with -DBUILD_CUDA_MODULE=ON on Ubuntu 22.04 + CUDA 12.5, using Open3D's stdgpu CMake setup with:

    GIT_REPOSITORY https://github.com/stotko/stdgpu.git
    GIT_TAG        master


In file included from /usr/local/cuda/include/cuda/std/detail/libcxx/include/__cuda/ptx/instructions/barrier_cluster.h:30,
                 from /usr/local/cuda/include/cuda/std/detail/libcxx/include/__cuda/ptx.h:74,
                 from /usr/local/cuda/include/cuda/ptx:19,
                 from /usr/local/cuda/include/cuda/discard_memory:25,
                 from /usr/local/cuda/include/cub/util_device.cuh:57,
                 from /usr/local/cuda/include/thrust/system/cuda/detail/util.h:48,
                 from /usr/local/cuda/include/thrust/system/cuda/detail/malloc_and_free.h:34,
                 from /usr/local/cuda/include/thrust/system/detail/adl/malloc_and_free.h:50,
                 from /usr/local/cuda/include/thrust/system/detail/generic/memory.inl:30,
                 from /usr/local/cuda/include/thrust/system/detail/generic/memory.h:77,
                 from /usr/local/cuda/include/thrust/detail/reference.h:36,
                 from /home/xzhao/workdir/Open3D/build_debug/stdgpu/src/ext_stdgpu/src/stdgpu/../stdgpu/iterator.h:29,
                 from /home/xzhao/workdir/Open3D/build_debug/stdgpu/src/ext_stdgpu/src/stdgpu/impl/iterator.cpp:16:
/usr/local/cuda/include/cuda/std/detail/libcxx/include/__cuda/ptx/instructions/../ptx_helper_functions.h: In function ‘uint32_t cuda::ptx::__4::__as_ptr_smem(const void*)’:
/usr/local/cuda/include/cuda/std/detail/libcxx/include/__cuda/ptx/instructions/../ptx_helper_functions.h:40:44: error: ‘__cvta_generic_to_shared’ was not declared in this scope
   40 |   return static_cast<_CUDA_VSTD::uint32_t>(__cvta_generic_to_shared(__ptr));
      |                                            ^~~~~~~~~~~~~~~~~~~~~~~
/usr/local/cuda/include/cuda/std/detail/libcxx/include/__cuda/ptx/instructions/../ptx_helper_functions.h: In function ‘uint64_t cuda::ptx::__4::__as_ptr_gmem(const void*)’:
/usr/local/cuda/include/cuda/std/detail/libcxx/include/__cuda/ptx/instructions/../ptx_helper_functions.h:60:44: error: ‘__cvta_generic_to_global’ was not declared in this scope
   60 |   return static_cast<_CUDA_VSTD::uint64_t>(__cvta_generic_to_global(__ptr));
      |                                            ^~~~~~~~~~~~~~~~~~~~~~~
/usr/local/cuda/include/cuda/std/detail/libcxx/include/__cuda/ptx/instructions/../ptx_helper_functions.h: In function ‘_Tp* cuda::ptx::__4::__from_ptr_smem(size_t)’:
/usr/local/cuda/include/cuda/std/detail/libcxx/include/__cuda/ptx/instructions/../ptx_helper_functions.h:73:33: error: there are no arguments to ‘__cvta_shared_to_generic’ that depend on a template parameter, so a declaration of ‘__cvta_shared_to_generic’ must be available [-fpermissive]
   73 |   return reinterpret_cast<_Tp*>(__cvta_shared_to_generic(__ptr));
      |                                 ^~~~~~~~~~~~~~~~~~~~~~~
/usr/local/cuda/include/cuda/std/detail/libcxx/include/__cuda/ptx/instructions/../ptx_helper_functions.h:73:33: note: (if you use ‘-fpermissive’, G++ will accept your code, but allowing the use of an undeclared name is deprecated)
/usr/local/cuda/include/cuda/std/detail/libcxx/include/__cuda/ptx/instructions/../ptx_helper_functions.h: In function ‘_Tp* cuda::ptx::__4::__from_ptr_gmem(size_t)’:
/usr/local/cuda/include/cuda/std/detail/libcxx/include/__cuda/ptx/instructions/../ptx_helper_functions.h:94:33: error: there are no arguments to ‘__cvta_global_to_generic’ that depend on a template parameter, so a declaration of ‘__cvta_global_to_generic’ must be available [-fpermissive]
   94 |   return reinterpret_cast<_Tp*>(__cvta_global_to_generic(__ptr));
      |                                 ^~~~~~~~~~~~~~~~~~~~~~~
In file included from /usr/local/cuda/include/thrust/system/cuda/detail/util.h:48,
                 from /usr/local/cuda/include/thrust/system/cuda/detail/malloc_and_free.h:34,
                 from /usr/local/cuda/include/thrust/system/detail/adl/malloc_and_free.h:50,
                 from /usr/local/cuda/include/thrust/system/detail/generic/memory.inl:30,
                 from /usr/local/cuda/include/thrust/system/detail/generic/memory.h:77,
                 from /usr/local/cuda/include/thrust/detail/reference.h:36,
                 from /home/xzhao/workdir/Open3D/build_debug/stdgpu/src/ext_stdgpu/src/stdgpu/../stdgpu/iterator.h:29,
                 from /home/xzhao/workdir/Open3D/build_debug/stdgpu/src/ext_stdgpu/src/stdgpu/impl/iterator.cpp:16:
/usr/local/cuda/include/cub/util_device.cuh: In static member function ‘static typename AgentT::TempStorage& cub::CUB_200400___CUDA_ARCH_LIST___NS::detail::vsmem_helper_impl::get_temp_storage(cub::CUB_200400___CUDA_ARCH_LIST___NS::NullType&, cub::CUB_200400___CUDA_ARCH_LIST___NS::detail::vsmem_t&)’:
/usr/local/cuda/include/cub/util_device.cuh:160:63: error: ‘blockIdx’ was not declared in this scope
  160 |       static_cast<char*>(vsmem.gmem_ptr) + (vsmem_per_block * blockIdx.x));
      |                                                               ^~~~~~~
/usr/local/cuda/include/cub/util_device.cuh: In static member function ‘static bool cub::CUB_200400___CUDA_ARCH_LIST___NS::detail::vsmem_helper_impl::discard_temp_storage(typename AgentT::TempStorage&)’:
/usr/local/cuda/include/cub/util_device.cuh:201:38: error: ‘threadIdx’ was not declared in this scope
  201 |     const std::size_t linear_tid   = threadIdx.x;
      |                                      ^~~~~~~~
/usr/local/cuda/include/cub/util_device.cuh:202:50: error: ‘blockDim’ was not declared in this scope
  202 |     const std::size_t block_stride = line_size * blockDim.x;
      |                                                  ^~~~~~~