xkszltl opened this issue 5 years ago
FYI, I also got a similar error in https://github.com/Microsoft/onnxruntime/issues/663. That one seems to be caused by using TensorRT without setting the language (CUDA) properly.
Here's the interesting part: even though I've specified `-DTORCH_CUDA_ARCH_LIST=Pascal;Volta`, it still says `GPU_ARCH is not defined` (I'm not sure what prints this message).
```
-- The CUDA compiler identification is NVIDIA 10.1.105
-- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc
-- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc -- works
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Found Protobuf: /usr/local/lib64/libprotobuf.so;-pthread (found version "3.7.0")
-- GPU_ARCH is not defined. Generating CUDA code for default SMs.
-- Found TensorRT headers at /usr/include
-- Find TensorRT libs at /usr/lib64/libnvinfer.so;/usr/lib64/libnvinfer_plugin.so
-- Found TENSORRT: /usr/include
-- Found CUDA with FP16 support, compiling with torch.cuda.HalfTensor
```
Switching off TRT bypasses this issue, and the `GPU_ARCH` message is gone as well. The new CUDA handling in CMake 3.10+ may have caused some subtle incompatibility. Downgrading to CMake 3.9.4 seems to be a workaround.
Any update on this? Downgrading might work, but the problem seems to be related to the way we link TRT, so it may not be a "regression" in CMake itself?
I'm seeing this issue as well. Downgrading to 3.9.4 works because of this bit of code in `third_party/onnx-tensorrt/CMakeLists.txt`:

```cmake
cmake_minimum_required(VERSION 3.2 FATAL_ERROR)
# The version of CMake which is not compatible with the old CUDA CMake commands.
set(CMAKE_VERSION_THRESHOLD "3.10.0")

if(${CMAKE_VERSION} VERSION_LESS ${CMAKE_VERSION_THRESHOLD})
  project(onnx2trt LANGUAGES CXX C)
else()
  project(onnx2trt LANGUAGES CXX C CUDA)
endif()
```
It seems like things go bad when we take the `else` branch and enable the CUDA language. As a test, I changed `CMAKE_VERSION_THRESHOLD` to 3.15.0 and was able to compile fine with CMake 3.14, so I agree it doesn't seem to be a CMake issue. My guess is that it's a problem with mixing the old-style `FindCUDA` module and the new-style `add_library`/`add_executable` support for CUDA.
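For context, a minimal sketch of the two styles being mixed (target and file names here are made up for illustration; in a real tree each style would normally live in its own project):

```cmake
# Old style: CUDA is driven by the FindCUDA module, which wraps nvcc
# in generated custom commands.
find_package(CUDA REQUIRED)
cuda_add_library(legacy_kernels kernels.cu)

# New style (CMake 3.8+): CUDA is a first-class language, so plain
# add_library compiles .cu sources directly.
enable_language(CUDA)
add_library(modern_kernels kernels.cu)
```

When both mechanisms are active in the same build tree, the FindCUDA-generated rules and the native CUDA rules can step on each other, which would explain the missing `CMAKE_CUDA_COMPILE_WHOLE_COMPILATION` errors.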
WRT the message `GPU_ARCH is not defined. Generating CUDA code for default SMs.`, it comes from a little further down in the same file:

```cmake
# If GPU_ARCHS is user-defined, build specifically for specified SM
if (DEFINED GPU_ARCHS)
  message(STATUS "GPU_ARCH defined as ${GPU_ARCHS}. Generating CUDA code for SM ${GPU_ARCHS}")
  separate_arguments(GPU_ARCHS)
# Else list out default SMs to build for.
else()
  message(STATUS "GPU_ARCH is not defined. Generating CUDA code for default SMs.")
  list(APPEND GPU_ARCHS
      35
      53
      61
      70
  )
  # Add SM 75 for CUDA versions >= 10.0
  if (NOT ("${CUDA_VERSION}" VERSION_LESS "10.0"))
    list(APPEND GPU_ARCHS
        75)
  endif()
endif()
```
The message reads like an error rather than info, but it seems to do the right thing.
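Given the logic above, one plausible way to target specific SMs for this subproject is to define `GPU_ARCHS` at configure time, e.g. `cmake -DGPU_ARCHS="61 70" ..` (the SM values here are just examples):

```cmake
# Mimics what the subproject sees after -DGPU_ARCHS="61 70":
# a space-separated cache string that separate_arguments() turns
# into the CMake list 61;70 before -gencode flags are built from it.
set(GPU_ARCHS "61 70" CACHE STRING "Space-separated SM versions to build for")
separate_arguments(GPU_ARCHS)
message(STATUS "Building for SMs: ${GPU_ARCHS}")
```

Note that this variable is distinct from `-DTORCH_CUDA_ARCH_LIST`, which is why setting the latter alone doesn't silence the message.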
I'm also having this issue with CMake 3.14.5. I can confirm that @rjknight's workaround of increasing `CMAKE_VERSION_THRESHOLD` to 3.15.0 works.
I confirm the onnx-tensorrt issue with CMake 3.14.5 is solved by @rjknight's solution, though there are other problems with CMake 3.14+. With that change there is no compile issue and no conflict between CUDA toolkit 9.x/10.x and libtorch.
@rjknight @mikeseven Thank you. The `CMAKE_CUDA_COMPILE_WHOLE_COMPILATION` issue is now solved, but I still have a CUB issue:
```
/usr/local/cuda/include/cub/device/dispatch/dispatch_reduce.cuh(362): error: use the "typename" keyword to treat nontype "std::iterator_traits<_Iterator>::value_type [with _Iterator=InputIteratorT]" as a type in a dependent context
/usr/local/cuda/include/cub/device/dispatch/dispatch_reduce.cuh(363): error: use the "typename" keyword to treat nontype "std::iterator_traits<_Iterator>::value_type [with _Iterator=OutputIteratorT]" as a type in a dependent context
/usr/local/cuda/include/cub/device/dispatch/dispatch_reduce.cuh(683): error: use the "typename" keyword to treat nontype "std::iterator_traits<_Iterator>::value_type [with _Iterator=InputIteratorT]" as a type in a dependent context
/usr/local/cuda/include/cub/device/dispatch/dispatch_reduce.cuh(684): error: use the "typename" keyword to treat nontype "std::iterator_traits<_Iterator>::value_type [with _Iterator=OutputIteratorT]" as a type in a dependent context
/usr/local/cuda/include/cub/device/dispatch/dispatch_reduce.cuh(362): error: use the "typename" keyword to treat nontype "std::iterator_traits<_Iterator>::value_type [with _Iterator=InputIteratorT]" as a type in a dependent context
/usr/local/cuda/include/cub/device/dispatch/dispatch_reduce.cuh(363): error: use the "typename" keyword to treat nontype "std::iterator_traits<_Iterator>::value_type [with _Iterator=OutputIteratorT]" as a type in a dependent context
/usr/local/cuda/include/cub/device/dispatch/dispatch_reduce.cuh(683): error: use the "typename" keyword to treat nontype "std::iterator_traits<_Iterator>::value_type [with _Iterator=InputIteratorT]" as a type in a dependent context
/usr/local/cuda/include/cub/device/dispatch/dispatch_reduce.cuh(684): error: use the "typename" keyword to treat nontype "std::iterator_traits<_Iterator>::value_type [with _Iterator=OutputIteratorT]" as a type in a dependent context
4 errors detected in the compilation of "/tmp/tmpxft_00005fca_00000000-6_reduce.cpp1.ii".
-- Removing ....../pytorch/build/caffe2/CMakeFiles/torch.dir/utils/math/./torch_generated_reduce.cu.o
/usr/bin/cmake -E remove ....../pytorch/build/caffe2/CMakeFiles/torch.dir/utils/math/./torch_generated_reduce.cu.o
CMake Error at torch_generated_reduce.cu.o.Release.cmake:279 (message):
  Error generating file
  ....../pytorch/build/caffe2/CMakeFiles/torch.dir/utils/math/./torch_generated_reduce.cu.o
make[2]: *** [caffe2/CMakeFiles/torch.dir/build.make:134500: caffe2/CMakeFiles/torch.dir/utils/math/torch_generated_reduce.cu.o] Error 1
make[2]: *** Waiting for unfinished jobs....
4 errors detected in the compilation of "/tmp/tmpxft_00005fd2_00000000-6_math_gpu.cpp1.ii".
-- Removing ....../pytorch/build/caffe2/CMakeFiles/torch.dir/utils/./torch_generated_math_gpu.cu.o
/usr/bin/cmake -E remove ....../pytorch/build/caffe2/CMakeFiles/torch.dir/utils/./torch_generated_math_gpu.cu.o
CMake Error at torch_generated_math_gpu.cu.o.Release.cmake:279 (message):
  Error generating file
  ....../pytorch/build/caffe2/CMakeFiles/torch.dir/utils/./torch_generated_math_gpu.cu.o
make[2]: *** [caffe2/CMakeFiles/torch.dir/build.make:136190: caffe2/CMakeFiles/torch.dir/utils/torch_generated_math_gpu.cu.o] Error 1
make[2]: Leaving directory '....../pytorch/build'
make[1]: *** [CMakeFiles/Makefile2:3175: caffe2/CMakeFiles/torch.dir/all] Error 2
make[1]: Leaving directory '....../pytorch/build'
make: *** [Makefile:166: all] Error 2
```
My environment is (with a previously installed PyTorch):
```
$ python collect_env.py
Collecting environment information...
PyTorch version: 1.2.0a0+d51bd21
Is debug build: No
CUDA used to build PyTorch: 10.1.168
OS: Ubuntu 19.04
GCC version: (Ubuntu 8.3.0-6ubuntu1) 8.3.0
CMake version: version 3.13.4
Python version: 3.7
Is CUDA available: Yes
CUDA runtime version: 10.1.168
GPU models and configuration: GPU 0: GeForce GTX 980M
Nvidia driver version: 430.40
cuDNN version: /usr/local/cuda-10.1/targets/x86_64-linux/lib/libcudnn.so.7.6.2
Versions of relevant libraries:
[pip3] numpy==1.16.4
[pip3] torch==1.2.0a0+d51bd21
[pip3] torchfile==0.1.0
[pip3] torchfusion==0.3.6
[pip3] torchfusion-utils==0.1.5
[pip3] torchtext==0.3.1
[pip3] torchvision==0.3.0a0+8a64dbc
[conda] Could not collect
```
Can you help please?
@jiapei100 This is not related to this issue, so please open a new one for help. But my first suggestion is to try gcc-7 instead of 8.
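For reference, a sketch of how one might point nvcc at an older host compiler in either build style (the gcc-7 paths are assumptions about the system; adjust as needed):

```cmake
# With the old FindCUDA module (the path this PyTorch build takes):
set(CUDA_HOST_COMPILER /usr/bin/gcc-7 CACHE FILEPATH "Host compiler for nvcc")

# With native CUDA language support (CMake 3.8+):
set(CMAKE_CUDA_HOST_COMPILER /usr/bin/g++-7 CACHE FILEPATH "Host compiler for nvcc")
```

Either can also be passed on the command line, e.g. `cmake -DCUDA_HOST_COMPILER=/usr/bin/gcc-7 ..`.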
The same issue occurred with the following environment. I tried @rjknight's workaround with a higher version, `set(CMAKE_VERSION_THRESHOLD "3.24.0")`, but to no avail; the problem was my CMake binary distribution, version 3.23.0-rc3. Luckily, after downgrading CMake to 3.22.3, it worked without the `Missing CMAKE_CUDA_COMPILE_WHOLE_COMPILATION` error. Hope it helps.
The above issue was a bug in the 3.23.0 release. Using 3.23.1 or newer resolves it, so you don't need to downgrade. FYI: any CMake version that ends in `-rc<N>` is a release candidate and shouldn't be used in production, as those builds are only for testing.
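A project that wants to protect its users could also fail fast on the known-bad release; a minimal sketch:

```cmake
# Reject the CMake release with the CUDA regression discussed above.
if(CMAKE_VERSION VERSION_EQUAL "3.23.0")
  message(FATAL_ERROR "CMake 3.23.0 breaks CUDA compilation; please upgrade to 3.23.1 or newer.")
endif()
```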
Bug

This is probably a recent regression. When building for CUDA, I got a lot of errors complaining about missing `CMAKE_CUDA_DEVICE_LINK_LIBRARY` and `CMAKE_CUDA_COMPILE_WHOLE_COMPILATION`. Here's the build command and log when using cmake + make; cmake + ninja will crash in a more violent way.
Environment

- How you installed PyTorch (conda, pip, source): source

Additional context