Open dllehr81 opened 7 years ago
Before you build try export TORCH_NVCC_FLAGS="-D__CUDA_NO_HALF_OPERATORS__"
Hey @csarofeen, that did the trick! On a side note, this appears to disable the half operators in the CUDA code. Will this affect the performance of half variables when run on the device?
It will, for the better.
@csarofeen this did it for me as well, thank you!
I had the same issue:
[ 4%] Building NVCC (Device) object lib/THC/CMakeFiles/THC.dir/THC_generated_THCSleep.cu.o
[ 5%] Building NVCC (Device) object lib/THC/CMakeFiles/THC.dir/THC_generated_THCStorage.cu.o
[ 6%] Building NVCC (Device) object lib/THC/CMakeFiles/THC.dir/THC_generated_THCStorageCopy.cu.o
[ 7%] Building NVCC (Device) object lib/THC/CMakeFiles/THC.dir/THC_generated_THCTensor.cu.o
[ 8%] Building NVCC (Device) object lib/THC/CMakeFiles/THC.dir/THC_generated_THCTensorCopy.cu.o
[ 10%] Building NVCC (Device) object lib/THC/CMakeFiles/THC.dir/THC_generated_THCTensorMath.cu.o
[ 11%] Building NVCC (Device) object lib/THC/CMakeFiles/THC.dir/THC_generated_THCTensorMath2.cu.o
[ 12%] Building NVCC (Device) object lib/THC/CMakeFiles/THC.dir/THC_generated_THCTensorMathBlas.cu.o
[ 13%] Building NVCC (Device) object lib/THC/CMakeFiles/THC.dir/THC_generated_THCTensorMathMagma.cu.o
[ 14%] Building NVCC (Device) object lib/THC/CMakeFiles/THC.dir/THC_generated_THCTensorMathPairwise.cu.o
/home/ubuntu/torch/extra/cutorch/lib/THC/generic/THCTensorMath.cu(393): error: more than one operator "==" matches these operands:
function "operator==(const __half &, const __half &)"
function "operator==(half, half)"
operand types are: half == half
/home/ubuntu/torch/extra/cutorch/lib/THC/generic/THCTensorMath.cu(414): error: more than one operator "==" matches these operands:
function "operator==(const __half &, const __half &)"
function "operator==(half, half)"
operand types are: half == half
[ 15%] Building NVCC (Device) object lib/THC/CMakeFiles/THC.dir/THC_generated_THCTensorMathReduce.cu.o
2 errors detected in the compilation of "/tmp/tmpxft_00002141_00000000-4_THCTensorMath.cpp4.ii".
CMake Error at THC_generated_THCTensorMath.cu.o.cmake:267 (message):
Error generating file
/home/ubuntu/torch/extra/cutorch/build/lib/THC/CMakeFiles/THC.dir//./THC_generated_THCTensorMath.cu.o
lib/THC/CMakeFiles/THC.dir/build.make:112: recipe for target 'lib/THC/CMakeFiles/THC.dir/THC_generated_THCTensorMath.cu.o' failed
make[2]: *** [lib/THC/CMakeFiles/THC.dir/THC_generated_THCTensorMath.cu.o] Error 1
make[2]: *** Waiting for unfinished jobs....
^Clib/THC/CMakeFiles/THC.dir/build.make:105: recipe for target 'lib/THC/CMakeFiles/THC.dir/THC_generated_THCTensorCopy.cu.o' failed
make[2]: *** [lib/THC/CMakeFiles/THC.dir/THC_generated_THCTensorCopy.cu.o] Interrupt
lib/THC/CMakeFiles/THC.dir/build.make:140: recipe for target 'lib/THC/CMakeFiles/THC.dir/THC_generated_THCTensorMathPairwise.cu.o' failed
make[2]: *** [lib/THC/CMakeFiles/THC.dir/THC_generated_THCTensorMathPairwise.cu.o] Interrupt
CMakeFiles/Makefile2:172: recipe for target 'lib/THC/CMakeFiles/THC.dir/all' failed
make[1]: *** [lib/THC/CMakeFiles/THC.dir/all] Interrupt
Makefile:127: recipe for target 'all' failed
make: *** [all] Interrupt
Error: Build error: Failed building.
ubuntu@ip-Address:~/torch$
Running ./clean.sh, then export TORCH_NVCC_FLAGS="-D__CUDA_NO_HALF_OPERATORS__", and finally ./install.sh worked!
I was using Ubuntu 16.04.
@csarofeen If it's better to disable the half operators, then what are they used for? Why are they included in the cuda code? And what kind of performance boost are we talking about here?
CUDA 9 added half operators to the CUDA half header (cuda_fp16.hpp). The half operators in torch predate that, so both sets now exist and clash. The flag keeps the half definition from the CUDA header while not compiling the header's operators.
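One way to check this on a given machine is to look for the `__CUDA_NO_HALF_OPERATORS__` guard in the installed `cuda_fp16.hpp`. This is only a sketch: the helper name `has_half_guard` and the default header path are assumptions, so adjust the path to wherever CUDA is installed.

```shell
# Hypothetical helper: report whether a header mentions the guard macro.
# Usage: has_half_guard /path/to/cuda_fp16.hpp
has_half_guard() {
    header="$1"
    if [ ! -f "$header" ]; then
        echo "header not found: $header"
        return 2
    fi
    if grep -q "__CUDA_NO_HALF_OPERATORS__" "$header"; then
        echo "guard present: -D__CUDA_NO_HALF_OPERATORS__ should skip the CUDA half operators"
    else
        echo "no guard found (possibly an older header without half operators)"
        return 1
    fi
}

# Assumed default location; change it if CUDA lives elsewhere.
has_half_guard "${CUDA_HOME:-/usr/local/cuda}/include/cuda_fp16.hpp" || true
```

If the guard is present, defining the macro on the nvcc command line is what `TORCH_NVCC_FLAGS` is being used for in this thread.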
@csarofeen Do you have any other performance tips for Cuda and/or cuDNN with Torch7?
Because I've noticed that Cuda 9.0 and cuDNN v7 have even worse performance than Cuda 8.0 and cuDNN v5: https://github.com/jcjohnson/neural-style/issues/429
Same issue, but export TORCH_NVCC_FLAGS="-D__CUDA_NO_HALF_OPERATORS__" didn't work for me. How can I solve that?
@sfzyk Could you please explain all the steps you took to install CUDA, NCCL, cuDNN, and pytorch, and paste some of the error output here? It is very hard to assist when the only information provided is "didn't work".
Installing Torch 7 on Ubuntu 16.04 with CUDA 9.0 causes the error: more than one operator "==" matches these operands. One possible solution:
1. Uninstall CUDA 9.0 (ref: http://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#package-manager-additional)
(1) To uninstall the CUDA Toolkit, run the uninstallation script provided in the bin directory of the toolkit. By default, it is located in /usr/local/cuda-9.1/bin:
$ sudo /usr/local/cuda-9.1/bin/uninstall_cuda_9.1.pl
(2) To uninstall the NVIDIA Driver, run nvidia-uninstall (optional; the driver does not need to be uninstalled):
$ sudo /usr/bin/nvidia-uninstall
(3) Reboot Ubuntu.
2. Install CUDA 8.0 (download address: https://developer.nvidia.com/cuda-80-ga2-download-archive)
(1) Install the 8.0 deb:
sudo dpkg -i cuda-repo-ubuntu1604-8-0-local-ga2_8.0.61-1_amd64.deb
sudo apt-get update
sudo apt-get install cuda
(2) Install patch 2:
sudo dpkg -i cuda-repo-ubuntu1604-8-0-local-cublas-performance-update_8.0.61-1_amd64.deb
sudo apt-get update
sudo apt-get install cuda
3. Install Torch:
git clone https://github.com/torch/distro.git ~/torch --recursive
cd ~/torch
bash install-deps
If the earlier error occurred, use:
sudo ./clean.sh
sudo ./install.sh
export TORCH_NVCC_FLAGS="-D__CUDA_NO_HALF_OPERATORS__"
source ~/.bashrc
@csarofeen I tried export TORCH_NVCC_FLAGS="-D__CUDA_NO_HALF_OPERATORS__" before ./install.sh and I still cannot install Torch. The installation stalls at 81% while compiling the cutorch package. Before setting the environment variable, the installation would crash. I have CUDA 9.1 and cuDNN 7.05 on a machine with a GeForce 1080Ti GPU.
Now, I'm getting warnings like this:
[ 61%] Building NVCC (Device) object lib/THC/CMakeFiles/THC.dir/generated/./THC_generated_THCTensorMaskedLong.cu.o
[ 62%] Building NVCC (Device) object lib/THC/CMakeFiles/THC.dir/generated/./THC_generated_THCTensorSortHalf.cu.o
/home/arsalans/torch/extra/cutorch/lib/THC/THCTensorRandom.cuh(156): warning: specified alignment (4) is different from alignment (2) specified on a previous declaration
detected during instantiation of "void sampleMultinomialOnce<T,AccT>(long *, long, int, T *, T *) [with T=half, AccT=float]"
/home/arsalans/torch/extra/cutorch/lib/THC/generic/THCTensorRandom.cu(169): here
/home/arsalans/torch/extra/cutorch/lib/THC/THCTensorRandom.cuh(95): warning: specified alignment (4) is different from alignment (2) specified on a previous declaration
detected during instantiation of "void renormRowsL1(T *, long, long) [with T=float]"
/home/arsalans/torch/extra/cutorch/lib/THC/generic/THCTensorRandom.cu(98): here
/home/arsalans/torch/extra/cutorch/lib/THC/THCTensorRandom.cuh(156): warning: specified alignment (4) is different from alignment (2) specified on a previous declaration
detected during instantiation of "void sampleMultinomialOnce<T,AccT>(long *, long, int, T *, T *) [with T=float, AccT=float]"
/home/arsalans/torch/extra/cutorch/lib/THC/generic/THCTensorRandom.cu(169): here
/home/arsalans/torch/extra/cutorch/lib/THC/THCTensorRandom.cuh(95): warning: specified alignment (8) is different from alignment (2) specified on a previous declaration
detected during instantiation of "void renormRowsL1(T *, long, long) [with T=double]"
/home/arsalans/torch/extra/cutorch/lib/THC/generic/THCTensorRandom.cu(98): here
/home/arsalans/torch/extra/cutorch/lib/THC/THCTensorRandom.cuh(156): warning: specified alignment (8) is different from alignment (2) specified on a previous declaration
detected during instantiation of "void sampleMultinomialOnce<T,AccT>(long *, long, int, T *, T *) [with T=double, AccT=double]"
/home/arsalans/torch/extra/cutorch/lib/THC/generic/THCTensorRandom.cu(169): here
[ 63%] Building NVCC (Device) object lib/THC/CMakeFiles/THC.dir/generated/./THC_generated_THCTensorMathCompareTHalf.cu.o
[ 64%] Building NVCC (Device) object lib/THC/CMakeFiles/THC.dir/generated/./THC_generated_THCTensorMathPointwiseHalf.cu.o
I had the same problem and it was driving me nuts.
This did not work:
./clean.sh
export TORCH_NVCC_FLAGS="-D__CUDA_NO_HALF_OPERATORS__"
./install.sh
This did work:
./clean.sh
TORCH_NVCC_FLAGS="-D__CUDA_NO_HALF_OPERATORS__" ./install.sh
Hope that helps.
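The two variants differ only in how the variable reaches install.sh. Why a plain export failed is not stated in the thread; it can depend on the setup (for example, the export and the install running in different shells). A quick way to check what a child process actually sees, using `sh -c` as a stand-in for `./install.sh`:

```shell
FLAGS="-D__CUDA_NO_HALF_OPERATORS__"

# Inline assignment: the variable is set only for this one command.
inline_seen=$(TORCH_NVCC_FLAGS="$FLAGS" sh -c 'printf "%s" "$TORCH_NVCC_FLAGS"')
echo "inline:     ${inline_seen:-<empty>}"

# Exported assignment: every later child of this same shell inherits it.
export TORCH_NVCC_FLAGS="$FLAGS"
export_seen=$(sh -c 'printf "%s" "$TORCH_NVCC_FLAGS"')
echo "exported:   ${export_seen:-<empty>}"

# A shell variable that is set but not exported is invisible to children.
unset TORCH_NVCC_FLAGS
TORCH_NVCC_FLAGS="$FLAGS"
plain_seen=$(sh -c 'printf "%s" "$TORCH_NVCC_FLAGS"')
echo "unexported: ${plain_seen:-<empty>}"
```

If the exported case prints the flag on your machine but the build still fails, the cause is probably elsewhere, such as a stale build directory that ./clean.sh did not remove.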
@thompa2 It works like a charm
With CUDA 9.2 you need:
export TORCH_NVCC_FLAGS="-D__CUDA_NO_HALF2_OPERATORS__"
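Reports in this thread differ by CUDA version: -D__CUDA_NO_HALF_OPERATORS__ for 9.0 and 9.1, -D__CUDA_NO_HALF2_OPERATORS__ for 9.2, and a later comment where 9.2 only compiled with both flags set. The sketch here simply encodes those reports. The function name `torch_nvcc_flags_for` is made up for illustration, and the mapping is not an official compatibility table.

```shell
# Hypothetical helper: suggest TORCH_NVCC_FLAGS for a CUDA release string
# such as "8.0" or "9.2", based only on what posters in this thread reported.
torch_nvcc_flags_for() {
    case "$1" in
        9.0|9.1) echo "-D__CUDA_NO_HALF_OPERATORS__" ;;
        9.2)     echo "-D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF2_OPERATORS__" ;;
        *)       echo "" ;;   # no report in the thread; leave the flags empty
    esac
}

cuda_release="9.2"   # assumption: set this to your toolkit's release
echo "TORCH_NVCC_FLAGS=\"$(torch_nvcc_flags_for "$cuda_release")\""
```

The printed line can then be used as the prefix of the ./install.sh invocation, after running ./clean.sh.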
@ricpruss I did export TORCH_NVCC_FLAGS="-D__CUDA_NO_HALF2_OPERATORS__" but still cannot build Torch with CUDA 9.2 and cuDNN 7.x.x. Any ideas?
Are you still getting the errors on operator overload? Did you run clean.sh after the change? @Amir-Arsalan
Same here with CUDA 9.0 and cuDNN 7. export TORCH_NVCC_FLAGS="-D__CUDA_NO_HALF_OPERATORS__" does solve the operator issue, but I am still getting these errors for the __half class:
/home/max/torch/extra/cutorch/lib/THC/THCTensorTypeUtils.cuh(173): error: class "__half" has no member "x"
/home/max/torch/extra/cutorch/lib/THC/THCTensorTypeUtils.cuh(173): error: class "__half" has no member "x"
/home/max/torch/extra/cutorch/lib/THC/THCTensorTypeUtils.cuh(177): error: class "__half" has no member "x"
/home/max/torch/extra/cutorch/lib/THC/THCTensorTypeUtils.cuh(177): error: class "__half" has no member "x"
/home/max/torch/extra/cutorch/lib/THC/THCNumerics.cuh(114): error: class "__half" has no member "x"
/home/max/torch/extra/cutorch/lib/THC/THCNumerics.cuh(115): error: class "__half" has no member "x"
6 errors detected in the compilation of "/tmp/tmpxft_0000615c_00000000-6_THCTensorCopy.cpp1.ii".
CMake Error at THC_generated_THCTensorCopy.cu.o.cmake:267 (message):
Error generating file
/home/max/torch/extra/cutorch/build/lib/THC/CMakeFiles/THC.dir//./THC_generated_THCTensorCopy.cu.o
lib/THC/CMakeFiles/THC.dir/build.make:105: recipe for target 'lib/THC/CMakeFiles/THC.dir/THC_generated_THCTensorCopy.cu.o' failed
make[2]: *** [lib/THC/CMakeFiles/THC.dir/THC_generated_THCTensorCopy.cu.o] Error 1
make[2]: *** Waiting for unfinished jobs....
/home/max/torch/extra/cutorch/lib/THC/THCTensorTypeUtils.cuh(173): error: class "__half" has no member "x"
/home/max/torch/extra/cutorch/lib/THC/THCTensorTypeUtils.cuh(173): error: class "__half" has no member "x"
/home/max/torch/extra/cutorch/lib/THC/THCTensorTypeUtils.cuh(177): error: class "__half" has no member "x"
/home/max/torch/extra/cutorch/lib/THC/THCTensorTypeUtils.cuh(177): error: class "__half" has no member "x"
/home/max/torch/extra/cutorch/lib/THC/THCNumerics.cuh(114): error: class "__half" has no member "x"
/home/max/torch/extra/cutorch/lib/THC/THCNumerics.cuh(115): error: class "__half" has no member "x"
6 errors detected in the compilation of "/tmp/tmpxft_00006176_00000000-6_THCTensorMath.cpp1.ii".
CMake Error at THC_generated_THCTensorMath.cu.o.cmake:267 (message):
Error generating file
/home/max/torch/extra/cutorch/build/lib/THC/CMakeFiles/THC.dir//./THC_generated_THCTensorMath.cu.o
lib/THC/CMakeFiles/THC.dir/build.make:112: recipe for target 'lib/THC/CMakeFiles/THC.dir/THC_generated_THCTensorMath.cu.o' failed
make[2]: *** [lib/THC/CMakeFiles/THC.dir/THC_generated_THCTensorMath.cu.o] Error 1
CMakeFiles/Makefile2:172: recipe for target 'lib/THC/CMakeFiles/THC.dir/all' failed
make[1]: *** [lib/THC/CMakeFiles/THC.dir/all] Error 2
Makefile:129: recipe for target 'all' failed
make: *** [all] Error 2
Error: Build error: Failed building.
@ricpruss I get these errors:
/torch/extra/cutorch/lib/THC/THCTensorRandom.cuh(95): warning: specified alignment (4) is different from alignment (2) specified on a previous declaration
detected during instantiation of "void renormRowsL1(T *, long, long) [with T=float]"
/torch/extra/cutorch/lib/THC/generic/THCTensorRandom.cu(98): here
/torch/extra/cutorch/lib/THC/THCTensorRandom.cuh(156): warning: specified alignment (4) is different from alignment (2) specified on a previous declaration
detected during instantiation of "void sampleMultinomialOnce<T,AccT>(long *, long, int, T *, T *) [with T=float, AccT=float]"
/torch/extra/cutorch/lib/THC/generic/THCTensorRandom.cu(169): here
/torch/extra/cutorch/lib/THC/THCTensorRandom.cuh(95): warning: specified alignment (8) is different from alignment (2) specified on a previous declaration
detected during instantiation of "void renormRowsL1(T *, long, long) [with T=double]"
/torch/extra/cutorch/lib/THC/generic/THCTensorRandom.cu(98): here
/torch/extra/cutorch/lib/THC/THCTensorRandom.cuh(156): warning: specified alignment (8) is different from alignment (2) specified on a previous declaration
detected during instantiation of "void sampleMultinomialOnce<T,AccT>(long *, long, int, T *, T *) [with T=double, AccT=double]"
/torch/extra/cutorch/lib/THC/generic/THCTensorRandom.cu(169): here
/torch/extra/cutorch/lib/THC/THCTensorRandom.cuh(156): warning: specified alignment (4) is different from alignment (2) specified on a previous declaration
detected during instantiation of "void sampleMultinomialOnce<T,AccT>(long *, long, int, T *, T *) [with T=half, AccT=float]"
/torch/extra/cutorch/lib/THC/generic/THCTensorRandom.cu(169): here
/torch/extra/cutorch/lib/THC/THCTensorRandom.cuh(95): warning: specified alignment (4) is different from alignment (2) specified on a previous declaration
detected during instantiation of "void renormRowsL1(T *, long, long) [with T=float]"
/torch/extra/cutorch/lib/THC/generic/THCTensorRandom.cu(98): here
/torch/extra/cutorch/lib/THC/THCTensorRandom.cuh(156): warning: specified alignment (4) is different from alignment (2) specified on a previous declaration
detected during instantiation of "void sampleMultinomialOnce<T,AccT>(long *, long, int, T *, T *) [with T=float, AccT=float]"
/torch/extra/cutorch/lib/THC/generic/THCTensorRandom.cu(169): here
/torch/extra/cutorch/lib/THC/THCTensorRandom.cuh(95): warning: specified alignment (8) is different from alignment (2) specified on a previous declaration
detected during instantiation of "void renormRowsL1(T *, long, long) [with T=double]"
/torch/extra/cutorch/lib/THC/generic/THCTensorRandom.cu(98): here
/torch/extra/cutorch/lib/THC/THCTensorRandom.cuh(156): warning: specified alignment (8) is different from alignment (2) specified on a previous declaration
detected during instantiation of "void sampleMultinomialOnce<T,AccT>(long *, long, int, T *, T *) [with T=double, AccT=double]"
/torch/extra/cutorch/lib/THC/generic/THCTensorRandom.cu(169): here
CMakeFiles/Makefile2:172: recipe for target 'lib/THC/CMakeFiles/THC.dir/all' failed
make[1]: *** [lib/THC/CMakeFiles/THC.dir/all] Error 2
Makefile:127: recipe for target 'all' failed
make: *** [all] Error 2
Error: Build error: Failed building.
@thompa2 AMAZING! Thank you! After hours of searching for a solution this worked. (well I'm at 20% now which is more than I've been able to get to until now) What a pain!
I lied, it failed again... but this time at 20%. That's progress, isn't it? I'm not sure; I'm thinking of giving up.
I've got a MacBook Pro (13-inch, 2017).
This is the error message at 20%:
^
9 warnings generated.
[ 20%] Building NVCC (Device) object lib/THC/CMakeFiles/THC.dir/THC_generated_THCTensorRandom.cu.o
/Users/fredlemieux/torch/extra/cutorch/lib/THC/THCTensorRandom.cuh(156): error: specified alignment (4) is different from alignment (2) specified on a previous declaration
detected during instantiation of "void sampleMultinomialOnce<T,AccT>(long *, long, int, T *, T *) [with T=half, AccT=float]"
/Users/fredlemieux/torch/extra/cutorch/lib/THC/generic/THCTensorRandom.cu(169): here
/Users/fredlemieux/torch/extra/cutorch/lib/THC/THCTensorRandom.cuh(95): error: specified alignment (4) is different from alignment (2) specified on a previous declaration
detected during instantiation of "void renormRowsL1(T *, long, long) [with T=float]"
/Users/fredlemieux/torch/extra/cutorch/lib/THC/generic/THCTensorRandom.cu(98): here
/Users/fredlemieux/torch/extra/cutorch/lib/THC/THCTensorRandom.cuh(156): error: specified alignment (4) is different from alignment (2) specified on a previous declaration
detected during instantiation of "void sampleMultinomialOnce<T,AccT>(long *, long, int, T *, T *) [with T=float, AccT=float]"
/Users/fredlemieux/torch/extra/cutorch/lib/THC/generic/THCTensorRandom.cu(169): here
/Users/fredlemieux/torch/extra/cutorch/lib/THC/THCTensorRandom.cuh(95): error: specified alignment (8) is different from alignment (2) specified on a previous declaration
detected during instantiation of "void renormRowsL1(T *, long, long) [with T=double]"
/Users/fredlemieux/torch/extra/cutorch/lib/THC/generic/THCTensorRandom.cu(98): here
/Users/fredlemieux/torch/extra/cutorch/lib/THC/THCTensorRandom.cuh(156): error: specified alignment (8) is different from alignment (2) specified on a previous declaration
detected during instantiation of "void sampleMultinomialOnce<T,AccT>(long *, long, int, T *, T *) [with T=double, AccT=double]"
/Users/fredlemieux/torch/extra/cutorch/lib/THC/generic/THCTensorRandom.cu(169): here
5 errors detected in the compilation of "/tmp/tmpxft_00011634_00000000-11_THCTensorRandom.compute_61.cpp1.ii".
CMake Error at THC_generated_THCTensorRandom.cu.o.cmake:267 (message):
Error generating file
/Users/fredlemieux/torch/extra/cutorch/build/lib/THC/CMakeFiles/THC.dir//./THC_generated_THCTensorRandom.cu.o
make[2]: [lib/THC/CMakeFiles/THC.dir/THC_generated_THCTensorRandom.cu.o] Error 1
make[2]: Waiting for unfinished jobs....
9 warnings generated.
6 warnings generated.
/Users/fredlemieux/torch/extra/cutorch/lib/THC/THCHalf.h:24:17: warning: 'THC_float2half' has C-linkage specified, but returns user-defined type 'half' (aka 'half') which is incompatible with C [-Wreturn-type-c-linkage]
extern "C" half THC_float2half(float a);
^
/Users/fredlemieux/torch/extra/cutorch/lib/THC/generic/THCStorage.h:28:17: warning: 'THCudaHalfStorage_get' has C-linkage specified, but returns user-defined type 'half' (aka 'half') which is incompatible with C [-Wreturn-type-c-linkage]
extern "C" half THCudaHalfStorage_get(THCState state, const THCudaHalfStorage , ptrdiff_t);
^
/Users/fredlemieux/torch/extra/cutorch/lib/THC/generic/THCTensor.h:127:17: warning: 'THCudaHalfTensor_get1d' has C-linkage specified, but returns user-defined type 'half' (aka 'half') which is incompatible with C [-Wreturn-type-c-linkage]
extern "C" half THCudaHalfTensor_get1d(THCState state, const THCudaHalfTensor tensor, long x0);
^
/Users/fredlemieux/torch/extra/cutorch/lib/THC/generic/THCTensor.h:128:17: warning: 'THCudaHalfTensor_get2d' has C-linkage specified, but returns user-defined type 'half' (aka 'half') which is incompatible with C [-Wreturn-type-c-linkage]
extern "C" half THCudaHalfTensor_get2d(THCState state, const THCudaHalfTensor tensor, long x0, long x1);
^
/Users/fredlemieux/torch/extra/cutorch/lib/THC/generic/THCTensor.h:129:17: warning: 'THCudaHalfTensor_get3d' has C-linkage specified, but returns user-defined type 'half' (aka 'half') which is incompatible with C [-Wreturn-type-c-linkage]
extern "C" half THCudaHalfTensor_get3d(THCState state, const THCudaHalfTensor tensor, long x0, long x1, long x2);
^
/Users/fredlemieux/torch/extra/cutorch/lib/THC/generic/THCTensor.h:130:17: warning: 'THCudaHalfTensor_get4d' has C-linkage specified, but returns user-defined type 'half' (aka 'half') which is incompatible with C [-Wreturn-type-c-linkage]
extern "C" half THCudaHalfTensor_get4d(THCState state, const THCudaHalfTensor tensor, long x0, long x1, long x2, long x3);
^
/Users/fredlemieux/torch/extra/cutorch/lib/THC/generic/THCTensorMathReduce.h:35:17: warning: 'THCudaHalfTensor_minall' has C-linkage specified, but returns user-defined type 'half' (aka 'half') which is incompatible with C [-Wreturn-type-c-linkage]
extern "C" half THCudaHalfTensor_minall(THCState state, THCudaHalfTensor self);
^
/Users/fredlemieux/torch/extra/cutorch/lib/THC/generic/THCTensorMathReduce.h:36:17: warning: 'THCudaHalfTensor_maxall' has C-linkage specified, but returns user-defined type 'half' (aka 'half') which is incompatible with C [-Wreturn-type-c-linkage]
extern "C" half THCudaHalfTensor_maxall(THCState state, THCudaHalfTensor self);
^
/Users/fredlemieux/torch/extra/cutorch/lib/THC/generic/THCTensorMathReduce.h:37:17: warning: 'THCudaHalfTensor_medianall' has C-linkage specified, but returns user-defined type 'half' (aka '__half') which is incompatible with C [-Wreturn-type-c-linkage]
extern "C" half THCudaHalfTensor_medianall(THCState state, THCudaHalfTensor self);
^
9 warnings generated.
make[1]: [lib/THC/CMakeFiles/THC.dir/all] Error 2
make: [all] Error 2
Error: Build error: Failed building.
Same error here OSX 10.13.5, Cuda 9.1, cudnn 7
Before you build try
export TORCH_NVCC_FLAGS="-D__CUDA_NO_HALF_OPERATORS__"
It works on Ubuntu 18.04 too.
@ricpruss I know this has been a while but for some reason I need to compile Torch with CUDA 9.2. I remember I tried export TORCH_NVCC_FLAGS="-D__CUDA_NO_HALF2_OPERATORS__" and I could not compile Torch. I just added both -D__CUDA_NO_HALF2_OPERATORS__ and -D__CUDA_NO_HALF_OPERATORS__ to TORCH_NVCC_FLAGS and could compile Torch with CUDA 9.2, but when I do require 'cutorch' I get the following errors:
require 'cutorch';
THCudaCheck FAIL file=/torch/extra/cutorch/lib/THC/THCGeneral.c line=70 error=35 : CUDA driver version is insufficient for CUDA runtime version
/torch/install/share/lua/5.1/trepl/init.lua:389: cuda runtime error (35) : CUDA driver version is insufficient for CUDA runtime version at /torch/extra/cutorch/lib/THC/THCGeneral.c:70
stack traceback:
[C]: in function 'error'
/torch/install/share/lua/5.1/trepl/init.lua:389: in function 'require'
[string "require 'cutorch';"]:1: in main chunk
[C]: in function 'xpcall'
/torch/install/share/lua/5.1/trepl/init.lua:679: in function 'repl'
/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:204: in main chunk
[C]: at 0x00405d50
How did you resolve this?
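Error 35 means the installed NVIDIA driver is older than what the CUDA runtime used for the build requires, so a first step is to compare the two versions. A minimal sketch, assuming `nvidia-smi` and `nvcc` are on `PATH` (the function name `show_cuda_versions` is hypothetical):

```shell
# Hypothetical helper: print the driver version and the toolkit release.
show_cuda_versions() {
    if command -v nvidia-smi >/dev/null 2>&1; then
        echo "driver: $(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)"
    else
        echo "driver: nvidia-smi not found"
    fi
    if command -v nvcc >/dev/null 2>&1; then
        # nvcc prints a line containing "release X.Y"
        echo "toolkit: $(nvcc --version | grep -i 'release')"
    else
        echo "toolkit: nvcc not found"
    fi
}

show_cuda_versions
```

If the driver turns out to be too old for the toolkit, the usual options are updating the driver or building against an older CUDA release; NVIDIA's toolkit release notes list the minimum driver for each release.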
I get the same issue on Windows 10 with PyTorch 1.1.0 and VS 2017 with the version 15.4 toolset. Does anyone have a good method?
E:/Program Files/Python35/lib/site-packages/torch/include\THC/THCNumerics.cuh(190): error: more than one operator "<" matches these operands: built-in operator "arithmetic < arithmetic" function "operator<(const half &, const half &)" operand types are: c10::Half < c10::Half
E:/Program Files/Python35/lib/site-packages/torch/include\THC/THCNumerics.cuh(191): error: more than one operator "<=" matches these operands: built-in operator "arithmetic <= arithmetic" function "operator<=(const half &, const half &)" operand types are: c10::Half <= c10::Half
E:/Program Files/Python35/lib/site-packages/torch/include\THC/THCNumerics.cuh(192): error: more than one operator ">" matches these operands: built-in operator "arithmetic > arithmetic" function "operator>(const half &, const half &)" operand types are: c10::Half > c10::Half
E:/Program Files/Python35/lib/site-packages/torch/include\THC/THCNumerics.cuh(193): error: more than one operator ">=" matches these operands: built-in operator "arithmetic >= arithmetic" function "operator>=(const half &, const half &)" operand types are: c10::Half >= c10::Half
E:/Program Files/Python35/lib/site-packages/torch/include\THC/THCNumerics.cuh(194): error: more than one operator "==" matches these operands: built-in operator "arithmetic == arithmetic" function "operator==(const half &, const half &)" operand types are: c10::Half == c10::Half
E:/Program Files/Python35/lib/site-packages/torch/include\THC/THCNumerics.cuh(196): error: more than one operator "!=" matches these operands: built-in operator "arithmetic != arithmetic" function "operator!=(const half &, const half &)" operand types are: c10::Half != c10::Half
I got the same error on Ubuntu 18.04 with CUDA 10.1 and cuDNN 7.5. I tried all the methods above with no progress, but the command below let me run th successfully, without any "export ...": luarocks install cutorch
@tjusxh Me too. Have you solved it?
@csarofeen you earlier said that the performance will improve for the better by disabling the CUDA builtin operators. Can you explain how?
It will, for the better.
This didn't work for me:
./clean.sh
export TORCH_NVCC_FLAGS="-D__CUDA_NO_HALF_OPERATORS__"
./install.sh
Try this:
sudo TORCH_NVCC_FLAGS="-D__CUDA_NO_HALF_OPERATORS__" ./install.sh
Then my project worked.
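When install.sh is run through sudo, as in some posts in this thread, how the variable is passed matters: by default sudo starts the command with a reset environment, so a variable exported in the calling shell may not reach it, whereas `sudo VAR=value command` sets it for that command (subject to the local sudo policy). The sketch imitates a cleaned environment with `env -i` instead of sudo so it can run without root; `/bin/sh -c` stands in for `./install.sh`.

```shell
export TORCH_NVCC_FLAGS="-D__CUDA_NO_HALF_OPERATORS__"

# Cleaned environment (roughly what a default sudo policy does): the variable is lost.
lost=$(env -i /bin/sh -c 'printf "%s" "${TORCH_NVCC_FLAGS:-}"')
echo "after env reset:   ${lost:-<empty>}"

# Variable given on the command line itself: it survives the reset.
# With sudo the analogous form is: sudo TORCH_NVCC_FLAGS="..." ./install.sh
kept=$(env -i TORCH_NVCC_FLAGS="$TORCH_NVCC_FLAGS" /bin/sh -c 'printf "%s" "$TORCH_NVCC_FLAGS"')
echo "passed explicitly: ${kept:-<empty>}"
```

Whether this was the actual cause in any particular report here is not confirmed by the thread.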
Windows users have to use:
SET NVCC_FLAGS="-D__CUDA_NO_HALF_OPERATORS__"
instead of
export TORCH_NVCC_FLAGS="-D__CUDA_NO_HALF_OPERATORS__"
This comes from PyTorch CMake files:
if(CUDA_HAS_FP16 OR NOT ${CUDA_VERSION} LESS 7.5)
message(STATUS "Found CUDA with FP16 support, compiling with torch.cuda.HalfTensor")
list(APPEND CUDA_NVCC_FLAGS "-DCUDA_HAS_FP16=1" "-D__CUDA_NO_HALF_OPERATORS__" "-D__CUDA_NO_HALF_CONVERSIONS__"
"-D__CUDA_NO_BFLOAT16_CONVERSIONS__" "-D__CUDA_NO_HALF2_OPERATORS__")
add_compile_options(-DCUDA_HAS_FP16=1)
......
That is why a normal PyTorch build won't hit this error.
The Nvidia NGC Docker image targets several GPUs. For DeepSpeed, we have to set the arch list to start at the Volta architecture (7.0). Building like this works for me:
TORCH_CUDA_ARCH_LIST="7.0 7.5 8.0" DS_BUILD_OPS=1 DS_BUILD_FUSED_LAMB=1 DS_BUILD_CPU_ADAM=1 DS_BUILD_FUSED_ADAM=1 DS_BUILD_SPARSE_ATTN=1 DS_BUILD_TRANSFORMER=1 DS_BUILD_UTILS=1 python3 setup.py install
While attempting to build torch from master with cutorch with cuda 9.0.103-1 on Ubuntu 16.04 I hit an error with multiple attempts to overload the "==" and "!=" operators.
Below is an example of the error I receive.
I was able to track down the two operator overloads. One is in https://github.com/torch/cutorch/blob/master/lib/THC/THCTensorTypeUtils.cuh#L176 and the other is in /usr/local/cuda-9.0/targets/ppc64le-linux/include/cuda_fp16.hpp.
The operator in cuda_fp16.hpp was provided by the cuda package, but only covers the __device__ and not the __host__. So we still need to overload the "==" for halfs in the __host__; however, the code currently in cutorch fails at compile time.
It looks like @csarofeen worked on the initial port to cuda9.0 for cutorch. I'm not sure if he can provide some help on what's going on here?
Is there any additional information you need from me? Thanks in advance!!
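To repeat that search on another machine, something like the following can list where `operator==` is declared in both places. This is only a sketch: `find_half_eq` is a made-up helper, the cuda_fp16.hpp path is the one from this post, and the cutorch path under `$HOME/torch` is an assumption, so both will differ on other architectures or install prefixes.

```shell
# Hypothetical helper: print file:line:text for every operator== found.
find_half_eq() {
    for f in "$@"; do
        if [ -f "$f" ]; then
            # -H forces the file name even for a single file; -n adds line numbers.
            grep -Hn 'operator==' "$f" || echo "$f: no operator== found"
        else
            echo "$f: not found"
        fi
    done
}

find_half_eq \
    "$HOME/torch/extra/cutorch/lib/THC/THCTensorTypeUtils.cuh" \
    "/usr/local/cuda-9.0/targets/ppc64le-linux/include/cuda_fp16.hpp"
```

Seeing a match in both files is consistent with the "more than one operator" error, since nvcc then has two candidate overloads for half == half.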