Open eigengrau opened 5 years ago
I had the same problem when building against CUDA 10.1. I was able to compile using your patch, but could it cause other problems at runtime? That is what really worries me.
ptxas warning : Value of threads per SM for entry _Z24THNN_CudaHalfLSTMForwardI6halfmLin1EEv10TensorInfoIT_T0_ES4_S4_S4_S4_S4_S4_S3S3 is out of range. .minnctapersm will be ignored
ptxas warning : Value of threads per SM for entry _Z24THNN_CudaHalfLSTMForwardI6halfmLi2EEv10TensorInfoIT_T0_ES4_S4_S4_S4_S4_S4_S3S3 is out of range. .minnctapersm will be ignored
ptxas warning : Value of threads per SM for entry _Z24THNN_CudaHalfLSTMForwardI6halfmLi1EEv10TensorInfoIT_T0_ES4_S4_S4_S4_S4_S4_S3S3 is out of range. .minnctapersm will be ignored
ptxas warning : Value of threads per SM for entry _Z24THNN_CudaHalfLSTMForwardI6halfmLin2EEv10TensorInfoIT_T0_ES4_S4_S4_S4_S4_S4_S3S3 is out of range. .minnctapersm will be ignored
ptxas warning : Value of threads per SM for entry _Z24THNN_CudaHalfLSTMForwardI6halfjLin1EEv10TensorInfoIT_T0_ES4_S4_S4_S4_S4_S4_S3S3 is out of range. .minnctapersm will be ignored
ptxas warning : Value of threads per SM for entry _Z24THNN_CudaHalfLSTMForwardI6halfjLi2EEv10TensorInfoIT_T0_ES4_S4_S4_S4_S4_S4_S3S3 is out of range. .minnctapersm will be ignored
ptxas warning : Value of threads per SM for entry _Z24THNN_CudaHalfLSTMForwardI6halfjLi1EEv10TensorInfoIT_T0_ES4_S4_S4_S4_S4_S4_S3S3 is out of range. .minnctapersm will be ignored
ptxas warning : Value of threads per SM for entry _Z24THNN_CudaHalfLSTMForwardI6halfjLin2EEv10TensorInfoIT_T0_ES4_S4_S4_S4_S4_S4_S3S3 is out of range. .minnctapersm will be ignored
[ 26%] Building NVCC (Device) object lib/THCUNN/CMakeFiles/THCUNN.dir/THCUNN_generated_LookupTableBag.cu.o
/tmp/luarocks_cunn-scm-1-5240/cunn/lib/THCUNN/LookupTable.cu(32): error: identifier "__shfl" is undefined
/tmp/luarocks_cunn-scm-1-5240/cunn/lib/THCUNN/LookupTable.cu(49): warning: function "__any"
/usr/local/cuda/include/device_atomic_functions.h(178): here was declared deprecated ("__any() is not valid on compute_70 and above, and should be replaced with __any_sync().To continue using __any(), specify virtual architecture compute_60 when targeting sm_70 and above, for example, using the pair of compiler options: -arch=compute_60 -code=sm_70.")
[ 28%] Building NVCC (Device) object lib/THCUNN/CMakeFiles/THCUNN.dir/THCUNN_generated_MSECriterion.cu.o
[ 29%] Building NVCC (Device) object lib/THCUNN/CMakeFiles/THCUNN.dir/THCUNN_generated_MarginCriterion.cu.o
1 error detected in the compilation of "/tmp/tmpxft_00002ab6_00000000-6_LookupTable.cpp1.ii".
CMake Error at THCUNN_generated_LookupTable.cu.o.Release.cmake:279 (message):
  Error generating file /tmp/luarocks_cunn-scm-1-5240/cunn/build/lib/THCUNN/CMakeFiles/THCUNN.dir//./THCUNN_generated_LookupTable.cu.o
lib/THCUNN/CMakeFiles/THCUNN.dir/build.make:175: recipe for target 'lib/THCUNN/CMakeFiles/THCUNN.dir/THCUNN_generated_LookupTable.cu.o' failed
make[2]: *** [lib/THCUNN/CMakeFiles/THCUNN.dir/THCUNN_generated_LookupTable.cu.o] Error 1
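For anyone reading the log above: the deprecation message itself points at the _sync variants of the warp intrinsics. Below is a rough, standalone sketch of a version-gated shim, not the actual cunn code. The macro names WARP_SHFL and WARP_ANY are made up for illustration, and the full mask 0xffffffff is an assumption that only holds when all 32 lanes of the warp reach the call together.

```cuda
#include <cstdio>
#include <cuda.h>          // defines CUDA_VERSION
#include <cuda_runtime.h>

// Hypothetical shim names (WARP_SHFL, WARP_ANY); they are not part of cunn or CUDA.
// Assumption: every lane of the warp executes the call, so a full mask is valid.
#if defined(CUDA_VERSION) && CUDA_VERSION >= 9000
#define WARP_SHFL(var, src_lane) __shfl_sync(0xffffffffu, (var), (src_lane))
#define WARP_ANY(pred)           __any_sync(0xffffffffu, (pred))
#else
#define WARP_SHFL(var, src_lane) __shfl((var), (src_lane))
#define WARP_ANY(pred)           __any((pred))
#endif

// One warp: each lane reads lane 3's value and votes on a predicate.
__global__ void demo(int *out_shfl, int *out_any) {
    int lane = threadIdx.x % 32;
    int value = lane * 10;
    out_shfl[threadIdx.x] = WARP_SHFL(value, 3);    // lane 3 holds 30
    out_any[threadIdx.x]  = WARP_ANY(lane == 7);    // nonzero: lane 7 matches
}

int main() {
    int h_shfl[32], h_any[32];
    int *d_shfl = nullptr, *d_any = nullptr;
    cudaMalloc(&d_shfl, sizeof(h_shfl));
    cudaMalloc(&d_any, sizeof(h_any));

    demo<<<1, 32>>>(d_shfl, d_any);                 // exactly one full warp

    cudaMemcpy(h_shfl, d_shfl, sizeof(h_shfl), cudaMemcpyDeviceToHost);
    cudaMemcpy(h_any, d_any, sizeof(h_any), cudaMemcpyDeviceToHost);
    printf("lane 0 read %d, any-vote is %s\n",
           h_shfl[0], h_any[0] ? "set" : "clear");

    cudaFree(d_shfl);
    cudaFree(d_any);
    return 0;
}
```

If a kernel calls these intrinsics from divergent code, where only some lanes are active, a hard-coded full mask is not safe and the mask would have to be computed for the participating lanes instead.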
It looks like CUDA 9 deprecates __shfl and __any. I was able to compile using the following quick-and-dirty patch: