pytorch / extension-cpp

C++ extensions in PyTorch

Compiler interprets fmax in lltm_cuda_kernel.cu __device__ function as std::fmax #14

Closed mmazeika closed 6 years ago

mmazeika commented 6 years ago

I cloned the repository, and the CPU version compiles, but I get the following error when running `python setup.py install` in the cuda folder.

running install
running bdist_egg
running egg_info
creating lltm_cuda.egg-info
writing dependency_links to lltm_cuda.egg-info/dependency_links.txt
writing lltm_cuda.egg-info/PKG-INFO
writing top-level names to lltm_cuda.egg-info/top_level.txt
writing manifest file 'lltm_cuda.egg-info/SOURCES.txt'
reading manifest file 'lltm_cuda.egg-info/SOURCES.txt'
writing manifest file 'lltm_cuda.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_ext
building 'lltm_cuda' extension
creating build
creating build/temp.linux-x86_64-3.5
gcc -pthread -B /home/mantas/anaconda3/envs/pytorch04/compiler_compat -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/mantas/anaconda3/envs/pytorch04/lib/python3.5/site-packages/torch/lib/include -I/home/mantas/anaconda3/envs/pytorch04/lib/python3.5/site-packages/torch/lib/include/TH -I/home/mantas/anaconda3/envs/pytorch04/lib/python3.5/site-packages/torch/lib/include/THC -I/usr/local/cuda-9.0/include -I/home/mantas/anaconda3/envs/pytorch04/include/python3.5m -c lltm_cuda.cpp -o build/temp.linux-x86_64-3.5/lltm_cuda.o -DTORCH_EXTENSION_NAME=lltm_cuda -std=c++11
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
/usr/local/cuda-9.0/bin/nvcc -I/home/mantas/anaconda3/envs/pytorch04/lib/python3.5/site-packages/torch/lib/include -I/home/mantas/anaconda3/envs/pytorch04/lib/python3.5/site-packages/torch/lib/include/TH -I/home/mantas/anaconda3/envs/pytorch04/lib/python3.5/site-packages/torch/lib/include/THC -I/usr/local/cuda-9.0/include -I/home/mantas/anaconda3/envs/pytorch04/include/python3.5m -c lltm_cuda_kernel.cu -o build/temp.linux-x86_64-3.5/lltm_cuda_kernel.o -DTORCH_EXTENSION_NAME=lltm_cuda --compiler-options '-fPIC' -std=c++11
lltm_cuda_kernel.cu(54): error: calling a __host__ function("std::fmax<double, float> ") from a __global__ function("_NV_ANON_NAMESPACE::lltm_cuda_forward_kernel ") is not allowed

lltm_cuda_kernel.cu(54): error: identifier "std::fmax<double, float> " is undefined in device code

2 errors detected in the compilation of "/tmp/tmpxft_00002819_00000000-6_lltm_cuda_kernel.cpp1.ii".
error: command '/usr/local/cuda-9.0/bin/nvcc' failed with exit status 1

I'm using PyTorch 0.4.0 installed via conda a few weeks ago, Python 3.5, CUDA 9.0, cuDNN 7.1.4, and GCC 6.4.0.

goldsborough commented 6 years ago

Hmm, that's interesting; I hadn't noticed this. Could you do me a favor and see if it goes away if you change `fmax` to `::fmax`, so that it uses the global CUDA version?

mmazeika commented 6 years ago

I changed `return fmax(0.0, z) + fmin(0.0, alpha * (exp(z) - 1.0));` in the `elu` function to `return ::fmax(0.0, z) + fmin(0.0, alpha * (exp(z) - 1.0));`, but the error message didn't change, and it didn't start complaining about the `fmin` either.

I tried changing line 54 in the original code from `candidate_cell[index] = elu(gates[gates_row + 2 * state_size + column]);` to `candidate_cell[index] = sigmoid(gates[gates_row + 2 * state_size + column]);` so as to avoid calling `fmax` and `fmin`, and I got pages of errors as a result. Two errors that repeat several times in the printout are `error: wrong number of template arguments (5, should be 2) return __and_<__not_<is_same<tuple<_Elements...>` and `error: mismatched argument pack lengths while expanding 'std::is_constructible<_Elements, _UElements&&>' return __and_<is_constructible<_Elements, _UElements&&>...>::value;`.

I've attached the printout in a text file to avoid clutter. `torch.cuda.is_available()` returns `True` in Python.

error.txt

mmazeika commented 6 years ago

Ah, I see. I was using a different Python environment from the one I normally use, so when I actually run `test = torch.FloatTensor([1]).cuda()`, I get the error

Found GPU0 GeForce GTX 770M which is of cuda capability 3.0. PyTorch no longer supports this GPU because it is too old.

I'll let you know if this problem is fixed with PyTorch installed from source.

goldsborough commented 6 years ago

Sounds good, let me know.

mmazeika commented 6 years ago

Yep, that did the trick.

YiwenShaoStephen commented 6 years ago

Hi, I ran into exactly the same issue when trying to compile it. I've checked that my PyTorch version is up to date (0.4.1) and my CUDA version is 9.1.