torch / cutorch

A CUDA backend for Torch7

Cutorch, RNG is undefined #8

Closed (instagibbs closed 10 years ago)

instagibbs commented 10 years ago

I am compiling cutorch on Ubuntu 12.04, CUDA v 6.0.

When I call: "luarocks install cutorch" it results in this error:

[ 9%] Building NVCC (Device) object lib/THC/CMakeFiles/THC.dir//./THC_generated_THC.cu.o
/home/greg/Documents/cutorch/lib/THC/THCTensorRandom.cu(27): error: identifier "CURAND_RNG_PSEUDO_MTGP32" is undefined

1 error detected in the compilation of "/tmp/tmpxft_00000bb8_00000000-4_THC.cpp1.ii".
CMake Error at THC_generated_THC.cu.o.cmake:262 (message):
  Error generating file /home/greg/Documents/cutorch/build/lib/THC/CMakeFiles/THC.dir//./THC_generated_THC.cu.o

make[2]: *** [lib/THC/CMakeFiles/THC.dir/./THC_generated_THC.cu.o] Error 1
make[1]: *** [lib/THC/CMakeFiles/THC.dir/all] Error 2
make: *** [all] Error 2

Error: Build error: Failed building.

It appears to be a linking issue, but I am unable to get it working with cmake. The env variable during cmake is "CUDA_curand_LIBRARY=/usr/lib/x86_64-linux-gnu/libcurand.so", which appears correct.

soumith commented 10 years ago

Are you setting CUDA_curand_LIBRARY manually, or is it being set by FindCUDA.cmake? This is not a linking error: the failure happens at compile time, so the curand headers don't seem to be getting included properly.

instagibbs commented 10 years ago

I am primarily using the value being found by cmake, using the normal installation method. I've tried manually setting it myself as well, to no avail.

soumith commented 10 years ago

Could you check where curand.h is located on your machine, and see whether CURAND_RNG_PSEUDO_MTGP32 is defined inside it? I want to rule out the possibility that some old version of CUDA is installed.
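The check requested above can be sketched in Python. This is only an illustration, not part of cutorch or CUDA: the helper names `find_files` and `mentions_identifier` are hypothetical, and the search roots `/usr/include` and `/usr/local` are assumptions about where headers might live. It lists every curand.h it finds and reports whether the identifier from the error message appears in each copy.

```python
import os


def find_files(roots, filename):
    """Return every path under the given roots whose file name equals filename."""
    matches = []
    for root in roots:
        # os.walk silently yields nothing for a root that does not exist.
        for dirpath, _dirnames, filenames in os.walk(root):
            if filename in filenames:
                matches.append(os.path.join(dirpath, filename))
    return sorted(matches)


def mentions_identifier(path, identifier):
    """Report whether the identifier appears anywhere in the file's text."""
    with open(path, "r", errors="replace") as handle:
        return identifier in handle.read()


if __name__ == "__main__":
    # Assumed search roots; adjust to wherever CUDA may live on your machine.
    roots = ["/usr/include", "/usr/local"]
    for header in find_files(roots, "curand.h"):
        found = mentions_identifier(header, "CURAND_RNG_PSEUDO_MTGP32")
        status = "has" if found else "LACKS"
        print(f"{header}: {status} CURAND_RNG_PSEUDO_MTGP32")
```

If more than one curand.h turns up, and one of them lacks the identifier, that copy is a likely candidate for the header the compiler is picking up by mistake.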

instagibbs commented 10 years ago

It is defined in the file, along with the other RNGs. I'm running 6.0

FWIW I can use that RNG in a test file I compiled.

soumith commented 10 years ago

Okay, so you have no installation errors, and apart from CUDA 6.0 there are no old versions of CUDA installed (from, say, the package manager), i.e. there is only one curand.h on your machine.

That leaves me clueless.

soumith commented 10 years ago

is the CUDA toolkit installed in a non-standard directory?

instagibbs commented 10 years ago

Derp. For some reason it was finding an old curand.h (no idea where it came from) in /usr/include.

I replaced it with my new real copy, and it compiles.

Let me triple-check that it is installed correctly, then I'll close this.

soumith commented 10 years ago

cool, yea that was the suspect.

soumith commented 10 years ago

You might want to check for traces of an old CUDA install; if there are any, you might run into other side effects.
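One rough way to look for such traces is to see whether CUDA files show up in more than one directory. The Python sketch below is an illustration under assumptions: the helper names `locate` and `duplicated` are hypothetical, and both the search roots and the file name prefixes (`curand.h`, `libcurand.so`, `cuda_runtime.h`) are guesses at what is worth checking. A name found in several directories is only a hint, since a single install can also expose the same file from more than one place.

```python
import os
from collections import defaultdict


def locate(roots, prefixes):
    """Map each file name prefix to the set of directories holding a match."""
    found = defaultdict(set)
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                for prefix in prefixes:
                    if name.startswith(prefix):
                        found[prefix].add(dirpath)
    return found


def duplicated(found):
    """Keep only the prefixes that turn up in more than one directory."""
    return {
        prefix: sorted(dirs)
        for prefix, dirs in found.items()
        if len(dirs) > 1
    }


if __name__ == "__main__":
    # Assumed roots and names; adjust both for your own system.
    roots = ["/usr/include", "/usr/lib", "/usr/local"]
    prefixes = ["curand.h", "libcurand.so", "cuda_runtime.h"]
    for prefix, dirs in duplicated(locate(roots, prefixes)).items():
        print(f"{prefix} found in {len(dirs)} directories:")
        for directory in dirs:
            print(f"  {directory}")
```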

instagibbs commented 10 years ago

Code is failing when I use CUDA to create random values:

input = torch.randn(ninput)
[string "input = torch.randn(ninput)..."]:1: internal error: the default tensor type does not seem to be an actual tensor

My guess is that my install is still quite broken.

Very strange, because as I noted before, none of my other CUDA software has issues except for Torch7, and this is a fresh install of Ubuntu/CUDA straight from the repo.

soumith commented 10 years ago

No, that's a valid bug. Can you open a separate bug report about that?

I reproduced it with this:

luajit -lcutorch
torch.setdefaulttensortype('torch.CudaTensor')
a=torch.randn(10)

instagibbs commented 10 years ago

Sure.

afroze100 commented 10 years ago

Hi, I have the exact same error. I tried to follow the conversation, and it seems I too have two installations of CUDA. I'm not sure, but I do have two copies each of curand.h and libcurand.so:

./local/cuda-6.0/targets/x86_64-linux/lib/libcurand.so
./lib/x86_64-linux-gnu/libcurand.so
./local/cuda-6.0/targets/x86_64-linux/include/curand.h
./include/curand.h

Can you please advise me on how to proceed in this situation? I cannot quite tell from the conversation which steps I have to take to resolve the issue.