ysagon opened this issue 8 years ago (status: Open)
Hi @ysagon. Someone mentioned that this kind of error occurs if you compile for CUDA compute capability 3.0+ and then run on a 2.0 device.
There's an environment variable called TORCH_CUDA_ARCH_LIST that lets you manually specify, at compile time, the architecture you care about: in your case, 2.0.
The build log of cunn should tell you which architectures it is being built against.
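One common pitfall, sketched below (assuming a POSIX shell; `sh -c` stands in for the cmake/nvcc child processes that luarocks spawns): the variable only reaches the build if it is exported, not merely assigned.

```shell
# Start from a clean slate so the demonstration is deterministic.
unset TORCH_CUDA_ARCH_LIST

# A plain assignment stays local to the current shell; child processes
# (such as the build tools luarocks launches) will not see it.
TORCH_CUDA_ARCH_LIST="2.0"
sh -c 'echo "child sees: [$TORCH_CUDA_ARCH_LIST]"'   # prints: child sees: []

# After exporting, child processes inherit the value.
export TORCH_CUDA_ARCH_LIST
sh -c 'echo "child sees: [$TORCH_CUDA_ARCH_LIST]"'   # prints: child sees: [2.0]
```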
@soumith thanks, I suspect something like that too.
I have no idea how luarocks works, so I don't know if I do things correctly.
I have installed cunn like this:
export TORCH_CUDA_ARCH_LIST=2.0
[sagon@login1 torch]$ luarocks install cunn
Installing https://raw.githubusercontent.com/torch/rocks/master/cunn-scm-1.rockspec...
Using https://raw.githubusercontent.com/torch/rocks/master/cunn-scm-1.rockspec... switching to 'build' mode
Initialized empty Git repository in /tmp/luarocks_cunn-scm-1-6364/cunn/.git/
remote: Counting objects: 538, done.
remote: Compressing objects: 100% (302/302), done.
remote: Total 538 (delta 357), reused 360 (delta 220), pack-reused 0
Receiving objects: 100% (538/538), 380.09 KiB, done.
Resolving deltas: 100% (357/357), done.
cmake -E make_directory build && cd build && cmake .. -DCMAKE_BUILD_TYPE=Release -DCMAKE_PREFIX_PATH="/home/sagon/torch/install/bin/.." -DCMAKE_INSTALL_PREFIX="/home/sagon/torch/install/lib/luarocks/rocks/cunn/scm-1" && make -j$(getconf _NPROCESSORS_ONLN) install
-- The C compiler identification is GNU 4.4.7
-- The CXX compiler identification is GNU 4.4.7
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Found Torch7 in /home/sagon/torch/install
-- Found CUDA: /usr/local/cuda (found suitable version "7.5", minimum required is "6.5")
-- Automatic GPU detection failed. Building for all known architectures.
-- Compiling for CUDA architecture: 2.0 2.1(2.0) 3.0 3.5 5.0 5.2
-- Configuring done
-- Generating done
It seems cunn is being compiled for every architecture, which appears to be the default when auto-detection fails. Do I have to pass this variable in another way?
Anyway, I logged in to the node with the GPUs and reinstalled cunn, and this time the log says it is compiled for CUDA architecture 2.0, but the tests still fail at the same place.
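One way the variable can be passed that generally works: prefixing the command exports it for that single invocation only. A minimal sketch, with `sh -c` standing in for the actual `luarocks install cunn` command:

```shell
# A prefix assignment is exported to the command's environment for this
# one invocation, without polluting the surrounding shell.
TORCH_CUDA_ARCH_LIST="2.0" sh -c 'echo "build would target: $TORCH_CUDA_ARCH_LIST"'
# prints: build would target: 2.0
```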
Hi all.
I get a similar error:
32/134 SparseLinear_backward ........................................... [PASS]
33/134 SpatialReflectionPadding_backward ............................... [PASS]
34/134 SpatialAdaptiveMaxPooling_forward_noncontig ..................... [PASS]
35/134 SpatialAveragePooling_backward .................................. [PASS]
36/134 ELU_transposed .................................................. [PASS]
37/134 SpatialBatchNormalization ....................................... [WAIT]
THCudaCheck FAIL file=/tmp/luarocks_cutorch-scm-1-6519/cutorch/init.c line=218 error=77 : an illegal memory access was encountered
37/134 SpatialBatchNormalization ....................................... [ERROR]
cuda runtime error (77) : an illegal memory access was encountered at /tmp/luarocks_cutorch-scm-1-6519/cutorch/lib/THC/generic/THCStorage.c:158
The GPU in that computer is a Tesla M2075. I wonder if someone managed to solve this.
I'm the sysadmin of a cluster and I'm trying to make cunn work with our GPUs.
I'm installing/compiling from a node without a GPU (but with the CUDA 7.5 toolkit).
It's compiling fine.
I'm trying to execute the tests:
luajit -l cunn -e 'cunn.test()'
On the node where the tests run, there are two M2090s (compute capability 2.0).
Driver version 352.79