CUDA8.0 + cuDnn6.0 + GPU Volta architecture + torch7 build error

ytzhao commented 6 years ago

Hi all, I tried to install torch7 on a Tesla V100 but ran into some problems. For various reasons I have to use CUDA 8.0 (although for the Volta architecture it would be better to use CUDA 9.0).

Building on 32 cores
-- Found Torch7 in /home/torch/install
-- Found CUDA: /usr/local/cuda-8.0 (found suitable version "8.0", minimum required is "6.5") 
-- Removing -DNDEBUG from compile flags
-- TH_LIBRARIES: TH
-- MAGMA not found. Compiling without MAGMA support
-- Autodetected CUDA architecture(s): 7.0 7.0 7.0 7.0
-- got cuda version 8.0
-- Found CUDA with FP16 support, compiling with torch.CudaHalfTensor
-- CUDA_NVCC_FLAGS: -gencode;arch=compute_70,code=sm_70;-DCUDA_HAS_FP16=1
-- THC_SO_VERSION: 0
-- Performing Test HAS_LUAL_SETFUNCS
-- Performing Test HAS_LUAL_SETFUNCS - Failed
-- Configuring done
-- Generating done
-- Build files have been written to: /home/torch/extra/cutorch
make: *** No rule to make target 'install'.  Stop.

I think the problem is a mismatch between the GPU architecture and the CUDA version (Volta officially needs CUDA 9.0, although it also works with CUDA 8.0). How can I disable the auto-detection and manually set the architecture to another runnable version, such as "compute_60"? Thank you.
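
One way to check the suspected mismatch from a log like the one above is sketched here in Python: it reads the two relevant CMake lines and compares the autodetected architectures with the newest one the toolkit's nvcc can target. The helper names (`parse_log`, `unsupported_archs`) and the lookup table are illustrative and not part of cutorch. The table covers only the two toolkits mentioned in this thread: CUDA 8.0 targets up to compute capability 6.2, and CUDA 9.0 adds 7.0.

```python
import re

# Highest compute capability each toolkit's nvcc can target.
# Only the two toolkits discussed in this thread are listed (illustrative table).
MAX_ARCH_BY_CUDA = {"8.0": (6, 2), "9.0": (7, 0)}


def parse_log(log):
    """Return (cuda_version, sorted unique detected archs) from CMake output."""
    version = re.search(r"got cuda version (\d+\.\d+)", log)
    archs = re.search(r"Autodetected CUDA architecture\(s\): ([\d. ]+)", log)
    if not version or not archs:
        raise ValueError("log lacks the version or architecture line")
    detected = sorted(
        {tuple(int(part) for part in arch.split(".")) for arch in archs.group(1).split()}
    )
    return version.group(1), detected


def unsupported_archs(log):
    """List detected architectures that are newer than the toolkit supports."""
    version, detected = parse_log(log)
    limit = MAX_ARCH_BY_CUDA.get(version)
    if limit is None:
        raise ValueError("no entry for CUDA " + version + " in the table")
    return [f"{major}.{minor}" for major, minor in detected if (major, minor) > limit]


# The two lines from the build output that matter for this check.
LOG = """\
-- Autodetected CUDA architecture(s): 7.0 7.0 7.0 7.0
-- got cuda version 8.0
"""

if __name__ == "__main__":
    print(unsupported_archs(LOG))
```

For this log the check reports 7.0, meaning nvcc from CUDA 8.0 cannot generate sm_70 code. As for overriding the detection, cutorch's CMake files appear to read a TORCH_CUDA_ARCH_LIST environment variable (an assumption worth verifying in your checkout), so exporting a value such as "6.0+PTX" before rebuilding may be worth trying. The PTX would then have to be JIT-compiled for the V100 at first load, which can be slow.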

stancil1 commented 5 years ago

I am having this exact issue. Have you found a fix for it yet?

ytzhao commented 5 years ago

@stancil1 Hi, I changed the CUDA version to 9.0, and somehow it works.

ajhool commented 5 years ago

I'm finding that the torch.cudnn package is not properly initializing the Volta GPUs (it takes about 10 minutes to configure the GPUs -- it should be nearly instantaneous). Were you able to configure GPUs properly using cutorch?

This is the code that cudnn is having trouble with:

https://github.com/soumith/cudnn.torch/blob/R7/init.lua

torch / cutorch #820