jcjohnson / fast-neural-style

Feedforward style transfer

What version(s) of CUDA and cuDNN are supported/recommended? #142

Open 3DTOPO opened 6 years ago

3DTOPO commented 6 years ago

I am trying to get fast-neural-style running on macOS with an NVIDIA GeForce GTX 1060.

I was able to get everything built using CUDA 9.1 and cuDNN 5.1, but I get the following out-of-memory error when trying to run on the GPU:

THCudaCheck FAIL file=/tmp/luarocks_cutorch-scm-1-5631/cutorch/lib/THC/generic/THCStorage.cu line=66 error=2 : out of memory
THNN.lua:110: cuda runtime error (2) : out of memory at /tmp/luarocks_cutorch-scm-1-5631/cutorch/lib/THC/generic/THCStorage.cu:66

Since the luarocks cudnn package seemed to require cuDNN 5.1, I decided to try installing CUDA 8.0 (since it appears cuDNN 5.1 is built for CUDA 7.5 and 8.0). But when I try to build cutorch against CUDA 8.0, the config script hangs. I am guessing 8.0 doesn't support my card?

Anyhow, can anyone clarify which versions of CUDA and cuDNN I should be using? Is there a way to disable cuDNN (after it has been installed) to see if it works without it?

Thanks!

3DTOPO commented 6 years ago

P.S. The training run I was attempting when it ran out of memory should have taken around 3GB, and the card has 6GB. I also tried setting batch_size to 1 instead of 4 with a 256px training set, which should have used even less than 3GB.

ProGamerGov commented 6 years ago

It seems that different combinations of CUDA, cuDNN, Torch7, and possibly OS versions perform differently in ways one might not expect: https://github.com/jcjohnson/neural-style/issues/429

Newer versions of CUDA/cuDNN/Torch7 seem to use more memory than previous versions.

flaushi commented 6 years ago

FYI: I am training with batch_size 2 and style_image_size 256 on a GTX 1060 with 3GB. I use Ubuntu with nvidia-390, nvidia-cuda-dev v7.5.18, libcudnn7 7.0.5.15, and cuda_9.1.85.

flaushi commented 6 years ago

However, I have to say that this exact configuration has problems with instance normalization! The ECCV models work well, and training without instance normalization also seems to give reasonable results.
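
For anyone wanting to try the two workarounds raised in this thread (running without cuDNN, and training without instance normalization), here is a minimal sketch of what the training command might look like. The flag names (`-use_cudnn`, `-use_instance_norm`, `-backend`, and so on) are as I recall them from the repo's flag documentation, so please verify them against your checkout. The file paths are placeholders I made up, not files mentioned in this thread. The function only prints the command as a dry run, so nothing is launched.

```shell
#!/bin/sh
# Dry run: print the training command instead of launching it.
# H5_FILE and STYLE_IMAGE are hypothetical placeholder paths (assumptions).
H5_FILE="data/dataset.h5"
STYLE_IMAGE="images/style.jpg"

# Flag names are recalled from the repo's documentation, not confirmed in
# this thread; check them before relying on this.
train_cmd() {
  echo th train.lua \
    -h5_file "$H5_FILE" \
    -style_image "$STYLE_IMAGE" \
    -style_image_size 256 \
    -batch_size 1 \
    -use_instance_norm 0 \
    -gpu 0 \
    -backend cuda \
    -use_cudnn 0
}

train_cmd
```

To actually start training, remove the `echo` and point the two variables at real files. Whether turning cuDNN off reduces memory use on a given CUDA/cuDNN combination is something you would have to measure; the thread does not establish it.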