yusuketomoto / chainer-fast-neuralstyle

Chainer implementation of "Perceptual Losses for Real-Time Style Transfer and Super-Resolution".

Illegal memory access CUDA #31

jamiis commented 8 years ago

I hit the error below when training with --image_size 512; if I remove --image_size 512 I no longer get it. Running on a Titan X (12 GB of video memory).

python train.py --checkpoint 2000 --style_image styles/illusion.jpg --batchsize 4 --output illusion --gpu 0 --dataset datasets/mscoco/train2014 --image_size 512
num traning images: 82783
20695 iterations, 2 epochs
/home/jamis/.local/lib/python2.7/site-packages/chainer/cuda.py:87: UserWarning: cuDNN is not enabled.
Please reinstall chainer after you install cudnn
(see https://github.com/pfnet/chainer#installation).
  'cuDNN is not enabled.\n'
Traceback (most recent call last):
  File "train.py", line 110, in <module>
    feature_s = vgg(Variable(style_b, volatile=True))
  File "/home/jamis/src/chainer-fast-neuralstyle/net.py", line 97, in __call__
    h = F.max_pooling_2d(y1, 2, stride=2)
  File "/home/jamis/.local/lib/python2.7/site-packages/chainer/functions/pooling/max_pooling_2d.py", line 173, in max_pooling_2d
    return MaxPooling2D(ksize, stride, pad, cover_all, use_cudnn)(x)
  File "/home/jamis/.local/lib/python2.7/site-packages/chainer/function.py", line 130, in __call__
    outputs = self.forward(in_data)
  File "/home/jamis/.local/lib/python2.7/site-packages/chainer/function.py", line 234, in forward
    return self.forward_gpu(inputs)
  File "/home/jamis/.local/lib/python2.7/site-packages/chainer/functions/pooling/max_pooling_2d.py", line 77, in forward_gpu
    y, self.indexes)
  File "cupy/core/elementwise.pxi", line 545, in cupy.core.core.ElementwiseKernel.__call__ (cupy/core/core.cpp:35252)
  File "cupy/util.pyx", line 36, in cupy.util.memoize.decorator.ret (cupy/util.cpp:1264)
  File "cupy/core/elementwise.pxi", line 405, in cupy.core.core._get_elementwise_kernel (cupy/core/core.cpp:33728)
  File "cupy/core/elementwise.pxi", line 12, in cupy.core.core._get_simple_elementwise_kernel (cupy/core/core.cpp:27106)
  File "cupy/core/elementwise.pxi", line 32, in cupy.core.core._get_simple_elementwise_kernel (cupy/core/core.cpp:26928)
  File "cupy/core/carray.pxi", line 87, in cupy.core.core.compile_with_cache (cupy/core/core.cpp:26615)
  File "/home/jamis/.local/lib/python2.7/site-packages/cupy/cuda/compiler.py", line 138, in compile_with_cache
    mod.load(cubin)
  File "cupy/cuda/function.pyx", line 156, in cupy.cuda.function.Module.load (cupy/cuda/function.cpp:3892)
  File "cupy/cuda/function.pyx", line 157, in cupy.cuda.function.Module.load (cupy/cuda/function.cpp:3840)
  File "cupy/cuda/driver.pyx", line 77, in cupy.cuda.driver.moduleLoadData (cupy/cuda/driver.cpp:1466)
  File "cupy/cuda/driver.pyx", line 59, in cupy.cuda.driver.check_status (cupy/cuda/driver.cpp:1202)
cupy.cuda.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered

It says that I don't have cuDNN installed but I do:

ldconfig -p | grep cudnn
        libcudnn.so.5 (libc6,x86-64) => /usr/local/cuda/lib64/libcudnn.so.5
        libcudnn.so.4 (libc6,x86-64) => /usr/local/cuda/lib64/libcudnn.so.4
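(The warning comes from Chainer itself rather than from the loader, so ldconfig finding libcudnn doesn't necessarily mean Chainer was installed with cuDNN support. If I'm reading the Chainer 1.x sources right, chainer.cuda.cudnn_enabled is the flag the pooling code checks, so a quick check would be something like:

python -c "import chainer; print(chainer.cuda.available, chainer.cuda.cudnn_enabled)"

If the second value is False, reinstalling chainer after installing cuDNN, as the warning suggests, should at least clear the warning.)
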
cryptexis commented 8 years ago

You're simply running out of memory. You see, the Titan X has 12 GB of physical memory exposed as 2 logical devices of 6 GB each. If you run nvidia-smi you will see 2 graphics cards, each with 6 GB of total memory. So when you pass --gpu 0 you are using just 6 GB. On the other hand, at --image_size 512 each batch element requires around 2.2 GB of GPU memory, so for batchsize 1 it's 2.2 GB, for 2 it's 4.4 GB, and so on.
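These numbers are rough observations rather than measured constants, but as a quick sanity check (a minimal sketch, assuming ~2.2 GB per element and 6 GB usable per device):

per_element_gb = 2.2   # rough per-element cost at --image_size 512 (assumed, see above)
available_gb = 6.0     # what I believe one logical device gives you (assumed)
max_batch = int(available_gb // per_element_gb)
print(max_batch)       # -> 2, so --batchsize 4 at 512x512 will not fit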

I hope this helps.

jamiis commented 8 years ago

That definitely helps, thanks! But I have two Titan Xs in my workstation, and nvidia-smi shows each as having 12 GB available:

$ nvidia-smi
Tue Aug  9 11:41:32 2016
+------------------------------------------------------+
| NVIDIA-SMI 361.42     Driver Version: 361.42         |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX TIT...  Off  | 0000:6D:00.0     Off |                  N/A |
| 22%   32C    P8    14W / 250W |     24MiB / 12287MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX TIT...  Off  | 0000:6E:00.0     Off |                  N/A |
| 22%   43C    P8    31W / 250W |    911MiB / 12285MiB |     29%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    1      4207    G   /usr/lib/xorg/Xorg                             332MiB |
|    1      5307    G   compiz                                         461MiB |
|    1     10126    G   ...inFlow --disable-features=DocumentWriteEv    87MiB |
+-----------------------------------------------------------------------------+

(Typically the second GPU is also sitting at 0%)

cryptexis commented 8 years ago

Try a smaller batch size, say 2, and see how much memory it takes.
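
To watch the usage live while it trains, something like this should work with your driver (the exact query fields are worth double-checking against nvidia-smi --help-query-gpu):

nvidia-smi --query-gpu=index,memory.used,memory.total --format=csv -l 1

or just run watch -n 1 nvidia-smi in a second terminal.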