nileshkulkarni / csm

Code release for "Canonical Surface Mapping via Geometric Cycle Consistency"
https://nileshkulkarni.github.io/csm/

RuntimeError: CuDNN error: CUDNN_STATUS_MAPPING_ERROR #4

Closed jayis520 closed 5 years ago

jayis520 commented 5 years ago

Hi, I was trying to run the model training command python -m csm.experiments.csm.csp --name=csm_bird_net --n_data_workers=4 --dataset=cub --display_port=8094 --scale_bias=0.75 --warmup_pose_iter=2000, and it reports:

 from ._conv import register_converters as _register_converters
loading /media/lab601/000555B8000E726C/wuy/csm/csm/data/../cachedir/cub/data/train_cub_cleaned.mat
5964 images
Loading Mean shape from /media/lab601/000555B8000E726C/wuy/csm/csm/data/../cachedir/cub/../shapenet/bird/shape.mat
Visdom Env Name csm_bird_net_wpose
create web directory /media/lab601/000555B8000E726C/wuy/csm/csm/experiments/csm/../../cachedir/web...
Traceback (most recent call last):
  File "/home/lab601/anaconda2/envs/csm/lib/python2.7/runpy.py", line 174, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/home/lab601/anaconda2/envs/csm/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/media/lab601/000555B8000E726C/wuy/csm/csm/experiments/csm/csp.py", line 439, in <module>
    app.run(main)
  File "/home/lab601/.local/lib/python2.7/site-packages/absl/app.py", line 300, in run
    _run_main(main, args)
  File "/home/lab601/.local/lib/python2.7/site-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "/media/lab601/000555B8000E726C/wuy/csm/csm/experiments/csm/csp.py", line 434, in main
    trainer.train()
  File "csm/nnutils/train_utils.py", line 181, in train
    self.forward()
  File "/media/lab601/000555B8000E726C/wuy/csm/csm/experiments/csm/csp.py", line 219, in forward
    codes_pred = self.model.forward(feed_dict)
  File "csm/nnutils/icn_net.py", line 330, in forward
    unet_output = self.unet_gen.forward(img)
  File "csm/nnutils/unet.py", line 61, in forward
    return self.model(input)
  File "/home/lab601/anaconda2/envs/csm/lib/python2.7/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "csm/nnutils/unet.py", line 111, in forward
    self.x_enc = self.down(x_inp)
  File "/home/lab601/anaconda2/envs/csm/lib/python2.7/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/lab601/anaconda2/envs/csm/lib/python2.7/site-packages/torch/nn/modules/container.py", line 91, in forward
    input = module(input)
  File "/home/lab601/anaconda2/envs/csm/lib/python2.7/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/lab601/anaconda2/envs/csm/lib/python2.7/site-packages/torch/nn/modules/conv.py", line 301, in forward
    self.padding, self.dilation, self.groups)
RuntimeError: CuDNN error: CUDNN_STATUS_MAPPING_ERROR

What is the solution? Is it a CUDA version problem?

nileshkulkarni commented 5 years ago

This typically happens when NMR (neural_mesh_renderer) is not set up properly, which means that chainer and cupy are not correctly installed. Did you try running the examples inside neural_mesh_renderer?
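A quick way to narrow this down before running the neural_mesh_renderer examples is to check that chainer and cupy import at all and that cupy can do a trivial computation on the GPU. The following is only a minimal sketch: the helper names check_module and check_gpu_roundtrip are made up for illustration and are not part of the csm or neural_mesh_renderer code.

```python
import importlib


def check_module(name):
    """Try to import `name`; return (ok, version string or error message).

    Hypothetical helper for illustration, not part of csm.
    """
    try:
        module = importlib.import_module(name)
    except Exception as exc:  # ImportError, or a CUDA shared-library load failure
        return False, "{}: {}".format(type(exc).__name__, exc)
    return True, getattr(module, "__version__", "unknown")


def check_gpu_roundtrip():
    """Sum 0..9 on the GPU with cupy and compare against the known answer."""
    ok, detail = check_module("cupy")
    if not ok:
        return False, detail
    import cupy
    try:
        total = float(cupy.arange(10).sum())
    except Exception as exc:  # e.g. no device, or a driver/toolkit mismatch
        return False, "{}: {}".format(type(exc).__name__, exc)
    return total == 45.0, "sum(0..9) on GPU = {}".format(total)


if __name__ == "__main__":
    for name in ("chainer", "cupy"):
        print("{} -> {}".format(name, check_module(name)))
    print("gpu -> {}".format(check_gpu_roundtrip()))
```

If either import fails, or the GPU sum raises, the problem is in the chainer/cupy installation rather than in the csm training code.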

jayis520 commented 5 years ago

Thank you for your reply. chainer and cupy have had problems ever since they were installed. I installed them following the instructions, but they fail when I run the examples inside neural_mesh_renderer. By the way, the cupy version should be at least 6.3.0, not 2.3.0.

nileshkulkarni commented 5 years ago

So the instructions that I have in the repo are for CUDA 8.0.

I recently tried it on CUDA 10.1 with the following configuration. nvcc:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243

cupy and chainer

cupy-cuda101==6.3.0
chainer==6.3.0

This is my CUDA configuration for conda:

cudatoolkit               10.0.130                      0  
cupy-cuda101              6.3.0                    pypi_0    pypi
pytorch                   1.0.1           py2.7_cuda10.0.130_cudnn7.4.2_2    pytorch

Try installing cupy and chainer with pip first; if that doesn't work, installing them from source might help.
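To pick the cupy wheel that corresponds to the local compiler, the release number can be read out of the nvcc --version output shown above. This is a rough sketch with hypothetical helper names; it assumes the cupy-cudaXY wheel naming used by the 6.x releases mentioned in this thread (for example cupy-cuda101 for CUDA 10.1).

```python
import re


def parse_nvcc_release(nvcc_output):
    """Return (major, minor) from `nvcc --version` text, or None if absent."""
    match = re.search(r"release\s+(\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def cupy_wheel_name(release):
    """Build the wheel name, assuming the cupy-cudaXY naming scheme."""
    major, minor = release
    return "cupy-cuda{}{}".format(major, minor)


if __name__ == "__main__":
    # The line below is copied from the nvcc output in this thread.
    sample = "Cuda compilation tools, release 10.1, V10.1.243"
    release = parse_nvcc_release(sample)
    print("pip install {}==6.3.0".format(cupy_wheel_name(release)))
```

With the sample line from this thread, the script prints the same cupy-cuda101==6.3.0 pin listed above.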

jayis520 commented 5 years ago

OK, thank you so much. I will try it.