lquirosd / P2PaLA

Page to PAGE Layout Analysis Tool
GNU General Public License v3.0

RTX cards require at least PyTorch 1.0 [CUDNN_STATUS_EXECUTION_FAILED] #11

Closed: ghost closed this issue 5 years ago

ghost commented 5 years ago

On Linux Mint 19.1, using an RTX 2070.

When trying to run recognition with the default installation:

(p3p) home@home-lnx:~/Desktop/programs/P2PaLA$ python P2PaLA.py --config config_ALAR_min_model_17_12_18.txt --prev_model ALAR_min_model_17_12_18.pth --prod_data ./images/
2019-01-21 13:42:19,280 - optparse - INFO - Reading configuration from config_ALAR_min_model_17_12_18.txt
2019-01-21 13:42:19,282 - P2PaLA - INFO - Working on prod inference...
2019-01-21 13:42:19,283 - P2PaLA - INFO - Results will be saved to ./work/results/prod
2019-01-21 13:42:19,599 - P2PaLA - INFO - Resumming from model ALAR_min_model_17_12_18.pth
/home/home/.conda/envs/p3p/lib/python3.6/site-packages/torch/cuda/__init__.py:95: UserWarning: 
    Found GPU0 GeForce RTX 2070 which requires CUDA_VERSION >= 9000 for
     optimal performance and fast startup time, but your PyTorch was compiled
     with CUDA_VERSION 8000. Please install the correct PyTorch binary
     using instructions from http://pytorch.org

  warnings.warn(incorrect_binary_warn % (d, name, 9000, CUDA_VERSION))

So I installed the latest torch and torchvision:

(p3p) home@home-lnx:~/Desktop/programs/P2PaLA$ pip install --ignore-installed torch torchvision

Then ran recognition:

(p3p) home@home-lnx:~/Desktop/programs/P2PaLA$ python P2PaLA.py --config config_ALAR_min_model_17_12_18.txt --prev_model ALAR_min_model_17_12_18.pth --prod_data ./images/
/home/home/.conda/envs/p3p/lib/python3.6/site-packages/torch/nn/_reduction.py:49: UserWarning: size_average and reduce args will be deprecated, please use reduction='mean' instead.
  warnings.warn(warning.format(ret))
2019-01-21 13:58:31,771 - optparse - INFO - Reading configuration from config_ALAR_min_model_17_12_18.txt
2019-01-21 13:58:31,773 - P2PaLA - INFO - Working on prod inference...
2019-01-21 13:58:31,774 - P2PaLA - INFO - Results will be saved to ./work/results/prod
2019-01-21 13:58:32,125 - P2PaLA - INFO - Resumming from model ALAR_min_model_17_12_18.pth
2019-01-21 13:58:34,859 - P2PaLA - INFO - Preprocessing data from ./images/
P2PaLA.py:1195: UserWarning: volatile was removed and now has no effect. Use `with torch.no_grad():` instead.
  pr_x = Variable(sample["image"], volatile=True)
THCudaCheck FAIL file=/pytorch/aten/src/THC/THCGeneral.cpp line=405 error=11 : invalid argument
2019-01-21 13:58:35,463 - P2PaLA - INFO - Production stage done. total time taken: 0.604010820388794
2019-01-21 13:58:35,463 - P2PaLA - INFO - Average time per page: 0.604010820388794
2019-01-21 13:58:35,463 - P2PaLA - INFO - All Done...
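
Side note: the volatile warning above is the standard PyTorch >= 0.4 migration; a minimal sketch of the fix at P2PaLA.py:1195 (assuming the line sits inside the inference loop, with the forward call name invented for illustration) would be:

# PyTorch >= 0.4 merges Variable into Tensor and removes volatile;
# disable autograd for inference with torch.no_grad() instead.
with torch.no_grad():
    pr_x = sample["image"]  # no Variable wrapper needed anymore
    pr_y = model(pr_x)      # hypothetical forward call, for illustration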

Now the problem is when trying to train:

(p3p) home@home-lnx:~/Desktop/programs/P2PaLA$ python P2PaLA.py --config config_BL_only.txt --tr_data ./data/train --te_data ./data/test --log_comment "_foo"
/home/home/.conda/envs/p3p/lib/python3.6/site-packages/torch/nn/_reduction.py:49: UserWarning: size_average and reduce args will be deprecated, please use reduction='mean' instead.
  warnings.warn(warning.format(ret))
2019-01-21 14:06:09,788 - optparse - INFO - Reading configuration from config_BL_only.txt
2019-01-21 14:06:09,789 - optparse - DEBUG - Creating output dir: ./work_BL_only
2019-01-21 14:06:09,790 - optparse - DEBUG - Creating checkpoints dir: ./work_BL_only/checkpoints
2019-01-21 14:06:09,790 - P2PaLA - INFO - Working on training stage...
2019-01-21 14:06:09,791 - P2PaLA - WARNING - tensorboardX is not installed, display logger set to OFF.
2019-01-21 14:06:09,791 - P2PaLA - INFO - Preprocessing data from ./data/train
/home/home/Desktop/programs/P2PaLA/nn_models/models.py:293: UserWarning: nn.init.uniform is now deprecated in favor of nn.init.uniform_.
  init.uniform(m.weight.data, 0.0, 0.02)
/home/home/Desktop/programs/P2PaLA/nn_models/models.py:298: UserWarning: nn.init.uniform is now deprecated in favor of nn.init.uniform_.
  init.uniform(m.weight.data, 1.0, 0.02)
THCudaCheck FAIL file=/pytorch/aten/src/THC/THCGeneral.cpp line=405 error=11 : invalid argument
Traceback (most recent call last):
  File "P2PaLA.py", line 1262, in <module>
    main()
  File "P2PaLA.py", line 606, in main
    epoch_lossD += d_loss.data[0]
IndexError: invalid index of a 0-dim tensor. Use tensor.item() to convert a 0-dim tensor to a Python number
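
As the error message itself suggests, the PyTorch >= 0.4 fix for line 606 is to read the scalar loss with .item():

# PyTorch >= 0.4 returns 0-dim tensors for scalar losses; indexing
# them with [0] now raises IndexError. Use .item() instead.
epoch_lossD += d_loss.item()  # instead of: epoch_lossD += d_loss.data[0]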
lquirosd commented 5 years ago

Hi, there was a major change in PyTorch from v0.3 to v0.4, and I'm migrating the code to support those changes. In the meantime I recommend staying on PyTorch 0.3.1. Your GPU needs CUDA >= 9.0, so please install PyTorch 0.3.1 built with CUDA 9.1 using:

pip uninstall torch torchvision
pip install https://download.pytorch.org/whl/cu91/torch-0.3.1-cp36-cp36m-linux_x86_64.whl

More info about previous PyTorch versions is available on the PyTorch page.

ghost commented 5 years ago
pip uninstall torch torchvision
pip install https://download.pytorch.org/whl/cu91/torch-0.3.1-cp36-cp36m-linux_x86_64.whl
(p3p) home@home-lnx:~/Desktop/programs/P2PaLA$ python P2PaLA.py --config config_BL_only.txt --tr_data ./data/train --te_data ./data/test --log_comment "_foo"
2019-01-21 15:37:56,527 - optparse - INFO - Reading configuration from config_BL_only.txt
2019-01-21 15:37:56,529 - P2PaLA - INFO - Working on training stage...
2019-01-21 15:37:56,529 - P2PaLA - WARNING - tensorboardX is not installed, display logger set to OFF.
2019-01-21 15:37:56,529 - P2PaLA - INFO - Preprocessing data from ./data/train
Traceback (most recent call last):
  File "P2PaLA.py", line 1262, in <module>
    main()
  File "P2PaLA.py", line 528, in main
    y_gen = nnG(x)
  File "/home/home/.conda/envs/p3p/lib/python3.6/site-packages/torch/nn/modules/module.py", line 357, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/home/Desktop/programs/P2PaLA/nn_models/models.py", line 94, in forward
    return self.model(input_x)
  File "/home/home/.conda/envs/p3p/lib/python3.6/site-packages/torch/nn/modules/module.py", line 357, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/home/Desktop/programs/P2PaLA/nn_models/models.py", line 184, in forward
    return F.log_softmax(self.model(input_x), dim=1)
  File "/home/home/.conda/envs/p3p/lib/python3.6/site-packages/torch/nn/modules/module.py", line 357, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/home/.conda/envs/p3p/lib/python3.6/site-packages/torch/nn/modules/container.py", line 67, in forward
    input = module(input)
  File "/home/home/.conda/envs/p3p/lib/python3.6/site-packages/torch/nn/modules/module.py", line 357, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/home/.conda/envs/p3p/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 282, in forward
    self.padding, self.dilation, self.groups)
  File "/home/home/.conda/envs/p3p/lib/python3.6/site-packages/torch/nn/functional.py", line 90, in conv2d
    return f(input, weight, bias)
RuntimeError: CUDNN_STATUS_EXECUTION_FAILED
lquirosd commented 5 years ago

I don't think the issue is related to your Ubuntu version, but you certainly need to install the right combination of CUDA and PyTorch. If you have CUDA 9.1 and Python 3.6 installed, the command I posted before should work; if you have another combination, like CUDA 9.0 or Python 2.7, you need to find the right PyTorch wheel for it (on the PyTorch site).
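
To check which combination is actually installed, the standard PyTorch version attributes can be printed from the active environment:

python -c "import torch; print(torch.__version__, torch.version.cuda, torch.backends.cudnn.version())"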

I just tested it using Python 3.5 and CUDA 9.1 on a GTX 1080 and a TITAN X, and it works (I don't have an RTX to test with).

ghost commented 5 years ago

Same error, even after installing CUDA 9.1:

(p3p) home@home-lnx:~/Desktop/programs/P2PaLA$ cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module  410.48  Thu Sep  6 06:36:33 CDT 2018
GCC version:  gcc version 7.3.0 (Ubuntu 7.3.0-27ubuntu1~18.04) 
(p3p) home@home-lnx:~/Desktop/programs/P2PaLA$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Nov__3_21:07:56_CDT_2017
Cuda compilation tools, release 9.1, V9.1.85
ghost commented 5 years ago

Hmmm... it seems that RTX cards don't support CUDA 9.1; that's weird.

Will you consider supporting CUDA 10 via PyTorch 1.0?

I'm migrating the code to support those changes.

lquirosd commented 5 years ago

Yes, my goal is to migrate all the code to the latest version of PyTorch, but right now I'm a bit short of time and I don't think I will release a new version in the next couple of weeks. Thanks for spotting the issue with the new GPUs. I will try to migrate the code as soon as possible.

In the meantime, you can run inference with the available pre-trained model on the CPU (just add the option --gpu -1).
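
For example, reusing the inference command from above with the CPU option:

python P2PaLA.py --config config_ALAR_min_model_17_12_18.txt --prev_model ALAR_min_model_17_12_18.pth --prod_data ./images/ --gpu -1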

ghost commented 5 years ago

Hoping that you'll support CUDA 10. Thank you.

home@home-lnx:~/NVIDIA_CUDA-9.1_Samples/1_Utilities/deviceQuery$ ./deviceQuery
./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GeForce RTX 2070"
  CUDA Driver Version / Runtime Version          10.0 / 9.1
  CUDA Capability Major/Minor version number:    7.5
  Total amount of global memory:                 7951 MBytes (8337227776 bytes)
MapSMtoCores for SM 7.5 is undefined.  Default to use 64 Cores/SM
MapSMtoCores for SM 7.5 is undefined.  Default to use 64 Cores/SM
  (36) Multiprocessors, ( 64) CUDA Cores/MP:     2304 CUDA Cores
  GPU Max Clock rate:                            1815 MHz (1.81 GHz)
  Memory Clock rate:                             7001 Mhz
  Memory Bus Width:                              256-bit
  L2 Cache Size:                                 4194304 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  1024
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 3 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Supports Cooperative Kernel Launch:            Yes
  Supports MultiDevice Co-op Kernel Launch:      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 46 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 10.0, CUDA Runtime Version = 9.1, NumDevs = 1
Result = PASS
ghost commented 5 years ago

@lquirosd Can you share which versions you are using?

lquirosd commented 5 years ago

Yes, the software has been tested on several configurations:

cuDNN: 5, 6, 7
CUDA: 8, 9
PyTorch: 0.3.*
Python: 2.7, 3.5, 3.6
OS: Ubuntu 16.04 for training; Ubuntu 16.04 and macOS 10.13 for testing

My current set-up is

>>> sys.version
'3.6.8 |Anaconda, Inc.| (default, Dec 30 2018, 01:22:34) \n[GCC 7.3.0]'
>>> torch.__version__
'0.3.1'
>>> torch.version.cuda
'8.0.61'
>>> torch.backends.cudnn.version()
7005
ghost commented 5 years ago

It seems the problem is that RTX cards have compute capability 7.5 and are only supported from CUDA 10 onwards, which the NVIDIA forums confirmed to me.

@lquirosd Will you consider upgrading to PyTorch 1.0? Note: CUDA 10 supports compute capability 3.0 through 7.5 (Kepler, Maxwell, Pascal, Volta, Turing).

lquirosd commented 5 years ago

Hi, did you change the "batch_size" parameter to fit your card? The default is 8 images per mini-batch, but the RTX 2070 has only 8 GB of memory; I think it will support a mini-batch of at most 4 images or so. Can you please run an experiment using a smaller mini-batch?
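
For example, assuming the parameter is exposed as a --batch_size option (check python P2PaLA.py --help for the exact name), the training run above would become:

python P2PaLA.py --config config_BL_only.txt --tr_data ./data/train --te_data ./data/test --log_comment "_foo" --batch_size 4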

ghost commented 5 years ago

@lquirosd

This is not a memory issue. Initial support for RTX cards (Turing) starts at CUDA 10, and PyTorch 1.0 supports CUDA 10 / 9 / 8. So the only solution is to upgrade the code to PyTorch 1.0.

lquirosd commented 5 years ago

I just released a new branch for PyTorch 1.0:

git clone --single-branch --branch PyTorch-v1.0 https://github.com/lquirosd/P2PaLA.git

Please note this branch is not fully tested, so some bugs may remain. I ran some tests on PyTorch 1.0.0, CUDA 9.0, and cuDNN 7401, but CUDA 10 is untested.
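
Before training on the new branch, a quick sanity check using standard PyTorch calls (nothing P2PaLA-specific) can confirm the environment actually sees the RTX card:

import torch
print(torch.__version__)                    # should be 1.0.x for this branch
print(torch.version.cuda)                   # CUDA version the binary was built with
print(torch.backends.cudnn.version())       # cuDNN version
print(torch.cuda.is_available())            # True if the driver/runtime combo works
print(torch.cuda.get_device_capability(0))  # RTX 2070 should report (7, 5)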

ghost commented 5 years ago

May you find peace in your life. Thank you