swoook / dsfd

Cloned from Tencent/FaceDetection-DSFD (https://github.com/Tencent/FaceDetection-DSFD)

RuntimeError: CUDNN_STATUS_EXECUTION_FAILED #7

Closed: swoook closed this issue 3 years ago

swoook commented 3 years ago

Issue description

RuntimeError: CUDNN_STATUS_EXECUTION_FAILED

Code example

  1. Remove torch and torchvision from the nvcr.io/nvidia/pytorch:19.11-py3 container

    root@501243bba88b:/swook# pip uninstall torch torchvision
  2. Install torch==0.3.1 compiled with CUDA_VERSION = 9000

    root@501243bba88b:/swook# conda install https://anaconda.org/pytorch/pytorch/0.3.1/download/linux-64/pytorch-0.3.1-py36_cuda9.1.85_cudnn7.0.5_2.tar.bz2
    • Builds of torch==0.3.1 for other CUDA versions are available from the same anaconda.org pytorch channel
  3. Install torchvision==0.2.1

    root@501243bba88b:/swook# pip install torchvision==0.2.1
  4. List the running containers to find the container ID

    (py36torch14) swook@durian:/data/swook/download$ docker ps
  5. Commit the modified nvcr.io/nvidia/pytorch:19.11-py3 container as a new image

    (py36torch14) swook@durian:/data/swook/download$ docker commit fc5d3760e589 swook/torch031:19.11-py3
  6. Run the demo in a swook/torch031:19.11-py3 container

    root@501243bba88b:/swook/repos/tencent/dsfd# python demo.py --trained_model /swook/model/dsfd/WIDERFace_DSFD_RES152.pth --widerface_root /swook/dataset/wider-face/WIDER_val/ --save_folder ./save --visual_threshold 0.1 --cuda CUDA
Traceback (most recent call last):
  File "demo.py", line 207, in <module>
    test_oneimage()
  File "demo.py", line 172, in test_oneimage
    det0 = infer(net , img , transform , thresh , cuda , shrink)
  File "demo.py", line 72, in infer
    y = net(x)      # forward pass
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 357, in __call__
    result = self.forward(*input, **kwargs)
  File "/swook/repos/tencent/dsfd/face_ssd.py", line 238, in forward
    conv3_3_x = self.layer1(x)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 357, in __call__
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/container.py", line 67, in forward
    input = module(input)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 357, in __call__
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 282, in forward
    self.padding, self.dilation, self.groups)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/functional.py", line 90, in conv2d
    return f(input, weight, bias)
RuntimeError: CUDNN_STATUS_EXECUTION_FAILED
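Step 2 selects the binary by its conda file name, and that name records which Python, CUDA and cuDNN releases the build targets (here cuda9.1.85 and cudnn7.0.5). A minimal sketch for reading those fields back out, assuming the `pytorch-<torch>-py<XY>_cuda<cuda>_cudnn<cudnn>_<build>.tar.bz2` layout inferred from this single example; `parse_build` and `CondaBuild` are hypothetical names, not part of conda or PyTorch:

```python
import re
from typing import NamedTuple


class CondaBuild(NamedTuple):
    """Versions encoded in a pytorch conda package file name."""
    torch: str
    python: str
    cuda: str
    cudnn: str


# Assumed layout, inferred from the one file name used in step 2:
# pytorch-<torch>-py<X><Y>_cuda<cuda>_cudnn<cudnn>_<build>.tar.bz2
_PATTERN = re.compile(
    r"pytorch-(?P<torch>[\d.]+)-py(?P<py>\d)(?P<pyminor>\d+)"
    r"_cuda(?P<cuda>[\d.]+)_cudnn(?P<cudnn>[\d.]+)_\d+\.tar\.bz2$"
)


def parse_build(filename: str) -> CondaBuild:
    """Extract torch, Python, CUDA and cuDNN versions from a file name or URL."""
    m = _PATTERN.search(filename)
    if m is None:
        raise ValueError(f"unrecognised package name: {filename}")
    return CondaBuild(
        torch=m["torch"],
        python=f"{m['py']}.{m['pyminor']}",
        cuda=m["cuda"],
        cudnn=m["cudnn"],
    )
```

For example, `parse_build("pytorch-0.3.1-py36_cuda9.1.85_cudnn7.0.5_2.tar.bz2")` yields `cuda='9.1.85'` and `cudnn='7.0.5'`, which can be compared against the GPU and driver before installing.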

System Info

swoook commented 3 years ago

Related reports about RTX 2080 Ti and CUDA version compatibility:
  1. Ho-Joon Lee : Ubuntu 18.04 + 2080 ti + anaconda + tensorflow-gpu (egloos.com)
  2. Comment in Cublas run time error with RTX 2080Ti with Cuda 9.0 · Issue #16034 · pytorch/pytorch
  3. RTX 2080Ti and CUDA version - Deep Learning (Training & Inference) / Frameworks - NVIDIA Developer Forums
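The linked reports point at a toolkit/GPU mismatch: the RTX 2080 Ti is a Turing GPU (compute capability 7.5), and native Turing support first shipped in CUDA 10.0, while the torch 0.3.1 build above targets CUDA 9.x. A minimal sketch of that check, assuming the GPU is an RTX 2080 Ti (the System Info section does not say so). It decodes the `CUDA_VERSION` integer (major * 1000 + minor * 10, so 9000 means 9.0) and compares it with a small, illustrative table; `MIN_CUDA_FOR_SM`, `decode_cuda_version` and `has_native_support` are hypothetical names. It ignores PTX JIT forward compatibility and cuDNN's own architecture support, so it is only a first-pass hint:

```python
from typing import Dict, Tuple

# Minimum CUDA toolkit (major, minor) able to emit native code for a few
# compute capabilities. Illustrative only, not an exhaustive table.
MIN_CUDA_FOR_SM: Dict[Tuple[int, int], Tuple[int, int]] = {
    (6, 1): (8, 0),   # Pascal, e.g. GTX 1080 Ti
    (7, 0): (9, 0),   # Volta, e.g. Tesla V100
    (7, 5): (10, 0),  # Turing, e.g. RTX 2080 Ti
}


def decode_cuda_version(cuda_version: int) -> Tuple[int, int]:
    """Decode a CUDA_VERSION value (major * 1000 + minor * 10) into (major, minor)."""
    return cuda_version // 1000, (cuda_version % 1000) // 10


def has_native_support(cuda_version: int, capability: Tuple[int, int]) -> bool:
    """Return True if the toolkit is new enough to target `capability` natively."""
    if capability not in MIN_CUDA_FOR_SM:
        raise KeyError(f"compute capability {capability} not in table")
    # Tuples compare element-wise, so (9, 0) < (10, 0) as intended.
    return decode_cuda_version(cuda_version) >= MIN_CUDA_FOR_SM[capability]
```

With the values from this issue, `has_native_support(9000, (7, 5))` is `False`, whereas a CUDA 10.0 build (`10000`) passes, which is consistent with the linked reports recommending CUDA 10 for the RTX 2080 Ti.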