Open ZuoJiaxing opened 4 years ago
Hello,
There is a conflict between your cuDNN and PyTorch versions. To fix this, try changing the PyTorch version. You can find the build that is compatible with your CUDA and cuDNN versions at the following link: https://pytorch.org/get-started/previous-versions/
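Before picking a different build, it can help to print what the container actually sees. The following is a minimal sketch, not something from this repository: the helper name `collect_versions` is made up for illustration, and it only uses standard PyTorch attributes. It returns empty values if PyTorch is not installed.

```python
def collect_versions():
    """Return the PyTorch, CUDA and cuDNN versions visible to Python.

    Hypothetical helper for illustration; every value stays None when
    PyTorch (or a CUDA device) is not available.
    """
    info = {"torch": None, "cuda": None, "cudnn": None,
            "gpu": None, "capability": None}
    try:
        import torch
    except ImportError:
        return info  # PyTorch is not installed in this environment

    info["torch"] = torch.__version__
    # CUDA version PyTorch was built against (None for CPU-only builds).
    info["cuda"] = torch.version.cuda
    # cuDNN version as an integer, or None if cuDNN is unavailable.
    info["cudnn"] = torch.backends.cudnn.version()
    if torch.cuda.is_available():
        info["gpu"] = torch.cuda.get_device_name(0)
        # (major, minor) compute capability of the first GPU.
        info["capability"] = torch.cuda.get_device_capability(0)
    return info


if __name__ == "__main__":
    for key, value in collect_versions().items():
        print("{}: {}".format(key, value))
```

Running this inside the container and comparing the output against the table on the page linked above should show whether the installed PyTorch build matches the CUDA and cuDNN versions in the image.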
Hi, note that the error might also be caused by your GPU being incompatible with CUDA 9.2. I had the same error, and in my case (RTX A6000 GPU), replacing the following lines in the Dockerfile:
FROM nvidia/cudagl:9.2-devel-ubuntu18.04
# Install cudnn
ENV CUDNN_VERSION 7.6.4.38
LABEL com.nvidia.cudnn.version="${CUDNN_VERSION}"
RUN apt-get update && apt-get install -y --no-install-recommends \
    libcudnn7=$CUDNN_VERSION-1+cuda9.2 \
    libcudnn7-dev=$CUDNN_VERSION-1+cuda9.2 \
    && \
    apt-mark hold libcudnn7 && \
    rm -rf /var/lib/apt/lists/*
with:
FROM nvidia/cudagl:11.1-devel-ubuntu18.04
# Install cudnn
ENV CUDNN_VERSION 7.6.4.38
LABEL com.nvidia.cudnn.version="${CUDNN_VERSION}"
RUN rm /etc/apt/sources.list.d/cuda.list
RUN rm /etc/apt/sources.list.d/nvidia-ml.list
did the trick. Your GPU seems to require CUDA 10.0 or later (you can check the CUDA compatibility here).
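To make the compatibility argument concrete, here is a small sketch that checks whether a CUDA toolkit version can target a given GPU compute capability. The table is an assumption written from memory of NVIDIA's release notes, not taken from this repository, and the names `MIN_CUDA` and `cuda_supports` are made up for illustration; please verify the values against NVIDIA's documentation for your card.

```python
# Minimum CUDA toolkit release able to target each compute capability.
# Assumed values; double-check against NVIDIA's documentation.
MIN_CUDA = {
    (6, 0): (8, 0),   # Pascal
    (6, 1): (8, 0),   # Pascal
    (7, 0): (9, 0),   # Volta
    (7, 5): (10, 0),  # Turing (RTX 20 series)
    (8, 0): (11, 0),  # Ampere (A100)
    (8, 6): (11, 1),  # Ampere (e.g. RTX A6000)
}


def parse_version(text):
    """Turn a string such as '9.2' into a comparable tuple (9, 2)."""
    return tuple(int(part) for part in text.split(".")[:2])


def cuda_supports(cuda_version, capability):
    """Return True if the CUDA release can target the compute capability."""
    required = MIN_CUDA.get(tuple(capability))
    if required is None:
        raise KeyError("unknown compute capability: {}".format(capability))
    return parse_version(cuda_version) >= required


if __name__ == "__main__":
    print(cuda_supports("9.2", (7, 5)))   # Turing card on the CUDA 9.2 image
    print(cuda_supports("11.1", (8, 6)))  # RTX A6000 on the CUDA 11.1 image
```

The compute capability of the installed card can be read with `torch.cuda.get_device_capability(0)` and passed in as the second argument.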
I followed the instructions you posted to install Docker on Ubuntu 18.04 with an RTX 2020Ti. However, when I run "python3 tasks/R2R/train.py", I get the following error. I have tried many Docker images that meet your requirements (CUDA 9.2 + cuDNN 7 + PyTorch 1.1), but I cannot get past this error. Please help me with this. Thanks!
/usr/local/lib/python3.6/dist-packages/torch/nn/modules/rnn.py:54: UserWarning: dropout option adds dropout after all but last recurrent layer, so non-zero dropout expects num_layers greater than 1, but got dropout=0.5 and num_layers=1
  "num_layers={}".format(dropout, num_layers))
Traceback (most recent call last):
  File "tasks/R2R/train.py", line 163, in <module>
    train_val()
  File "tasks/R2R/train.py", line 156, in train_val
    dropout_ratio, bidirectional=bidirectional).cuda()
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 265, in cuda
    return self._apply(lambda t: t.cuda(device))
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 193, in _apply
    module._apply(fn)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/rnn.py", line 127, in _apply
    self.flatten_parameters()
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/rnn.py", line 123, in flatten_parameters
    self.batch_first, bool(self.bidirectional))
RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED
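The traceback shows the failure happening while an RNN module is moved to the GPU (`.cuda()` calls `flatten_parameters()`), so it may be possible to reproduce it without the training script. The following is a rough sketch under that assumption; `lstm_smoke_test` is a hypothetical helper, not part of this repository, and the layer sizes are arbitrary.

```python
def lstm_smoke_test():
    """Move a tiny LSTM to the GPU and run one forward pass.

    Returns a short status string starting with 'ok', 'skipped' or 'failed'.
    """
    try:
        import torch
        import torch.nn as nn
    except ImportError:
        return "skipped: PyTorch is not installed"
    if not torch.cuda.is_available():
        return "skipped: no CUDA device visible"

    try:
        # .cuda() on an RNN module triggers flatten_parameters(), the call
        # that raised CUDNN_STATUS_EXECUTION_FAILED in the traceback above.
        lstm = nn.LSTM(input_size=8, hidden_size=16,
                       num_layers=1, batch_first=True).cuda()
        x = torch.randn(2, 5, 8, device="cuda")
        output, _ = lstm(x)
        torch.cuda.synchronize()  # surface asynchronous CUDA errors here
    except RuntimeError as err:
        return "failed: {}".format(err)
    return "ok: output shape {}".format(tuple(output.shape))


if __name__ == "__main__":
    print(lstm_smoke_test())
```

If this small test fails with the same cuDNN error inside the container, the problem is most likely in the image (CUDA, cuDNN, or PyTorch build versus the GPU) rather than in `tasks/R2R/train.py`.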