machanic / AU_R-CNN

The official implementation of the paper "AU R-CNN: Encoding Expert Prior Knowledge into R-CNN for Action Unit Detection".
https://arxiv.org/abs/1812.05788

cupy.cuda.cudnn.CuDNNError: CUDNN_STATUS_EXECUTION_FAILED #15

821736960 closed this issue 3 years ago

821736960 commented 3 years ago

Will finalize trainer extensions and updater before reraising the exception.
Traceback (most recent call last):
  File "./AU_rcnn/train.py", line 458, in <module>
    main()
  File "./AU_rcnn/train.py", line 449, in main
    trainer.run()
  File "/cluster/home/chenjinjie/.conda/envs/AUROI/lib/python3.6/site-packages/chainer/training/trainer.py", line 349, in run
    six.reraise(*exc_info)
  File "/cluster/home/chenjinjie/.conda/envs/AUROI/lib/python3.6/site-packages/six.py", line 703, in reraise
    raise value
  File "/cluster/home/chenjinjie/.conda/envs/AUROI/lib/python3.6/site-packages/chainer/training/trainer.py", line 316, in run
    update()
  File "/cluster/home/chenjinjie/.conda/envs/AUROI/lib/python3.6/site-packages/chainer/training/updaters/standard_updater.py", line 175, in update
    self.update_core()
  File "/cluster/home/chenjinjie/.conda/envs/AUROI/lib/python3.6/site-packages/chainer/training/updaters/standard_updater.py", line 187, in update_core
    optimizer.update(loss_func, in_arrays)
  File "/cluster/home/chenjinjie/.conda/envs/AUROI/lib/python3.6/site-packages/chainer/optimizer.py", line 864, in update
    loss = lossfun(*args, **kwds)
  File "/cluster/home/chenjinjie/AU_RCNN_fuxian/AU_R-CNN_allproj/AU_rcnn/links/model/faster_rcnn/faster_rcnn_train_chain.py", line 78, in __call__
    features = self.faster_rcnn.extractor(imgs)
  File "/cluster/home/chenjinjie/AU_RCNN_fuxian/AU_R-CNN_allproj/AU_rcnn/links/model/faster_rcnn/faster_rcnn_resnet101.py", line 323, in __call__
    h = self.bn1(self.conv1(x))
  File "/cluster/home/chenjinjie/.conda/envs/AUROI/lib/python3.6/site-packages/chainer/link.py", line 294, in __call__
    out = forward(*args, **kwargs)
  File "/cluster/home/chenjinjie/.conda/envs/AUROI/lib/python3.6/site-packages/chainer/links/connection/convolution_2d.py", line 184, in forward
    groups=self.groups)
  File "/cluster/home/chenjinjie/.conda/envs/AUROI/lib/python3.6/site-packages/chainer/functions/connection/convolution_2d.py", line 589, in convolution_2d
    y, = fnode.apply(args)
  File "/cluster/home/chenjinjie/.conda/envs/AUROI/lib/python3.6/site-packages/chainer/function_node.py", line 321, in apply
    outputs = self.forward(in_data)
  File "/cluster/home/chenjinjie/.conda/envs/AUROI/lib/python3.6/site-packages/chainer/function_node.py", line 512, in forward
    return self.forward_gpu(inputs)
  File "/cluster/home/chenjinjie/.conda/envs/AUROI/lib/python3.6/site-packages/chainer/functions/connection/convolution_2d.py", line 189, in forward_gpu
    return self._forward_cudnn(x, W, b, y)
  File "/cluster/home/chenjinjie/.conda/envs/AUROI/lib/python3.6/site-packages/chainer/functions/connection/convolution_2d.py", line 250, in _forward_cudnn
    auto_tune=auto_tune, tensor_core=tensor_core)
  File "cupy/cudnn.pyx", line 1575, in cupy.cudnn.convolution_forward
  File "cupy/cuda/cudnn.pyx", line 1208, in cupy.cuda.cudnn.convolutionForward
  File "cupy/cuda/cudnn.pyx", line 712, in cupy.cuda.cudnn.check_status
cupy.cuda.cudnn.CuDNNError: CUDNN_STATUS_EXECUTION_FAILED

Hello, I hit the error above when running the training code on a single GPU. My library versions are: chainer=6.3.0, cuda=9.0, cupy-cuda90=6.3.0

Have you run into this problem before? Thanks.

machanic commented 3 years ago

I have never encountered this problem. It is caused by an incompatibility between the CUDA version and CuPy or Chainer. I suggest reinstalling Chainer, CUDA, and CuPy, strictly following the instructions on the official website!

821736960 commented 3 years ago

Could you also share the versions of CUDA, cuDNN, and Python you used? Thanks!
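
For anyone comparing setups in this thread, here is a minimal sketch for collecting the relevant version strings in one place. The helper name `collect_versions` is hypothetical (it is not part of Chainer or CuPy), and it only assumes that each package exposes a `__version__` attribute. A package that is missing or fails to import is reported as `None` rather than raising.

```python
import importlib
import platform


def collect_versions(modules=("chainer", "cupy")):
    """Return a dict of version strings; packages that fail to import map to None.

    Hypothetical helper for illustration, not part of Chainer or CuPy.
    """
    info = {"python": platform.python_version()}
    for name in modules:
        try:
            mod = importlib.import_module(name)
        except Exception:
            # Covers a missing package as well as a failure while loading CUDA libraries.
            info[name] = None
        else:
            info[name] = getattr(mod, "__version__", "unknown")
    return info


if __name__ == "__main__":
    for key, value in collect_versions().items():
        print(f"{key}: {value if value is not None else 'not available'}")
```

Recent Chainer releases also provide `chainer.print_runtime_info()`, which, as far as I know, additionally prints the CUDA and cuDNN versions that CuPy sees.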

machanic commented 3 years ago

I use an OLD version of Chainer: when I wrote the code and the paper, Chainer was at 4.0. I forget the CuPy version, but if you install Chainer, it will automatically select the right version of CuPy. I remember using CUDA 9.0 to match Chainer 4.0. However, my code also works with the latest Chainer, so you can follow the instructions at https://docs.chainer.org/en/stable/install.html to install it. The new versions of Chainer also run faster than the old ones.

machanic commented 3 years ago

@821736960 Also, I will port the AU R-CNN code to PyTorch soon if I have time.

821736960 commented 3 years ago

Thanks, I tried cuda=9.2 with cupy-cuda92, and it works!
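
Both setups mentioned in this thread pair the CUDA toolkit version with a CuPy wheel of the same suffix (cuda=9.0 with cupy-cuda90, cuda=9.2 with cupy-cuda92). A minimal sketch of that naming rule follows. The function `cupy_wheel_for_cuda` is a hypothetical helper, not a CuPy API, and it assumes the `cupy-cudaXY` pattern seen here; newer CuPy releases may name their wheels differently.

```python
def cupy_wheel_for_cuda(cuda_version):
    """Map a CUDA version such as '9.2' to a wheel name such as 'cupy-cuda92'.

    Assumption: the 'cupy-cudaXY' naming pattern used in this thread.
    Hypothetical helper for illustration, not part of CuPy.
    """
    parts = cuda_version.strip().split(".")
    # Only major and minor are used; a patch component such as '148' is ignored.
    if len(parts) < 2 or not all(p.isdigit() for p in parts[:2]):
        raise ValueError(f"expected a version like '9.2', got {cuda_version!r}")
    major, minor = parts[0], parts[1]
    return f"cupy-cuda{major}{minor}"


if __name__ == "__main__":
    for version in ("9.0", "9.2"):
        print(version, "->", cupy_wheel_for_cuda(version))
```

This only checks the name pairing; it does not explain why the 9.0 setup failed with CUDNN_STATUS_EXECUTION_FAILED.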