open-mmlab / mmdetection

OpenMMLab Detection Toolbox and Benchmark
https://mmdetection.readthedocs.io
Apache License 2.0

RuntimeError: cuDNN error: CUDNN_STATUS_BAD_PARAM #328

Closed ghost closed 5 years ago

ghost commented 5 years ago

SSD training is OK, but when I run inference with the model I get this error. RetinaNet, however, works fine.

Traceback (most recent call last):
  File "/home/gaozhihua/program/mmdetection/test_img.py", line 15, in <module>
    result = inference_detector(model, img, cfg)
  File "/home/gaozhihua/program/mmdetection/mmdet/apis/inference.py", line 46, in inference_detector
    return _inference_single(model, imgs, img_transform, cfg, device)
  File "/home/gaozhihua/program/mmdetection/mmdet/apis/inference.py", line 30, in _inference_single
    result = model(return_loss=False, rescale=True, **data)
  File "/home/gaozhihua/anaconda2/envs/open-mmlab/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/gaozhihua/program/mmdetection/mmdet/models/detectors/base.py", line 82, in forward
    return self.forward_test(img, img_meta, **kwargs)
  File "/home/gaozhihua/program/mmdetection/mmdet/models/detectors/base.py", line 74, in forward_test
    return self.simple_test(imgs[0], img_metas[0], **kwargs)
  File "/home/gaozhihua/program/mmdetection/mmdet/models/detectors/single_stage.py", line 53, in simple_test
    x = self.extract_feat(img)
  File "/home/gaozhihua/program/mmdetection/mmdet/models/detectors/single_stage.py", line 40, in extract_feat
    x = self.backbone(img)
  File "/home/gaozhihua/anaconda2/envs/open-mmlab/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/gaozhihua/program/mmdetection/mmdet/models/backbones/ssd_vgg.py", line 83, in forward
    x = F.relu(layer(x), inplace=True)
  File "/home/gaozhihua/anaconda2/envs/open-mmlab/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/gaozhihua/anaconda2/envs/open-mmlab/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 320, in forward
    self.padding, self.dilation, self.groups)
RuntimeError: cuDNN error: CUDNN_STATUS_BAD_PARAM

yhcao6 commented 5 years ago

What is your test command? Are you testing your own trained model? Could you print the shape of x at this line?

  File "/home/gaozhihua/program/mmdetection/mmdet/models/backbones/ssd_vgg.py", line 83, in forward
    x = F.relu(layer(x), inplace=True)

ghost commented 5 years ago

cfg = mmcv.Config.fromfile('configs/pascal_voc/ssd300_voc.py')
cfg.model.pretrained = None

# construct the model and load checkpoint
model = build_detector(cfg.model, test_cfg=cfg.test_cfg)
_ = load_checkpoint(model, 'models/ssd300_voc_vgg16_caffe_240e_20181221-2f05dd40.pth')

# test a single image
img = mmcv.imread('/home/gaozhihua/program/mmdetection/data/face_detection/wider_face/0_Parade_marchingband_1_5.jpg')
result = inference_detector(model, img, cfg)
show_result(img, result)

yhcao6 commented 5 years ago

Could you try rescaling the img to (1, 3, 300, 300)?

ghost commented 5 years ago

The test cfg sets

resize_keep_ratio=True

But SSD is a fully convolutional detection network... Does that matter?

ghost commented 5 years ago

OK, I'll give it a try...

yhcao6 commented 5 years ago

But the shape is strange; the height and width should be the same.

yhcao6 commented 5 years ago

resize_keep_ratio indicates whether to keep the original aspect ratio when resizing, but for SSD it should be False.
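
To make the effect of this flag concrete, here is a minimal sketch of the two resize modes. It is not mmdetection's actual code: `resized_size` is a hypothetical helper, the 1024x683 input is an arbitrary example size, and the keep-ratio branch assumes the image is scaled by a single factor so that it fits inside the target box.

```python
def resized_size(width, height, target=(300, 300), keep_ratio=True):
    """Return the (width, height) an image would have after resizing.

    Hypothetical helper for illustration only; not part of mmdetection.
    """
    target_w, target_h = target
    if not keep_ratio:
        # Aspect ratio is ignored: the output is exactly the target size.
        return target_w, target_h
    # Assumption: one scale factor is applied so the image fits in the target box.
    scale = min(target_w / width, target_h / height)
    return int(width * scale + 0.5), int(height * scale + 0.5)


# An arbitrary 1024x683 image, resized toward a 300x300 target.
print(resized_size(1024, 683, keep_ratio=True))   # (300, 200)
print(resized_size(1024, 683, keep_ratio=False))  # (300, 300)
```

With the ratio kept, the network sees a non-square input, so every feature map downstream is also smaller along one side than SSD300 expects.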

ghost commented 5 years ago

Yeah, it works now when I do not keep the original ratio... But I am confused: SSD is a fully convolutional network, so even if I keep the ratio it shouldn't go wrong...

yhcao6 commented 5 years ago

Here is my guess: in SSD, the last feature map may be smaller than the kernel size if you set resize_keep_ratio=True, and in that case this error is reported. Here is an example:

import torch
import torch.nn as nn

# input is 3 high but only 2 wide, smaller than the 3x3 kernel
x = torch.rand(1, 3, 3, 2).cuda()
conv = nn.Conv2d(3, 3, 3).cuda()
conv(x)
# RuntimeError: cuDNN error: CUDNN_STATUS_BAD_PARAM
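
The guess above can be checked without a GPU by applying the standard convolution output-size formula to each spatial dimension. This is only a sketch: `conv_out_size` and `kernel_fits` are hypothetical helper names, not PyTorch or mmdetection functions.

```python
def conv_out_size(n, kernel, stride=1, padding=0, dilation=1):
    """Output length along one dimension for a standard convolution."""
    return (n + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


def kernel_fits(shape, kernel, stride=1, padding=0, dilation=1):
    """True if every spatial dimension yields at least one output element."""
    return all(
        conv_out_size(n, kernel, stride, padding, dilation) >= 1 for n in shape
    )


# The 3x2 feature map from the example: the width (2) is smaller than the kernel (3).
print(kernel_fits((3, 2), kernel=3))  # False
# A 3x3 feature map is just large enough for a 3x3 kernel.
print(kernel_fits((3, 3), kernel=3))  # True
```

Running a check like this on the feature-map shape before the failing layer shows whether the input has shrunk below the kernel size.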

ZhihuaGao commented 5 years ago

I think that is the problem... And I have a PR to fix the inference bug...

HusainKapadia commented 5 years ago

I get the same error while training a ConvTranspose2d. I can't seem to understand the issue. In my case the kernel size is also much smaller than the input size, and no padding is used.

BogdanRuzh commented 5 years ago

Had this error because of the wrong dtype