open-mmlab / mmdetection

OpenMMLab Detection Toolbox and Benchmark
https://mmdetection.readthedocs.io
Apache License 2.0

CUDA driver version is insufficient for CUDA runtime version? #116

wangg12 closed 5 years ago

wangg12 commented 5 years ago

When I tried to train Mask R-CNN on COCO, an unexpected error occurred. I can train other PyTorch networks without problems; only mmdetection fails. My driver version is 390.87 and my CUDA version is V9.0.176.

2018-11-23 11:50:19,170 @train.py:61 - INFO - Distributed training: False
2018-11-23 11:50:20,460 @base.py:55 - INFO - load model from: modelzoo://resnet50
2018-11-23 11:50:20,856 @checkpoint.py:58 - WARNING - unexpected key in source state_dict: fc.weight, fc.bias

missing keys in source state_dict: layer2.3.bn1.num_batches_tracked, layer2.0.bn2.num_batches_tracked, layer1.0.downsample.1.num_batches_tracked, layer2.3.bn2.num_batches_tracked, layer4.2.bn3.num_batches_tracked, layer2.1.bn2.num_batches_tracked, layer3.0.bn2.num_batches_tracked, layer1.0.bn3.num_batches_tracked, layer3.0.bn1.num_batches_tracked, layer3.5.bn1.num_batches_tracked, layer1.2.bn2.num_batches_tracked, layer4.0.downsample.1.num_batches_tracked, layer1.1.bn2.num_batches_tracked, layer3.4.bn1.num_batches_tracked, layer2.1.bn1.num_batches_tracked, layer3.5.bn2.num_batches_tracked, layer3.1.bn3.num_batches_tracked, layer3.0.downsample.1.num_batches_tracked, layer1.1.bn3.num_batches_tracked, layer2.2.bn3.num_batches_tracked, layer4.0.bn2.num_batches_tracked, layer3.0.bn3.num_batches_tracked, layer3.4.bn3.num_batches_tracked, layer2.1.bn3.num_batches_tracked, layer3.5.bn3.num_batches_tracked, bn1.num_batches_tracked, layer3.2.bn2.num_batches_tracked, layer4.0.bn3.num_batches_tracked, layer1.2.bn1.num_batches_tracked, layer4.1.bn1.num_batches_tracked, layer2.2.bn2.num_batches_tracked, layer4.2.bn1.num_batches_tracked, layer2.0.downsample.1.num_batches_tracked, layer3.2.bn3.num_batches_tracked, layer2.0.bn1.num_batches_tracked, layer1.1.bn1.num_batches_tracked, layer3.3.bn1.num_batches_tracked, layer1.2.bn3.num_batches_tracked, layer4.0.bn1.num_batches_tracked, layer4.1.bn2.num_batches_tracked, layer3.4.bn2.num_batches_tracked, layer3.1.bn1.num_batches_tracked, layer1.0.bn1.num_batches_tracked, layer3.3.bn3.num_batches_tracked, layer2.3.bn3.num_batches_tracked, layer4.2.bn2.num_batches_tracked, layer3.3.bn2.num_batches_tracked, layer3.2.bn1.num_batches_tracked, layer2.2.bn1.num_batches_tracked, layer4.1.bn3.num_batches_tracked, layer3.1.bn2.num_batches_tracked, layer2.0.bn3.num_batches_tracked, layer1.0.bn2.num_batches_tracked

loading annotations into memory...
Done (t=452.10s)
creating index...
index created!
2018-11-23 11:58:04,558 @runner.py:327 - INFO - Start running, host: gu@gu-PC, work_dir: /data/wanggu/mmdetection/output/mask_rcnn_r50_fpn_1x
2018-11-23 11:58:04,559 @runner.py:328 - INFO - workflow: [('train', 1)], max: 12 epochs
cudaCheckError() failed : CUDA driver version is insufficient for CUDA runtime version
hellock commented 5 years ago

I guess you are actually using CUDA 9.2, which requires a driver version of at least 396.xx. conda sometimes installs CUDA 9.2 along with PyTorch.
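The driver requirement mentioned here can be checked with a small comparison. The sketch below is only illustrative: the `MIN_LINUX_DRIVER` table and the helper names are assumptions introduced for this example, and the minimum versions are the Linux values as listed in NVIDIA's CUDA release notes at the time, so they should be verified against the current table before relying on them.

```python
# Assumed minimum Linux driver version per CUDA toolkit release.
# These values come from NVIDIA's release notes and should be re-checked;
# the table is NOT part of mmdetection, mmcv, or PyTorch.
MIN_LINUX_DRIVER = {
    "9.0": "384.81",
    "9.1": "390.46",
    "9.2": "396.26",
    "10.0": "410.48",
}


def parse_version(text):
    """Turn a dotted version string such as '390.87' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def driver_supports(driver_version, cuda_version):
    """Return True if the driver meets the assumed minimum for this CUDA release.

    Only the major.minor part of cuda_version is used, so '9.0.176' maps to '9.0'.
    """
    major_minor = ".".join(cuda_version.split(".")[:2])
    required = MIN_LINUX_DRIVER[major_minor]
    return parse_version(driver_version) >= parse_version(required)


if __name__ == "__main__":
    for driver in ("390.87", "410.73"):
        for cuda in ("9.0.176", "9.2", "10.0"):
            status = "ok" if driver_supports(driver, cuda) else "too old"
            print(f"driver {driver} with CUDA {cuda}: {status}")
```

Under this table, driver 390.87 is new enough for a CUDA 9.0 runtime but not for 9.2, which is why a component built against CUDA 9.2 would be one possible source of the error.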

wangg12 commented 5 years ago

I checked torch.version.cuda and it is 9.0.176. My other PyTorch programs also run without any problem. This is strange; I wonder where mmdet or mmcv checks the CUDA driver and CUDA version.

hellock commented 5 years ago

That's strange. mmdet and mmcv do not check the driver; only PyTorch itself checks it.

wangg12 commented 5 years ago

I updated the driver version from 390.87 to 410.73 and the problem is solved.

ximitiejiang commented 5 years ago

@hellock As you said, can conda install a different CUDA version? I hit an import issue when running mmdetection's test.py, so I suspect my CUDA version is incompatible. I installed CUDA 9.2.148.1 (with cuDNN v7.3.1 for CUDA 9.2 and driver 410.66), which I can confirm with nvcc -V. However, nvidia-smi shows CUDA version 10, which I installed a long time ago and have already uninstalled. I cannot find any CUDA 10 in conda or anywhere else. Is there a way to confirm which CUDA version I am actually using?
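One way to approach this question is to collect the version reported by each layer and compare them. The sketch below is a minimal example, assuming `nvcc` and `nvidia-smi` are on the `PATH` when installed; the helper names `run` and `collect_cuda_versions` are made up for this example. It reads `torch.version.cuda` (the CUDA version PyTorch was built with), the toolkit release printed by `nvcc --version`, and the driver version from `nvidia-smi`. Note that the "CUDA Version" shown in the nvidia-smi header is the highest CUDA version the installed driver supports, not an installed toolkit, so seeing 10 there with driver 410.66 does not mean CUDA 10 is still installed.

```python
import re
import shutil
import subprocess


def run(cmd):
    """Return the stdout of cmd, or None if the tool is missing or fails."""
    if shutil.which(cmd[0]) is None:
        return None
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.SubprocessError):
        return None
    return result.stdout if result.returncode == 0 else None


def collect_cuda_versions():
    """Gather CUDA-related versions; entries stay None when unavailable."""
    versions = {"torch_cuda": None, "nvcc": None, "driver": None}

    # CUDA version PyTorch was compiled against (None for CPU-only builds).
    try:
        import torch
        versions["torch_cuda"] = torch.version.cuda
    except Exception:
        pass

    # Toolkit version of whichever nvcc is first on PATH.
    out = run(["nvcc", "--version"])
    if out:
        match = re.search(r"release (\d+\.\d+)", out)
        if match:
            versions["nvcc"] = match.group(1)

    # Installed NVIDIA driver version (first GPU only).
    out = run(["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"])
    if out and out.strip():
        versions["driver"] = out.strip().splitlines()[0].strip()

    return versions


if __name__ == "__main__":
    for name, value in collect_cuda_versions().items():
        print(f"{name}: {value}")
```

If `torch_cuda` and `nvcc` disagree, extensions compiled locally with `nvcc` may target a different CUDA runtime than the one PyTorch ships with, which is worth ruling out before suspecting the driver.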