ultralytics / yolov5

YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite
https://docs.ultralytics.com
GNU Affero General Public License v3.0

RuntimeError: CUDA error: no kernel image is available for execution on the device (nms_cuda at /tmp/pip-req-build-9d9zypi6/torchvision/csrc/cuda/nms_cuda.cu:127) #281

Closed DLLXW closed 4 years ago

DLLXW commented 4 years ago

Has anyone encountered this error? `/home/admins/anaconda3/envs/yolov4/bin/python /home/admins/qyl/yolo/yolov5/train.py Apex recommended for faster mixed precision training: https://github.com/NVIDIA/apex {'lr0': 0.01, 'momentum': 0.937, 'weight_decay': 0.0005, 'giou': 0.05, 'cls': 0.58, 'cls_pw': 1.0, 'obj': 1.0, 'obj_pw': 1.0, 'iou_t': 0.2, 'anchor_t': 4.0, 'fl_gamma': 0.0, 'hsv_h': 0.014, 'hsv_s': 0.68, 'hsv_v': 0.36, 'degrees': 0.0, 'translate': 0.0, 'scale': 0.5, 'shear': 0.0} Namespace(adam=False, batch_size=32, bucket='', cache_images=False, cfg='models/yolov5s.yaml', data='data/trash.yaml', device='0', epochs=300, evolve=False, img_size=[416, 416], multi_scale=False, name='', noautoanchor=False, nosave=False, notest=False, rect=False, resume=False, single_cls=False, weights='') Using CUDA device0 _CudaDeviceProperties(name='GeForce RTX 2070 SUPER', total_memory=7981MB)

Start Tensorboard with "tensorboard --logdir=runs", view at http://localhost:6006/

          from  n    params  module                                  arguments                     

0 -1 1 3520 models.common.Focus [3, 32, 3]
1 -1 1 18560 models.common.Conv [32, 64, 3, 2]
2 -1 1 19904 models.common.BottleneckCSP [64, 64, 1]
3 -1 1 73984 models.common.Conv [64, 128, 3, 2]
4 -1 1 161152 models.common.BottleneckCSP [128, 128, 3]
5 -1 1 295424 models.common.Conv [128, 256, 3, 2]
6 -1 1 641792 models.common.BottleneckCSP [256, 256, 3]
7 -1 1 1180672 models.common.Conv [256, 512, 3, 2]
8 -1 1 656896 models.common.SPP [512, 512, [5, 9, 13]]
9 -1 1 1248768 models.common.BottleneckCSP [512, 512, 1, False]
10 -1 1 131584 models.common.Conv [512, 256, 1, 1]
11 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
12 [-1, 6] 1 0 models.common.Concat [1]
13 -1 1 378624 models.common.BottleneckCSP [512, 256, 1, False]
14 -1 1 33024 models.common.Conv [256, 128, 1, 1]
15 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
16 [-1, 4] 1 0 models.common.Concat [1]
17 -1 1 95104 models.common.BottleneckCSP [256, 128, 1, False]
18 -1 1 18963 torch.nn.modules.conv.Conv2d [128, 147, 1, 1]
19 -2 1 147712 models.common.Conv [128, 128, 3, 2]
20 [-1, 14] 1 0 models.common.Concat [1]
21 -1 1 313088 models.common.BottleneckCSP [256, 256, 1, False]
22 -1 1 37779 torch.nn.modules.conv.Conv2d [256, 147, 1, 1]
23 -2 1 590336 models.common.Conv [256, 256, 3, 2]
24 [-1, 10] 1 0 models.common.Concat [1]
25 -1 1 1248768 models.common.BottleneckCSP [512, 512, 1, False]
26 -1 1 75411 torch.nn.modules.conv.Conv2d [512, 147, 1, 1]
27 [-1, 22, 18] 1 0 models.yolo.Detect [44, [[116, 90, 156, 198, 373, 326], [30, 61, 62, 45, 59, 119], [10, 13, 16, 30, 33, 23]]] Model Summary: 191 layers, 7.37106e+06 parameters, 7.37106e+06 gradients

Optimizer groups: 62 .bias, 70 conv.weight, 59 other Caching labels /home/admins/qyl/yolo/yolov5/trashdata/labels/train.npy (13442 found, 0 missing, 0 empty, 0 duplicate, for 13442 images): 100%|██████████| 13442/13442 [00:00<00:00, 19863.33it/s] Caching labels /home/admins/qyl/yolo/yolov5/trashdata/labels/val.npy (1494 found, 0 missing, 0 empty, 0 duplicate, for 1494 images): 100%|██████████| 1494/1494 [00:00<00:00, 20504.88it/s]

Analyzing anchors... Best Possible Recall (BPR) = 0.9995 Image sizes 416 train, 416 test Using 8 dataloader workers Starting training for 300 epochs...

 Epoch   gpu_mem      GIoU       obj       cls     total   targets  img_size
 0/299    0.455G   0.08566    0.1208    0.1006    0.3071         4       416: 100%|██████████| 421/421 [02:18<00:00,  3.05it/s]
           Class      Images     Targets           P           R      mAP@.5  mAP@.5:.95:   0%|          | 0/47 [00:01<?, ?it/s]

Traceback (most recent call last): File "/home/admins/qyl/yolo/yolov5/train.py", line 394, in train(hyp) File "/home/admins/qyl/yolo/yolov5/train.py", line 299, in train dataloader=testloader) File "/home/admins/qyl/yolo/yolov5/test.py", line 97, in test output = non_max_suppression(inf_out, conf_thres=conf_thres, iou_thres=iou_thres, merge=merge) File "/home/admins/qyl/yolo/yolov5/utils/utils.py", line 605, in non_max_suppression i = torchvision.ops.boxes.nms(boxes, scores, iou_thres) File "/home/admins/anaconda3/envs/yolov4/lib/python3.7/site-packages/torchvision/ops/boxes.py", line 33, in nms return _C.nms(boxes, scores, iou_threshold) RuntimeError: CUDA error: no kernel image is available for execution on the device (nms_cuda at /tmp/pip-req-build-9d9zypi6/torchvision/csrc/cuda/nms_cuda.cu:127) frame #0: c10::Error::Error(c10::SourceLocation, std::cxx11::basic_string<char, std::char_traits, std::allocator > const&) + 0x6d (0x7f7399472e7d in /home/admins/anaconda3/envs/yolov4/lib/python3.7/site-packages/torch/lib/libc10.so) frame #1: nms_cuda(at::Tensor const&, at::Tensor const&, float) + 0x8d1 (0x7f7361174ece in /home/admins/anaconda3/envs/yolov4/lib/python3.7/site-packages/torchvision/_C.so) frame #2: nms(at::Tensor const&, at::Tensor const&, float) + 0x183 (0x7f7361138ed7 in /home/admins/anaconda3/envs/yolov4/lib/python3.7/site-packages/torchvision/_C.so) frame #3: + 0x79cf5 (0x7f7361152cf5 in /home/admins/anaconda3/envs/yolov4/lib/python3.7/site-packages/torchvision/_C.so) frame #4: + 0x765b0 (0x7f736114f5b0 in /home/admins/anaconda3/envs/yolov4/lib/python3.7/site-packages/torchvision/_C.so) frame #5: + 0x70d1e (0x7f7361149d1e in /home/admins/anaconda3/envs/yolov4/lib/python3.7/site-packages/torchvision/_C.so) frame #6: + 0x70fc2 (0x7f7361149fc2 in /home/admins/anaconda3/envs/yolov4/lib/python3.7/site-packages/torchvision/_C.so) frame #7: + 0x5be4a (0x7f7361134e4a in /home/admins/anaconda3/envs/yolov4/lib/python3.7/site-packages/torchvision/_C.so) frame #8: 
_PyMethodDef_RawFastCallKeywords + 0x264 (0x55e0fbbf6c94 in /home/admins/anaconda3/envs/yolov4/bin/python) frame #9: _PyCFunction_FastCallKeywords + 0x21 (0x55e0fbbf6db1 in /home/admins/anaconda3/envs/yolov4/bin/python) frame #10: _PyEval_EvalFrameDefault + 0x4dee (0x55e0fbc625be in /home/admins/anaconda3/envs/yolov4/bin/python) frame #11: _PyFunction_FastCallKeywords + 0xfb (0x55e0fbbf620b in /home/admins/anaconda3/envs/yolov4/bin/python) frame #12: _PyEval_EvalFrameDefault + 0x4a59 (0x55e0fbc62229 in /home/admins/anaconda3/envs/yolov4/bin/python) frame #13: _PyEval_EvalCodeWithName + 0x2f9 (0x55e0fbba62b9 in /home/admins/anaconda3/envs/yolov4/bin/python) frame #14: _PyFunction_FastCallKeywords + 0x387 (0x55e0fbbf6497 in /home/admins/anaconda3/envs/yolov4/bin/python) frame #15: _PyEval_EvalFrameDefault + 0x14ea (0x55e0fbc5ecba in /home/admins/anaconda3/envs/yolov4/bin/python) frame #16: _PyEval_EvalCodeWithName + 0xb40 (0x55e0fbba6b00 in /home/admins/anaconda3/envs/yolov4/bin/python) frame #17: _PyFunction_FastCallKeywords + 0x387 (0x55e0fbbf6497 in /home/admins/anaconda3/envs/yolov4/bin/python) frame #18: _PyEval_EvalFrameDefault + 0x14ea (0x55e0fbc5ecba in /home/admins/anaconda3/envs/yolov4/bin/python) frame #19: _PyEval_EvalCodeWithName + 0xb40 (0x55e0fbba6b00 in /home/admins/anaconda3/envs/yolov4/bin/python) frame #20: _PyFunction_FastCallKeywords + 0x387 (0x55e0fbbf6497 in /home/admins/anaconda3/envs/yolov4/bin/python) frame #21: _PyEval_EvalFrameDefault + 0x416 (0x55e0fbc5dbe6 in /home/admins/anaconda3/envs/yolov4/bin/python) frame #22: _PyEval_EvalCodeWithName + 0x2f9 (0x55e0fbba62b9 in /home/admins/anaconda3/envs/yolov4/bin/python) frame #23: PyEval_EvalCodeEx + 0x44 (0x55e0fbba71d4 in /home/admins/anaconda3/envs/yolov4/bin/python) frame #24: PyEval_EvalCode + 0x1c (0x55e0fbba71fc in /home/admins/anaconda3/envs/yolov4/bin/python) frame #25: + 0x22bf44 (0x55e0fbcbcf44 in /home/admins/anaconda3/envs/yolov4/bin/python) frame #26: PyRun_FileExFlags + 0xa1 
(0x55e0fbcc72b1 in /home/admins/anaconda3/envs/yolov4/bin/python) frame #27: PyRun_SimpleFileExFlags + 0x1c3 (0x55e0fbcc74a3 in /home/admins/anaconda3/envs/yolov4/bin/python) frame #28: + 0x2375d5 (0x55e0fbcc85d5 in /home/admins/anaconda3/envs/yolov4/bin/python) frame #29: _Py_UnixMain + 0x3c (0x55e0fbcc86fc in /home/admins/anaconda3/envs/yolov4/bin/python) frame #30: libc_start_main + 0xf0 (0x7f73c9529830 in /lib/x86_64-linux-gnu/libc.so.6) frame #31: + 0x1dc3c0 (0x55e0fbc6d3c0 in /home/admins/anaconda3/envs/yolov4/bin/python)

Process finished with exit code 1 ` Environment: PyTorch 1.3.1, torchvision 0.4.2, CUDA 10.0.
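A "no kernel image is available" error usually indicates that a compiled CUDA extension does not include a kernel for the GPU's compute capability (the RTX 2070 SUPER in the log is a compute capability 7.5 device). The following is a minimal diagnostic sketch, not part of the YOLOv5 code base: `describe_environment` is a hypothetical helper name, and the boxes and the 0.5 threshold are made-up test values. It reports the installed versions and tries the same `torchvision.ops.nms` call on the GPU that fails in the traceback above.

```python
import importlib


def describe_environment():
    """Collect version and GPU facts relevant to the NMS kernel error."""
    info = {"torch": None, "torchvision": None, "cuda_build": None,
            "gpu": None, "capability": None, "nms_on_gpu": None}
    try:
        torch = importlib.import_module("torch")
        torchvision = importlib.import_module("torchvision")
    except Exception:
        # PyTorch stack is missing or not importable; nothing more to check.
        return info

    info["torch"] = torch.__version__
    info["torchvision"] = torchvision.__version__
    info["cuda_build"] = torch.version.cuda  # CUDA version PyTorch was built with

    if not torch.cuda.is_available():
        return info

    info["gpu"] = torch.cuda.get_device_name(0)
    info["capability"] = torch.cuda.get_device_capability(0)  # (major, minor)

    # Illustrative boxes in (x1, y1, x2, y2) form: the first two overlap
    # heavily (IoU of about 0.68), the third is far away from both.
    boxes = torch.tensor([[0.0, 0.0, 10.0, 10.0],
                          [1.0, 1.0, 11.0, 11.0],
                          [50.0, 50.0, 60.0, 60.0]], device="cuda:0")
    scores = torch.tensor([0.9, 0.8, 0.7], device="cuda:0")
    try:
        keep = torchvision.ops.nms(boxes, scores, 0.5)
        info["nms_on_gpu"] = keep.cpu().tolist()  # .cpu() forces synchronization
    except RuntimeError as err:
        first_line = (str(err).splitlines() or [""])[0]
        info["nms_on_gpu"] = "failed: " + first_line
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

If `nms_on_gpu` comes back as a `failed: ...` string while the rest of training runs on the GPU, the torchvision build is the likely culprit rather than PyTorch itself or the driver.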

github-actions[bot] commented 4 years ago

Hello @DLLXW, thank you for your interest in our work! Please visit our Custom Training Tutorial to get started, and see our Jupyter Notebook, Docker Image, and Google Cloud Quickstart Guide for example environments.

If this is a bug report, please provide screenshots and minimum viable code to reproduce your issue; otherwise we cannot help you.

If this is a custom model or data training question, please note that Ultralytics does not provide free personal support. As a leader in vision ML and AI, we do offer professional consulting, from simple expert advice up to delivery of fully customized, end-to-end production solutions for our clients.

For more information please visit https://www.ultralytics.com.

DLLXW commented 4 years ago

Thanks! I have solved it. It seems PyTorch 1.3 doesn't work; when I changed it to 1.4, it worked well.
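A version gate like the one below could catch this before training starts. It is only a sketch: `version_tuple` and `meets_minimum` are hypothetical helper names, and the `1.4` floor is taken from the report in this thread, not from an official requirements list, so the repository's README remains the authority on the supported versions.

```python
import re


def version_tuple(version):
    """Turn a string such as '1.4.0+cu100' into (1, 4, 0).

    The local build suffix after '+' is ignored, and only the first three
    numeric components are kept.
    """
    numbers = re.findall(r"\d+", version.split("+")[0])
    return tuple(int(n) for n in numbers[:3])


def meets_minimum(installed, minimum="1.4"):
    """Return True if `installed` is at least `minimum` (assumed floor: 1.4)."""
    have, need = version_tuple(installed), version_tuple(minimum)
    # Pad the shorter tuple with zeros so '1.4' compares like '1.4.0'.
    width = max(len(have), len(need))
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    return have >= need


if __name__ == "__main__":
    # The versions mentioned in this thread.
    for candidate in ("1.3.1", "1.4.0"):
        print(candidate, meets_minimum(candidate))
```

In a real script the installed version would come from `torch.__version__`; it is passed in as a string here so the check can be run without PyTorch installed.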

glenn-jocher commented 4 years ago

@DLLXW the requirements are listed in the README; I suggest following them.