ultralytics / yolov3

YOLOv3 in PyTorch > ONNX > CoreML > TFLite
https://docs.ultralytics.com
GNU Affero General Public License v3.0

Something wrong when I run train.py: RuntimeError: Unable to find a valid cuDNN algorithm to run convolution #1934

Closed heylary closed 2 years ago

heylary commented 2 years ago

Question

I can run YOLOv5 5.0 and train on my custom dataset, but with the latest version of YOLOv3 I cannot run even the training example. Here is the command:

python train.py --img 640 --batch 1 --epochs 5 --data data/coco128.yaml --weights yolov3.pt --device 0 --workers 1 --batch-size 2

It always fails with this error: RuntimeError: Unable to find a valid cuDNN algorithm to run convolution.

The batch and batch-size values are smaller than the ones I use with YOLOv5, yet the error still occurs. train.py runs fine when I use the CPU. Could you please give me some advice?
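When a run works on the CPU but fails on the GPU, a short report of what PyTorch actually sees can help narrow things down. The following is a minimal sketch, assuming PyTorch may or may not be importable in the active environment; `describe_environment` is a hypothetical helper name and is not part of the YOLOv3 repository.

```python
import platform
import sys


def describe_environment():
    """Collect basic facts about Python, PyTorch, CUDA and cuDNN."""
    info = {
        "python": platform.python_version(),
        "platform": sys.platform,
        "torch": None,
        "cuda_available": False,
        "cudnn_version": None,
        "gpus": [],
    }
    try:
        import torch  # imported lazily so the report still works without it
    except ImportError:
        return info

    info["torch"] = torch.__version__
    info["cuda_available"] = torch.cuda.is_available()
    if info["cuda_available"]:
        # cuDNN version as an integer, or None if cuDNN is not usable
        info["cudnn_version"] = torch.backends.cudnn.version()
        for i in range(torch.cuda.device_count()):
            props = torch.cuda.get_device_properties(i)
            # (device name, total memory in MiB)
            info["gpus"].append((props.name, props.total_memory // 2**20))
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

Posting this output alongside the traceback makes it easier for others to spot a mismatch between the PyTorch build and the installed GPU stack.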

Additional

` D:\WorkData\DeepLearning\yolov3>python train.py --img 640 --batch 1 --epochs 5 --data data/coco128.yaml --weights yolov3.pt --device 0 --workers 1 --batch-size 2
train: weights=yolov3.pt, cfg=, data=data/coco128.yaml, hyp=data/hyps/hyp.scratch.yaml, epochs=5, batch_size=2, imgsz=640, rect=False, resume=False, nosave=False, noval=False, noautoanchor=False, evolve=None, bucket=, cache=None, image_weights=False, device=0, multi_scale=False, single_cls=False, adam=False, sync_bn=False, workers=1, project=runs/train, name=exp, exist_ok=False, quad=False, linear_lr=False, label_smoothing=0.0, patience=100, freeze=0, save_period=-1, local_rank=-1, entity=None, upload_dataset=False, bbox_interval=-1, artifact_alias=latest
github: skipping check (not a git repository), for updates see https://github.com/ultralytics/yolov3
YOLOv3 2021-11-14 torch 1.10.0+cu113 CUDA:0 (GeForce RTX 3050 Laptop GPU, 4096MiB)

hyperparameters: lr0=0.01, lrf=0.1, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=0.05, cls=0.5, cls_pw=1.0, obj=1.0, obj_pw=1.0, iou_t=0.2, anchor_t=4.0, fl_gamma=0.0, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=0.0, translate=0.1, scale=0.5, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.5, mosaic=1.0, mixup=0.0, copy_paste=0.0
Weights & Biases: run 'pip install wandb' to automatically track and visualize YOLOv3 runs (RECOMMENDED)
TensorBoard: Start with 'tensorboard --logdir runs\train', view at http://localhost:6006/

                 from  n    params  module                                  arguments
  0                -1  1       928  models.common.Conv                      [3, 32, 3, 1]
  1                -1  1     18560  models.common.Conv                      [32, 64, 3, 2]
  2                -1  1     20672  models.common.Bottleneck                [64, 64]
  3                -1  1     73984  models.common.Conv                      [64, 128, 3, 2]
  4                -1  2    164608  models.common.Bottleneck                [128, 128]
  5                -1  1    295424  models.common.Conv                      [128, 256, 3, 2]
  6                -1  8   2627584  models.common.Bottleneck                [256, 256]
  7                -1  1   1180672  models.common.Conv                      [256, 512, 3, 2]
  8                -1  8  10498048  models.common.Bottleneck                [512, 512]
  9                -1  1   4720640  models.common.Conv                      [512, 1024, 3, 2]
 10                -1  4  20983808  models.common.Bottleneck                [1024, 1024]
 11                -1  1   5245952  models.common.Bottleneck                [1024, 1024, False]
 12                -1  1    525312  models.common.Conv                      [1024, 512, 1, 1]
 13                -1  1   4720640  models.common.Conv                      [512, 1024, 3, 1]
 14                -1  1    525312  models.common.Conv                      [1024, 512, 1, 1]
 15                -1  1   4720640  models.common.Conv                      [512, 1024, 3, 1]
 16                -2  1    131584  models.common.Conv                      [512, 256, 1, 1]
 17                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']
 18           [-1, 8]  1         0  models.common.Concat                    [1]
 19                -1  1   1377792  models.common.Bottleneck                [768, 512, False]
 20                -1  1   1312256  models.common.Bottleneck                [512, 512, False]
 21                -1  1    131584  models.common.Conv                      [512, 256, 1, 1]
 22                -1  1   1180672  models.common.Conv                      [256, 512, 3, 1]
 23                -2  1     33024  models.common.Conv                      [256, 128, 1, 1]
 24                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']
 25           [-1, 6]  1         0  models.common.Concat                    [1]
 26                -1  1    344832  models.common.Bottleneck                [384, 256, False]
 27                -1  2    656896  models.common.Bottleneck                [256, 256, False]
 28      [27, 22, 15]  1    457725  models.yolo.Detect                      [80, [[10, 13, 16, 30, 33, 23], [30, 61, 62, 45, 59, 119], [116, 90, 156, 198, 373, 326]], [256, 512, 1024]]
Model Summary: 333 layers, 61949149 parameters, 61949149 gradients, 156.3 GFLOPs

Transferred 439/439 items from yolov3.pt
Scaled weight_decay = 0.0005
optimizer: SGD with parameter groups 72 weight, 75 weight (no decay), 75 bias
train: Scanning '..\datasets\coco128\labels\train2017.cache' images and labels... 128 found, 0 missing, 2 empty, 0 corrupted: 100%|██████████████████████████████████████████████████| 128/128 [00:00<?, ?it/s]
val: Scanning '..\datasets\coco128\labels\train2017.cache' images and labels... 128 found, 0 missing, 2 empty, 0 corrupted: 100%|████████████████████████████████████████████████████| 128/128 [00:00<?, ?it/s]
module 'signal' has no attribute 'SIGALRM'

AutoAnchor: 4.27 anchors/target, 0.994 Best Possible Recall (BPR). Current anchors are a good fit to dataset
Image sizes 640 train, 640 val
Using 1 dataloader workers
Logging results to runs\train\exp15
Starting training for 5 epochs...

 Epoch   gpu_mem       box       obj       cls    labels  img_size
   0/4     1.72G   0.02951   0.02213   0.02484        12       640:   0%|          | 0/64 [00:05<?, ?it/s]                                                                                                  

Traceback (most recent call last):
  File "train.py", line 625, in <module>
    main(opt)
  File "train.py", line 522, in main
    train(opt.hyp, opt, device, callbacks)
  File "train.py", line 343, in train
    callbacks.run('on_train_batch_end', ni, model, imgs, targets, paths, plots, opt.sync_bn)
  File "D:\WorkData\DeepLearning\yolov3\utils\callbacks.py", line 76, in run
    logger['callback'](*args, **kwargs)
  File "D:\WorkData\DeepLearning\yolov3\utils\loggers\__init__.py", line 86, in on_train_batch_end
    self.tb.add_graph(torch.jit.trace(de_parallel(model), imgs[0:1], strict=False), [])
  File "D:\WorkSoftware\Anaconda3\envs\yolov5\lib\site-packages\torch\jit\_trace.py", line 750, in trace
    _module_class,
  File "D:\WorkSoftware\Anaconda3\envs\yolov5\lib\site-packages\torch\jit\_trace.py", line 965, in trace_module
    argument_names,
  File "D:\WorkSoftware\Anaconda3\envs\yolov5\lib\site-packages\torch\nn\modules\module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "D:\WorkSoftware\Anaconda3\envs\yolov5\lib\site-packages\torch\nn\modules\module.py", line 1090, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "D:\WorkData\DeepLearning\yolov3\models\yolo.py", line 127, in forward
    return self._forward_once(x, profile, visualize)  # single-scale inference, train
    return x + self.cv2(self.cv1(x)) if self.add else self.cv2(self.cv1(x))
  File "D:\WorkSoftware\Anaconda3\envs\yolov5\lib\site-packages\torch\nn\modules\module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "D:\WorkSoftware\Anaconda3\envs\yolov5\lib\site-packages\torch\nn\modules\module.py", line 1090, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "D:\WorkData\DeepLearning\yolov3\models\common.py", line 45, in forward
    return self.act(self.bn(self.conv(x)))
  File "D:\WorkSoftware\Anaconda3\envs\yolov5\lib\site-packages\torch\nn\modules\module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "D:\WorkSoftware\Anaconda3\envs\yolov5\lib\site-packages\torch\nn\modules\module.py", line 1090, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "D:\WorkSoftware\Anaconda3\envs\yolov5\lib\site-packages\torch\nn\modules\conv.py", line 446, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "D:\WorkSoftware\Anaconda3\envs\yolov5\lib\site-packages\torch\nn\modules\conv.py", line 443, in _conv_forward
    self.padding, self.dilation, self.groups)
RuntimeError: Unable to find a valid cuDNN algorithm to run convolution`
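The log reports 61949149 parameters and a GPU with 4096MiB. This cuDNN error is often reported when GPU memory is tight, so a rough estimate of the fixed memory cost can serve as a sanity check. The following is a back-of-the-envelope sketch, assuming FP32 storage (4 bytes per value) and one SGD momentum buffer per parameter. Activations, cuDNN workspace, and the extra forward pass made by the TensorBoard graph trace seen in the traceback are not counted. `fixed_training_bytes` is a hypothetical helper.

```python
BYTES_PER_FP32 = 4
MIB = 2**20


def fixed_training_bytes(num_params, copies=3):
    """Bytes needed for per-parameter buffers stored in FP32.

    copies=3 assumes one buffer each for weights, gradients and momentum.
    """
    return num_params * BYTES_PER_FP32 * copies


params = 61_949_149      # from the "Model Summary" line in the log
gpu_total_mib = 4096     # from the "CUDA:0 (... 4096MiB)" line in the log

fixed_mib = fixed_training_bytes(params) / MIB
print(f"weights + grads + momentum: {fixed_mib:.0f} MiB")
print(f"left for activations and workspace: {gpu_total_mib - fixed_mib:.0f} MiB")
```

Under these assumptions the fixed cost is roughly 700 MiB, so the parameters alone would not fill the card. Any memory pressure would have to come from activations, workspace, or other processes using the same GPU.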

github-actions[bot] commented 2 years ago

👋 Hello @heylary, thank you for your interest in YOLOv3 🚀! Please visit our ⭐️ Tutorials to get started, where you can find quickstart guides for simple tasks like Custom Data Training all the way to advanced concepts like Hyperparameter Evolution.

If this is a 🐛 Bug Report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we can not help you.

If this is a custom training ❓ Question, please provide as much information as possible, including dataset images, training logs, screenshots, and a public link to online W&B logging if available.

For business inquiries or professional support requests please visit https://ultralytics.com or email Glenn Jocher at glenn.jocher@ultralytics.com.

Requirements

Python>=3.6.0 with all requirements.txt installed including PyTorch>=1.7. To get started:

$ git clone https://github.com/ultralytics/yolov3
$ cd yolov3
$ pip install -r requirements.txt
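The minimum versions above can be checked against what is installed. The following is a simplified sketch that compares dotted version strings numerically, using the `torch 1.10.0+cu113` string from the log as the example input. `version_tuple` and `meets_minimum` are hypothetical helpers, and a dedicated version-parsing library would be more robust for real use.

```python
import re
import sys


def version_tuple(text):
    """Turn '1.10.0+cu113' into (1, 10, 0), ignoring local/build suffixes."""
    release = re.match(r"\d+(?:\.\d+)*", text)
    if release is None:
        raise ValueError(f"no numeric version in {text!r}")
    return tuple(int(part) for part in release.group().split("."))


def meets_minimum(installed, minimum):
    """Compare two dotted versions after padding them to equal length."""
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


print("PyTorch OK:", meets_minimum("1.10.0+cu113", "1.7"))
print("Python OK:", sys.version_info >= (3, 6))
```

Comparing integers rather than raw strings matters here: as plain text, "1.10" sorts before "1.7", which would wrongly flag the installed PyTorch as too old.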

Environments

YOLOv3 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

Status

CI CPU testing

If this badge is green, all YOLOv3 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv3 training (train.py), validation (val.py), inference (detect.py) and export (export.py) on MacOS, Windows, and Ubuntu every 24 hours and on every commit.

glenn-jocher commented 2 years ago

@heylary it appears you may have environment problems. Please ensure you meet all dependency requirements if you are attempting to run YOLOv5 locally. If in doubt, create a new virtual Python 3.9 environment, clone the latest repo (code changes daily), and run pip install -r requirements.txt again from scratch.
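Creating a clean environment can also be scripted. The following is a minimal sketch using Python's standard `venv` module; `create_fresh_env` is a hypothetical helper, and the environment is built from whichever interpreter runs the script, so it should be started with the Python version you want in the new environment.

```python
import venv
from pathlib import Path


def create_fresh_env(env_dir, with_pip=True):
    """Create a new virtual environment and return the path to its Python."""
    env_dir = Path(env_dir)
    # clear=True wipes any existing directory so the install starts from scratch
    venv.create(env_dir, clear=True, with_pip=with_pip)
    # Windows puts executables under Scripts\, POSIX systems under bin/
    candidates = [env_dir / "Scripts" / "python.exe", env_dir / "bin" / "python"]
    for candidate in candidates:
        if candidate.exists():
            return candidate
    raise FileNotFoundError(f"no interpreter found under {env_dir}")
```

The returned interpreter can then be used to run `-m pip install -r requirements.txt` inside the freshly cloned repository, which keeps the new install separate from any existing conda or system packages.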

💡 ProTip! Try one of our verified environments below if you are having trouble with your local environment.

Requirements

Python>=3.7.0 with all requirements.txt installed including PyTorch>=1.7. To get started:

git clone https://github.com/ultralytics/yolov5  # clone
cd yolov5
pip install -r requirements.txt  # install

Models and datasets download automatically from the latest YOLOv5 release when first requested.

Environments

YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

Status

CI CPU testing

If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training (train.py), validation (val.py), inference (detect.py) and export (export.py) on MacOS, Windows, and Ubuntu every 24 hours and on every commit.

github-actions[bot] commented 2 years ago

👋 Hello, this issue has been automatically marked as stale because it has not had recent activity. Please note it will be closed if no further activity occurs.

Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!

Thank you for your contributions to YOLOv3 🚀 and Vision AI ⭐!