positive666 / yolo_research

based on yolo-high-level project (detect\pose\classify\segment\):include yolov5\yolov7\yolov8\ core ,improvement research ,SwintransformV2 and Attention Series. training skills, business customization, engineering deployment C
GNU General Public License v3.0
756 stars 145 forks

Training help request #71

Closed Hyper-Devil closed 2 years ago

Hyper-Devil commented 2 years ago

❔Question

Training on a custom dataset fails.

Additional context

python train.py --data data/cone.yaml --cfg models/yolov5n.yaml --weights '' --hyp data/hyps/hyp.scratch-low.yaml
wandb: Currently logged in as: whd. Use wandb login --relogin to force relogin
train: weights=, cfg=models/yolov5n.yaml, data=data/cone.yaml, hyp=data/hyps/hyp.scratch-low.yaml, epochs=300, batch_size=32, imgsz=416, rect=False, resume=False, nosave=False, noval=False, noautoanchor=False, noplots=False, evolve=None, bucket=, cache=None, image_weights=False, device=0,1, multi_scale=False, single_cls=False, optimizer=SGD, sync_bn=False, workers=8, project=runs/train, name=exp, exist_ok=False, quad=False, cos_lr=False, label_smoothing=0.0, patience=100, freeze=[0], save_period=-1, seed=0, local_rank=-1, entity=None, upload_dataset=False, bbox_interval=-1, artifact_alias=latest, swin_float=False, aux_ota_loss=False
github: skipping check (Docker image), for updates see https://github.com/positive666/yolov5
/bin/sh: 1: git: not found
YOLOv5_research_plus 🚀 2022-8-23 Python-3.9.12 torch-1.8.1+cu111 CUDA:0 (GeForce RTX 3080, 10015MiB)
CUDA:1 (GeForce RTX 3080, 10018MiB)

hyperparameters: lr0=0.01, lrf=0.01, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=0.05, cls=0.5, cls_pw=1.0, obj=1.0, obj_pw=1.0, iou_t=0.2, anchor_t=4.0, fl_gamma=0.0, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=0.0, translate=0.1, scale=0.5, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.5, mosaic=1.0, mixup=0.0, copy_paste=0.0
TensorBoard: Start with 'tensorboard --logdir runs/train', view at http://localhost:6006/
wandb: Tracking run with wandb version 0.13.2
wandb: Run data is saved locally in /yolov5_research/wandb/run-20220823_222739-3jw78jcj
wandb: Run wandb offline to turn off syncing.
wandb: Syncing run chocolate-field-6
wandb: ⭐ View project at https://wandb.ai/whd/YOLOv5
wandb: 🚀 View run at https://wandb.ai/whd/YOLOv5/runs/3jw78jcj
YOLOv5 temporarily requires wandb version 0.12.10 or below. Some features may not work as expected.
Overriding model.yaml nc=80 with nc=3

             from  n    params  module                                  arguments                     

0 -1 1 1760 models.common.Conv [3, 16, 6, 2, 2]
1 -1 1 4672 models.common.Conv [16, 32, 3, 2]
2 -1 1 4800 models.common.C3 [32, 32, 1]
3 -1 1 18560 models.common.Conv [32, 64, 3, 2]
4 -1 2 29184 models.common.C3 [64, 64, 2]
5 -1 1 73984 models.common.Conv [64, 128, 3, 2]
6 -1 3 156928 models.common.C3 [128, 128, 3]
7 -1 1 295424 models.common.Conv [128, 256, 3, 2]
8 -1 1 296448 models.common.C3 [256, 256, 1]
9 -1 1 164608 models.common.SPPF [256, 256, 5]
10 -1 1 33024 models.common.Conv [256, 128, 1, 1]
11 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
12 [-1, 6] 1 0 models.common.Concat [1]
13 -1 1 90880 models.common.C3 [256, 128, 1, False]
14 -1 1 8320 models.common.Conv [128, 64, 1, 1]
15 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
16 [-1, 4] 1 0 models.common.Concat [1]
17 -1 1 22912 models.common.C3 [128, 64, 1, False]
18 -1 1 36992 models.common.Conv [64, 64, 3, 2]
19 [-1, 14] 1 0 models.common.Concat [1]
20 -1 1 74496 models.common.C3 [128, 128, 1, False]
21 -1 1 147712 models.common.Conv [128, 128, 3, 2]
22 [-1, 10] 1 0 models.common.Concat [1]
23 -1 1 296448 models.common.C3 [256, 256, 1, False]
24 [17, 20, 23] 1 10824 models.yolo.Detect [3, [[10, 13, 16, 30, 33, 23], [30, 61, 62, 45, 59, 119], [116, 90, 156, 198, 373, 326]], [64, 128, 256]]
initialize_biases done
YOLOv5n summary: 270 layers, 1767976 parameters, 1767976 gradients, 4.2 GFLOPs

AMP: checks passed ✅
optimizer: SGD(lr=0.01) with parameter groups 57 weight (decay=0.0), 60 weight(decay=0.0005), 60 bias
WARNING: DP not recommended, use torch.distributed.run for best DDP Multi-GPU results.
See Multi-GPU Tutorial at https://github.com/ultralytics/yolov5/issues/475 to get started.
train: Scanning '/cone_dataset/labels/train.cache' images and labels... 9512 fou
val: Scanning '/cone_dataset/labels/val.cache' images and labels... 1166 found,
Plotting labels to runs/train/exp6/labels.jpg...

AutoAnchor: 2.48 anchors/target, 0.929 Best Possible Recall (BPR). Anchors are a poor fit to dataset ⚠️, attempting to improve...
AutoAnchor: WARNING: Extremely small objects found: 8033 of 84054 labels are < 3 pixels in size
AutoAnchor: Running kmeans for 9 anchors on 83607 points...
AutoAnchor: Evolving anchors with Genetic Algorithm: fitness = 0.8556: 100%|████
AutoAnchor: thr=0.25: 0.9999 best possible recall, 6.72 anchors past thr
AutoAnchor: n=9, img_size=416, metric_all=0.454/0.854-mean/best, past_thr=0.551-mean: 3,4, 4,5, 6,7, 8,9, 11,13, 14,17, 19,23, 25,30, 37,36
AutoAnchor: Done ✅ (optional: update model *.yaml to use these anchors in the future)
Image sizes 416 train, 416 val
Using 8 dataloader workers
Logging results to runs/train/exp6
Starting training for 300 epochs...

 Epoch   gpu_mem       box       obj       cls    labels  img_size

0%| | 0/298 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/yolov5_research/train.py", line 652, in <module>
    main(opt)
  File "/yolov5_research/train.py", line 551, in main
    train(opt.hyp, opt, device, callbacks)
  File "/yolov5_research/train.py", line 294, in train
    for i, (imgs, targets, paths, _) in pbar:  # batch -------------------------------------------------------------
  File "/usr/local/lib/python3.9/site-packages/tqdm/std.py", line 1195, in __iter__
    for obj in iterable:
  File "/yolov5_research/utils/dataloaders.py", line 158, in __iter__
    yield next(self.iterator)
  File "/usr/local/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 517, in __next__
    data = self._next_data()
  File "/usr/local/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1199, in _next_data
    return self._process_data(data)
  File "/usr/local/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1225, in _process_data
    data.reraise()
  File "/usr/local/lib/python3.9/site-packages/torch/_utils.py", line 429, in reraise
    raise self.exc_type(msg)
KeyError: Caught KeyError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/torch/utils/data/_utils/worker.py", line 202, in _worker_loop
    data = fetcher.fetch(index)
  File "/usr/local/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/usr/local/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/yolov5_research/utils/dataloaders.py", line 623, in __getitem__
    if random.random() < hyp['paste_in']:
KeyError: 'paste_in'

wandb: Waiting for W&B process to finish... (failed 1). Press Control-C to abort syncing.
wandb:
wandb: Synced chocolate-field-6: https://wandb.ai/whd/YOLOv5/runs/3jw78jcj
wandb: Synced 5 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)
wandb: Find logs at: ./wandb/run-20220823_222739-3jw78jcj/logs

positive666 commented 2 years ago

KeyError: 'paste_in': just add this key to your hyperparameter YAML and it will work.
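To illustrate the idea, here is a minimal Python sketch of filling in a missing hyperparameter key before the dataloader reads it. The helper name `fill_missing_hyps` and the default value `0.0` are assumptions made for illustration, not code or recommended settings from this repository. In the YAML file itself, the equivalent would be one extra `paste_in: <value>` line in `data/hyps/hyp.scratch-low.yaml`.

```python
import random

# Hypothetical defaults: keys the dataloader reads that an older hyp file may lack.
# 0.0 is an assumed "disabled" value, not necessarily the repository's own setting.
ASSUMED_DEFAULTS = {"paste_in": 0.0}


def fill_missing_hyps(hyp, defaults=ASSUMED_DEFAULTS):
    """Return a copy of hyp with any missing keys filled from defaults."""
    merged = dict(defaults)
    merged.update(hyp)  # values already present in hyp take precedence
    return merged


# A subset of the hyperparameters printed in the log (note: no 'paste_in').
hyp = {"mosaic": 1.0, "mixup": 0.0, "copy_paste": 0.0}
patched = fill_missing_hyps(hyp)

# The lookup from the traceback no longer raises KeyError.
# random.random() lies in [0.0, 1.0), so a threshold of 0.0 never triggers.
apply_paste_in = random.random() < patched["paste_in"]
print(patched["paste_in"], apply_paste_in)
```

With a threshold of 0.0 the augmentation branch is never taken, so this only silences the error; pick a value above zero if you actually want that augmentation applied.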

positive666 commented 2 years ago

You can pull the latest code again tomorrow.

Hyper-Devil commented 2 years ago

Got it, thank you for your work.