gofugoo closed this issue 3 years ago
Hello, thank you for your interest in our work! This issue seems to lack the minimum requirements for a proper response, or is insufficiently detailed for us to help you. Please note that most technical problems are due to:
- **Your changes to the default repository.** If your issue is not reproducible in a new `git clone` version of this repo we can not debug it. Before going further run this code and verify your issue persists:
$ git clone https://github.com/ultralytics/yolov5 yolov5_new # clone latest
$ cd yolov5_new
$ python detect.py # verify detection
- **Your custom data.** If your issue is not reproducible in one of our 3 common datasets ([COCO](https://github.com/ultralytics/yolov5/blob/master/data/coco.yaml), [COCO128](https://github.com/ultralytics/yolov5/blob/master/data/coco128.yaml), or [VOC](https://github.com/ultralytics/yolov5/blob/master/data/voc.yaml)) we can not debug it. Visit our [Custom Training Tutorial](https://docs.ultralytics.com/yolov5/tutorials/train_custom_data) for guidelines on training your custom data. Examine `train_batch0.jpg` and `test_batch0.jpg` for a sanity check of your labels and images.
- **Your environment.** If your issue is not reproducible in one of the verified environments below we can not debug it. If you are running YOLOv5 locally, verify your environment meets all of the [requirements.txt](https://github.com/ultralytics/yolov5/blob/master/requirements.txt) dependencies specified below. If in doubt, download Python 3.8.0 from https://www.python.org/, create a new [venv](https://packaging.python.org/guides/installing-using-pip-and-virtual-environments/), and then install requirements.
If none of these apply to you, we suggest you close this issue and raise a new one using the **Bug Report template**, providing screenshots and **minimum viable code to reproduce your issue**. Thank you!
## Requirements
Python 3.8 or later with all [requirements.txt](https://github.com/ultralytics/yolov5/blob/master/requirements.txt) dependencies installed, including `torch>=1.6`. To install run:
```bash
$ pip install -r requirements.txt
```
YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):
If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are passing. These tests evaluate proper operation of basic YOLOv5 functionality, including training (train.py), testing (test.py), inference (detect.py) and export (export.py) on MacOS, Windows, and Ubuntu.
@glenn-jocher Yes, I have tried downloading the newest code from the master branch. When training with the `--resume` parameter, the same issue occurs.
Using torch 1.7.0+cu101 CUDA:0 (Tesla V100-SXM2-16GB, 16130MB)
Namespace(adam=False, batch_size=200, bucket='', cache_images=True, cfg='', data='../data_bbox/data2/custem.yaml', device='', epochs=16000, evolve=False, exist_ok=False, global_rank=-1, hyp='hyps/hyp_evolved.yaml', image_weights=False, img_size=[512, 512], local_rank=-1, log_imgs=16, multi_scale=False, name='exp', noautoanchor=False, nosave=False, notest=False, project='runs/train', rect=True, resume=True, save_dir='runs/train/exp', single_cls=False, sync_bn=False, total_batch_size=200, weights='./runs/train/exp/weights/last.pt', workers=8, world_size=1)
Start Tensorboard with "tensorboard --logdir runs/train", view at http://localhost:6006/
2020-11-17 09:22:20.830952: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
Hyperparameters {'lr0': 0.0121, 'lrf': 0.219, 'momentum': 0.94, 'weight_decay': 0.00043, 'warmup_epochs': 2.19, 'warmup_momentum': 0.95, 'warmup_bias_lr': 0.0836, 'box': 0.0644, 'cls': 0.52, 'cls_pw': 0.811, 'obj': 0.947, 'obj_pw': 1.48, 'iou_t': 0.2, 'anchor_t': 4.53, 'anchors': 4.68, 'fl_gamma': 0.0, 'hsv_h': 0.0124, 'hsv_s': 0.798, 'hsv_v': 0.36, 'degrees': 0.0, 'translate': 0.119, 'scale': 0.515, 'shear': 0.0, 'perspective': 0.0, 'flipud': 0.0, 'fliplr': 0.5, 'mosaic': 0.522, 'mixup': 0.0}
from n params module arguments
0 -1 1 3520 models.common.Focus [3, 32, 3]
1 -1 1 18560 models.common.Conv [32, 64, 3, 2]
2 -1 1 19904 models.common.BottleneckCSP [64, 64, 1]
3 -1 1 73984 models.common.Conv [64, 128, 3, 2]
4 -1 1 161152 models.common.BottleneckCSP [128, 128, 3]
5 -1 1 295424 models.common.Conv [128, 256, 3, 2]
6 -1 1 641792 models.common.BottleneckCSP [256, 256, 3]
7 -1 1 1180672 models.common.Conv [256, 512, 3, 2]
8 -1 1 656896 models.common.SPP [512, 512, [5, 9, 13]]
9 -1 1 1248768 models.common.BottleneckCSP [512, 512, 1, False]
10 -1 1 131584 models.common.Conv [512, 256, 1, 1]
11 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
12 [-1, 6] 1 0 models.common.Concat [1]
13 -1 1 378624 models.common.BottleneckCSP [512, 256, 1, False]
14 -1 1 33024 models.common.Conv [256, 128, 1, 1]
15 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
16 [-1, 4] 1 0 models.common.Concat [1]
17 -1 1 95104 models.common.BottleneckCSP [256, 128, 1, False]
18 -1 1 147712 models.common.Conv [128, 128, 3, 2]
19 [-1, 14] 1 0 models.common.Concat [1]
20 -1 1 313088 models.common.BottleneckCSP [256, 256, 1, False]
21 -1 1 590336 models.common.Conv [256, 256, 3, 2]
22 [-1, 10] 1 0 models.common.Concat [1]
23 -1 1 1248768 models.common.BottleneckCSP [512, 512, 1, False]
24 [17, 20, 23] 1 35960 models.yolo.Detect [3, [[0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]], [128, 256, 512]]
Model Summary: 283 layers, 7274872 parameters, 7274872 gradients
Transferred 362/370 items from ./runs/train/exp/weights/last.pt
Optimizer groups: 62 .bias, 70 conv.weight, 59 other
wandb: Currently logged in as: googled (use `wandb login --relogin` to force relogin)
wandb: Tracking run with wandb version 0.10.10
wandb: Resuming run exp
wandb: ⭐️ View project at https://wandb.ai/googled/YOLOv5
wandb: 🚀 View run at https://wandb.ai/googled/YOLOv5/runs/1va6x8kd
wandb: Run data is saved locally in /content/drive/My Drive/dpworkspace/yolov5-master/wandb/run-20201117_092228-1va6x8kd
wandb: Run `wandb off` to turn off syncing.
Scanning images: 100%|██████████| 1455/1455 [00:02<00:00, 491.05it/s]
Scanning labels ../data_bbox/data2/labels.cache (1455 found, 0 missing, 0 empty, 0 duplicate, for 1455 images): 1455it [00:00, 9963.03it/s]
Caching images (0.6GB): 100%|██████████| 1455/1455 [00:30<00:00, 48.16it/s]
Scanning images: 100%|██████████| 99/99 [00:00<00:00, 361.75it/s]
Scanning labels ../data_bbox/data2/labels.cache (99 found, 0 missing, 0 empty, 0 duplicate, for 99 images): 99it [00:00, 5609.93it/s]
Caching images (0.0GB): 100%|██████████| 99/99 [00:03<00:00, 31.84it/s]
Image sizes 512 train, 512 test
Using 2 dataloader workers
Logging results to runs/train/exp
Starting training for 16000 epochs...
Epoch gpu_mem box obj cls total targets img_size
20/15999 13.6G 0.2277 0.02139 0.03305 0.2822 356 512: 38%|███▊ | 3/8 [00:05<00:13, 2.73s/it]Traceback (most recent call last):
File "train.py", line 490, in <module>
train(hyp, opt, device, tb_writer, wandb)
File "train.py", line 296, in train
scaler.step(optimizer) # optimizer.step
File "/usr/local/lib/python3.6/dist-packages/torch/cuda/amp/grad_scaler.py", line 321, in step
retval = optimizer.step(*args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/torch/optim/lr_scheduler.py", line 67, in wrapper
return wrapped(*args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/torch/autograd/grad_mode.py", line 26, in decorate_context
return func(*args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/torch/optim/sgd.py", line 106, in step
buf.mul_(momentum).add_(d_p, alpha=1 - dampening)
RuntimeError: The size of tensor a (24) must match the size of tensor b (40) at non-singleton dimension 0
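One plausible reading of this error, offered as an assumption rather than a confirmed diagnosis: the `Detect` layer's output channel count is `anchors_per_layer * (num_classes + 5)`. The log above shows 3 classes and anchor lists of 10 values (5 anchors per layer), which gives 40, whereas 3 anchors per layer would give 24. If the evolved hyperparameter `'anchors': 4.68` was rounded to 5 when the model was rebuilt, while the optimizer state in `last.pt` was saved for a 3-anchor head, the SGD momentum buffer (24) would not match the gradient of the new parameter (40). The helper name below is hypothetical; the sketch only reproduces the arithmetic.

```python
def detect_out_channels(num_classes: int, anchors_per_layer: int) -> int:
    """Output channels of one detection head (hypothetical helper).

    Each anchor predicts 4 box coordinates, 1 objectness score,
    and one score per class.
    """
    return anchors_per_layer * (num_classes + 5)


num_classes = 3  # first argument of models.yolo.Detect in the log

# Assumption: the checkpoint was trained with 3 anchors per layer.
checkpoint_channels = detect_out_channels(num_classes, 3)

# Assumption: the evolved hyperparameter 'anchors': 4.68 is rounded to 5.
rebuilt_channels = detect_out_channels(num_classes, round(4.68))

print(checkpoint_channels)  # 24
print(rebuilt_channels)     # 40
```

If this reading is right, the two sizes in the error message correspond to the saved optimizer state and the rebuilt model, respectively.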
@gofugoo FYI `--resume` accepts zero additional arguments. Your only options when using it are:
python train.py --resume # from most recent last.pt
python train.py --resume path/to/last.pt
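The behaviour described above can be sketched with plain `argparse`. This is a minimal illustration, not the actual `train.py` code: `--resume` is declared as an optional-value flag, and when it is set the remaining options are taken from the interrupted run's saved settings, so extra command-line arguments have no effect. The names `parse_args`, `apply_resume`, `saved_opt`, and `latest_ckpt` are hypothetical. The `Namespace(...)` log earlier in the thread is consistent with this: it shows `weights='./runs/train/exp/weights/last.pt'` and `cfg=''`.

```python
import argparse


def parse_args(argv):
    """Parse a tiny subset of training options (illustrative only)."""
    parser = argparse.ArgumentParser()
    # nargs="?" lets --resume appear alone (const=True) or with a checkpoint path.
    parser.add_argument("--resume", nargs="?", const=True, default=False)
    parser.add_argument("--weights", default="yolov5s.pt")
    parser.add_argument("--epochs", type=int, default=300)
    return parser.parse_args(argv)


def apply_resume(opt, saved_opt, latest_ckpt):
    """Return the options that would actually be used for the run.

    saved_opt stands in for the settings stored with the interrupted run;
    latest_ckpt stands in for the most recent last.pt. Both are assumptions.
    """
    if not opt.resume:
        return opt
    ckpt = opt.resume if isinstance(opt.resume, str) else latest_ckpt
    restored = argparse.Namespace(**saved_opt)  # command-line values are discarded
    restored.weights, restored.resume = ckpt, True
    return restored


saved_opt = {"weights": "yolov5s.pt", "epochs": 16000}
latest_ckpt = "runs/train/exp/weights/last.pt"

# Extra arguments next to --resume are parsed but then overridden.
opt = parse_args(["--resume", "--weights", "other.pt", "--epochs", "5"])
final = apply_resume(opt, saved_opt, latest_ckpt)
print(final.weights)  # runs/train/exp/weights/last.pt
print(final.epochs)   # 16000
```

Passing a path, as in `--resume path/to/last.pt`, selects that checkpoint instead of the most recent one.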
Thanks for the reminder. Training without `--resume` seems to be fine, but it does not work with the `--resume` parameter. Is there anything else I should try?
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
I trained on my custom data in Google Colab.
python3.8 train.py --data ../data_bbox/data2/custom.yaml --cfg ../data_bbox/data2/yolov5s.yaml --weights ../data_bbox/data2/ys_best_2020_11_15.pt --batch-size 200 --epochs 16000 --rect --img-size 512 --cache-images --hyp runs/evolve/hyp_evolved.yaml --resume