Closed leeyunhome closed 3 years ago
@leeyunhome there are no known problems in the training workflows you describe.
In any case your output shows that your code is out of date by 133 commits.
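For anyone who wants to check this automatically, here is a minimal sketch that pulls the "behind by N commits" figure out of git's status text. The helper name `commits_behind` is hypothetical and not part of this repo; the sample string is taken from the output posted in this issue.

```python
import re
from typing import Optional


def commits_behind(git_output: str) -> Optional[int]:
    """Return N from "Your branch is behind '<ref>' by N commit(s)", else None."""
    match = re.search(r"Your branch is behind '[^']+' by (\d+) commits?", git_output)
    return int(match.group(1)) if match else None


# Sample line copied from the log in this issue.
sample = ("Your branch is behind 'origin/master' by 133 commits, "
          "and can be fast-forwarded.")
print(commits_behind(sample))
```

A result other than `None` suggests running `git pull` (or making a fresh clone) before retrying.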
👋 Hello, thank you for your interest in our work! This issue seems to lack the minimum requirements for a proper response, or is insufficiently detailed for us to help you. Please note that most technical problems are due to:
- **Your modified or out-of-date code.** If your issue is not reproducible in a new `git clone` version of this repo we can not debug it. Before going further run this code and verify your issue persists:
$ git clone https://github.com/ultralytics/yolov5 yolov5_new # clone latest
$ cd yolov5_new
$ python detect.py # verify detection
- **Your custom data.** If your issue is not reproducible in one of our 3 common datasets ([COCO](https://github.com/ultralytics/yolov5/blob/master/data/coco.yaml), [COCO128](https://github.com/ultralytics/yolov5/blob/master/data/coco128.yaml), or [VOC](https://github.com/ultralytics/yolov5/blob/master/data/voc.yaml)) we can not debug it. Visit our [Custom Training Tutorial](https://docs.ultralytics.com/yolov5/tutorials/train_custom_data) for guidelines on training your custom data. Examine `train_batch0.jpg` and `test_batch0.jpg` for a sanity check of your labels and images.
- **Your environment.** If your issue is not reproducible in one of the verified environments below we can not debug it. If you are running YOLOv5 locally, verify your environment meets all of the [requirements.txt](https://github.com/ultralytics/yolov5/blob/master/requirements.txt) dependencies specified below. If in doubt, download Python 3.8.0 from https://www.python.org/, create a new [venv](https://packaging.python.org/guides/installing-using-pip-and-virtual-environments/), and install requirements.
If none of these apply to you, we suggest you close this issue and raise a new one using the 🐛 **Bug Report template**, providing screenshots and a [minimum reproducible example](https://docs.ultralytics.com/help/minimum_reproducible_example/) of your issue. Thank you!
## Requirements
Python 3.8 or later with all [requirements.txt](https://github.com/ultralytics/yolov5/blob/master/requirements.txt) dependencies installed, including `torch>=1.7`. To install run:
```bash
$ pip install -r requirements.txt
```
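As a rough way to compare an installed version against the minimums stated above (Python 3.8, `torch>=1.7`), the sketch below compares version numbers as integer tuples rather than as strings. The helpers `version_tuple` and `meets_minimum` are hypothetical names used for illustration only; the version `1.7.1` is the one reported in the log of this issue.

```python
import sys


def version_tuple(version: str) -> tuple:
    """Turn '1.7.1' or '1.7.1+cu110' into (1, 7, 1); stop at the first non-numeric part."""
    parts = []
    for piece in version.split("+")[0].split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare versions numerically so that '1.10' counts as newer than '1.7'."""
    return version_tuple(installed) >= version_tuple(minimum)


print("Python OK:", sys.version_info >= (3, 8))
print("torch 1.7.1 OK:", meets_minimum("1.7.1", "1.7"))
```

Comparing tuples avoids the string-comparison trap where `"1.10" < "1.7"`.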
YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):
If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are passing. These tests evaluate proper operation of basic YOLOv5 functionality, including training (train.py), testing (test.py), inference (detect.py) and export (export.py) on MacOS, Windows, and Ubuntu.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Did you solve it? If yes, how?
❔Question
Hello,
I get an error when I start training from pretrained weights that were themselves produced by training on a custom dataset.
The best.pt passed to --weights comes from an earlier run that started from yolov5s.pt and trained at an image size of 640.
python3 train.py --img 320 --batch 16 --epochs 200 --data /home/yhlee/coding/GitHub/dataset/data.yaml --cfg ./models/yolov5s.yaml --weights /home/yhlee/coding/GitHub/yolov3/runs/train/lpr_result5/weights/best.pt --name lpr_result
remote: Enumerating objects: 6, done.
remote: Counting objects: 100% (6/6), done.
remote: Compressing objects: 100% (4/4), done.
remote: Total 6 (delta 2), reused 6 (delta 2), pack-reused 0
Unpacking objects: 100% (6/6), done.
From https://github.com/ultralytics/yolov5
80dbb96..980443b multigpu_test -> origin/multigpu_test
Your branch is behind 'origin/master' by 133 commits, and can be fast-forwarded.
(use "git pull" to update your local branch)
Using torch 1.7.1 CUDA:0 (GeForce RTX 3080, 10016.75MB)
Namespace(adam=False, batch_size=16, bucket='', cache_images=False, cfg='./models/yolov5s.yaml', data='/home/yhlee/coding/GitHub/dataset/data.yaml', device='', epochs=200, evolve=False, exist_ok=False, global_rank=-1, hyp='data/hyp.scratch.yaml', image_weights=False, img_size=[320, 320], local_rank=-1, log_artifacts=False, log_imgs=16, multi_scale=False, name='lpr_result', noautoanchor=False, nosave=False, notest=False, project='runs/train', rect=False, resume=False, save_dir='runs/train/lpr_result25', single_cls=False, sync_bn=False, total_batch_size=16, weights='/home/yhlee/coding/GitHub/yolov3/runs/train/lpr_result5/weights/best.pt', workers=8, world_size=1)
Start Tensorboard with "tensorboard --logdir runs/train", view at http://localhost:6006/
Hyperparameters {'lr0': 0.01, 'lrf': 0.2, 'momentum': 0.937, 'weight_decay': 0.0005, 'warmup_epochs': 3.0, 'warmup_momentum': 0.8, 'warmup_bias_lr': 0.1, 'box': 0.05, 'cls': 0.5, 'cls_pw': 1.0, 'obj': 1.0, 'obj_pw': 1.0, 'iou_t': 0.2, 'anchor_t': 4.0, 'fl_gamma': 0.0, 'hsv_h': 0.015, 'hsv_s': 0.7, 'hsv_v': 0.4, 'degrees': 0.0, 'translate': 0.1, 'scale': 0.5, 'shear': 0.0, 'perspective': 0.0, 'flipud': 0.0, 'fliplr': 0.5, 'mosaic': 1.0, 'mixup': 0.0}
Overriding model.yaml nc=80 with nc=102
0 -1 1 3520 models.common.Focus [3, 32, 3]
1 -1 1 18560 models.common.Conv [32, 64, 3, 2]
2 -1 1 19904 models.common.BottleneckCSP [64, 64, 1]
3 -1 1 73984 models.common.Conv [64, 128, 3, 2]
4 -1 1 161152 models.common.BottleneckCSP [128, 128, 3]
5 -1 1 295424 models.common.Conv [128, 256, 3, 2]
6 -1 1 641792 models.common.BottleneckCSP [256, 256, 3]
7 -1 1 1180672 models.common.Conv [256, 512, 3, 2]
8 -1 1 656896 models.common.SPP [512, 512, [5, 9, 13]]
9 -1 1 1248768 models.common.BottleneckCSP [512, 512, 1, False]
10 -1 1 131584 models.common.Conv [512, 256, 1, 1]
11 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
12 [-1, 6] 1 0 models.common.Concat [1]
13 -1 1 378624 models.common.BottleneckCSP [512, 256, 1, False]
14 -1 1 33024 models.common.Conv [256, 128, 1, 1]
15 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
16 [-1, 4] 1 0 models.common.Concat [1]
17 -1 1 95104 models.common.BottleneckCSP [256, 128, 1, False]
18 -1 1 147712 models.common.Conv [128, 128, 3, 2]
19 [-1, 14] 1 0 models.common.Concat [1]
20 -1 1 313088 models.common.BottleneckCSP [256, 256, 1, False]
21 -1 1 590336 models.common.Conv [256, 256, 3, 2]
22 [-1, 10] 1 0 models.common.Concat [1]
23 -1 1 1248768 models.common.BottleneckCSP [512, 512, 1, False]
24 [17, 20, 23] 1 288579 models.yolo.Detect [102, [[10, 13, 16, 30, 33, 23], [30, 61, 62, 45, 59, 119], [116, 90, 156, 198, 373, 326]], [128, 256, 512]]
Model Summary: 283 layers, 7527491 parameters, 7527491 gradients, 17.7 GFLOPS
Transferred 40/370 items from /home/yhlee/coding/GitHub/yolov3/runs/train/lpr_result5/weights/best.pt
Optimizer groups: 62 .bias, 70 conv.weight, 59 other
wandb: Currently logged in as: hodu (use `wandb login --relogin` to force relogin)
wandb: wandb version 0.10.22 is available! To upgrade, please run:
wandb: $ pip install wandb --upgrade
wandb: Tracking run with wandb version 0.10.17
wandb: Syncing run lpr_result25
wandb: ⭐️ View project at https://wandb.ai/hodu/YOLOv5
wandb: 🚀 View run at https://wandb.ai/hodu/YOLOv5/runs/3790r2p0
wandb: Run data is saved locally in /home/yhlee/coding/GitHub/yolov5/wandb/run-20210312_153548-3790r2p0
wandb: Run `wandb offline` to turn off syncing.
Traceback (most recent call last):
File "train.py", line 512, in <module>
train(hyp, opt, device, tb_writer, wandb)
File "train.py", line 147, in train
optimizer.load_state_dict(ckpt['optimizer'])
File "/home/yhlee/anaconda3/envs/yolov3_env/lib/python3.8/site-packages/torch/optim/optimizer.py", line 124, in load_state_dict
raise ValueError("loaded state dict contains a parameter group "
ValueError: loaded state dict contains a parameter group that doesn't match the size of optimizer's group
wandb: Waiting for W&B process to finish, PID 5192
wandb: Program failed with code 1. Press ctrl-c to abort syncing.
wandb:
wandb: Find user logs for this run at: /home/yhlee/coding/GitHub/yolov5/wandb/run-20210312_153548-3790r2p0/logs/debug.log
wandb: Find internal logs for this run at: /home/yhlee/coding/GitHub/yolov5/wandb/run-20210312_153548-3790r2p0/logs/debug-internal.log
wandb: Synced 5 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)
wandb:
wandb: Synced lpr_result25: https://wandb.ai/hodu/YOLOv5/runs/3790r2p0
======================================
Can you tell me how to solve this problem?
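The "Transferred 40/370 items" line in the log above reports how many checkpoint entries were copied into the new model. A minimal sketch of that kind of matching, assuming entries are kept only when both the name and the shape agree, could look like this. Plain tuples stand in for tensor shapes, and every name and shape below is hypothetical rather than read from the actual checkpoint.

```python
def intersect_by_name_and_shape(checkpoint_shapes, model_shapes):
    """Keep checkpoint entries whose name exists in the model with an identical shape."""
    return {
        name: shape
        for name, shape in checkpoint_shapes.items()
        if model_shapes.get(name) == shape
    }


# Hypothetical entries, for illustration only.
checkpoint_shapes = {
    "layer0.weight": (32, 12, 3, 3),
    "head.weight": (255, 128, 1, 1),        # head sized for a different class count
    "old_block.weight": (512, 512, 1, 1),   # name absent from the new model
}
model_shapes = {
    "layer0.weight": (32, 12, 3, 3),
    "head.weight": (321, 128, 1, 1),
    "new_block.weight": (256, 512, 1, 1),
}

kept = intersect_by_name_and_shape(checkpoint_shapes, model_shapes)
print(f"Transferred {len(kept)}/{len(model_shapes)} items")
```

Under this assumption, a low ratio such as 40/370 would indicate that most names or shapes in the checkpoint do not line up with the model built from `--cfg`.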
Additional context
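To make the `ValueError` in the traceback easier to read, here is a plain-Python paraphrase of the size checks that `optimizer.load_state_dict` applies to parameter groups. It is a sketch, not the PyTorch source. The current group sizes come from the "Optimizer groups" line in the log (their order here is illustrative); the saved sizes are hypothetical, since the checkpoint's contents are not shown in this issue.

```python
def check_param_groups(current_sizes, saved_sizes):
    """Mirror two size checks: same number of groups, then same length per group."""
    if len(current_sizes) != len(saved_sizes):
        raise ValueError("loaded state dict has a different number of parameter groups")
    if any(cur != saved for cur, saved in zip(current_sizes, saved_sizes)):
        raise ValueError("loaded state dict contains a parameter group "
                         "that doesn't match the size of optimizer's group")


# From the log: "Optimizer groups: 62 .bias, 70 conv.weight, 59 other".
current = [59, 70, 62]
# Hypothetical sizes for the optimizer state stored in best.pt.
saved = [61, 75, 72]

try:
    check_param_groups(current, saved)
except ValueError as err:
    print(err)
```

If the model that produced the checkpoint had a different number of parameters per group than the model being built now, the saved optimizer state cannot be restored into the new optimizer, which is consistent with the message shown in the traceback.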