ultralytics / yolov5

YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite
https://docs.ultralytics.com
GNU Affero General Public License v3.0

When I use Hyperparameter Evolution for hyperparameter tuning, I get an error! #6664

Closed star4s closed 2 years ago

star4s commented 2 years ago

Question

Training on my custom data works, of course, when I pass "--noautoanchor".

I need to use Hyperparameter Evolution for hyperparameter tuning.

First, I tested the COCO128 example:

python train.py --img-size 2352 --batch 1 --epochs 1 --data coco128.yaml --hyp './data/hyps/hyp.scratch.yaml' --cfg './models/yolov5x6.yaml' --weights yolov5x6.pt --cache --evolve &

Hyperparameter Evolution works well on COCO128.

My command for my custom data:

python train.py --img 2352 --batch 1 --epochs 1 --data test.yaml --cfg './models/yolov5x6.yaml' --weights yolov5x6.pt --cache --evolve 2

After running this command, I get the following error:

AutoAnchor: ERROR: AutoAnchor: ERROR: scipy.cluster.vq.kmeans requested 12 points but returned only 9.

I followed the instructions in the Hyperparameter Evolution guide.

My environment:

Name Version Build Channel

_libgcc_mutex 0.1 main
absl-py 1.0.0
ca-certificates 2021.10.26 h06a4308_2
cachetools 5.0.0
certifi 2021.10.8 py38h06a4308_2
charset-normalizer 2.0.12
cycler 0.11.0
fonttools 4.29.1
google-auth 2.6.0
google-auth-oauthlib 0.4.6
grpcio 1.43.0
idna 3.3
importlib-metadata 4.11.1
kiwisolver 1.3.2
ld_impl_linux-64 2.35.1 h7274673_9
libffi 3.3 he6710b0_2
libgcc-ng 9.1.0 hdf63c60_0
libstdcxx-ng 9.1.0 hdf63c60_0
Markdown 3.3.6
matplotlib 3.5.1
ncurses 6.3 h7f8727e_2
numpy 1.22.2
oauthlib 3.2.0
opencv-python 4.5.5.62
openssl 1.1.1m h7f8727e_0
packaging 21.3
pandas 1.4.1
Pillow 9.0.1
pip 21.2.4 py38h06a4308_0
protobuf 3.19.4
pyasn1 0.4.8
pyasn1-modules 0.2.8
pyparsing 3.0.7
python 3.8.12 h12debd9_0
python-dateutil 2.8.2
pytz 2021.3
PyYAML 6.0
readline 8.1.2 h7f8727e_1
requests 2.27.1
requests-oauthlib 1.3.1
rsa 4.8
scipy 1.8.0
seaborn 0.11.2
setuptools 58.0.4 py38h06a4308_0
six 1.16.0
sqlite 3.37.2 hc218d9a_0
tensorboard 2.8.0
tensorboard-data-server 0.6.1
tensorboard-plugin-wit 1.8.1
thop-0.0.31 2005241907
tk 8.6.11 h1ccaba5_0
torch 1.10.2
torchvision 0.11.3
tqdm 4.62.3
typing_extensions 4.1.1
urllib3 1.26.8
Werkzeug 2.0.3
wheel 0.37.1 pyhd3eb1b0_0
xz 5.2.5 h7b6447c_0
zipp 3.7.0
zlib 1.2.11 h7f8727e_4

How can I use Hyperparameter Evolution with my custom data?

Additional

A label from my custom data: 1 0.52 0.921 0.072 0.098
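For reference, a YOLO-format label line holds `class x_center y_center width height`, with the four box values normalized to the image size. The sketch below parses the label quoted above and converts its size to pixels. The names `YoloLabel`, `parse_yolo_label` and `box_size_px` are hypothetical helpers, and the 2352 x 1728 image size is taken from the dataset description given later in this thread.

```python
from typing import NamedTuple


class YoloLabel(NamedTuple):
    cls: int
    x_center: float
    y_center: float
    width: float
    height: float


def parse_yolo_label(line: str) -> YoloLabel:
    """Parse one 'class x_center y_center width height' label line."""
    cls, xc, yc, w, h = line.split()
    return YoloLabel(int(cls), float(xc), float(yc), float(w), float(h))


def box_size_px(label: YoloLabel, img_w: int, img_h: int) -> tuple:
    """Convert the normalized box width and height to pixels."""
    return label.width * img_w, label.height * img_h


label = parse_yolo_label("1 0.52 0.921 0.072 0.098")
w_px, h_px = box_size_px(label, img_w=2352, img_h=1728)  # assumed image size
print(f"class={label.cls} box={w_px:.1f} x {h_px:.1f} px")
```

Under that assumed image size, both sides come out at about 169 px, so this particular box is square. If most labels in a dataset share one size, AutoAnchor has very few distinct points to cluster.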

After running Hyperparameter Evolution:

train: weights=yolov5x6.pt, cfg=./models/yolov5x6.yaml, data=OP1_WW.yaml, hyp=./data/hyps/hyp.scratch.yaml, epochs=1, batch_size=1, imgsz=640, rect=False, resume=False, nosave=False, noval=False, noautoanchor=False, evolve=2, bucket=, cache=ram, image_weights=False, device=, multi_scale=False, single_cls=False, optimizer=SGD, sync_bn=False, workers=8, project=runs/train, name=exp, exist_ok=False, quad=False, linear_lr=False, label_smoothing=0.0, patience=100, freeze=[0], save_period=-1, local_rank=-1, entity=None, upload_dataset=False, bbox_interval=-1, artifact_alias=latest
github: skipping check (offline), for updates see https://github.com/ultralytics/yolov5
YOLOv5 🚀 2022-2-8 torch 1.10.2+cu102 CUDA:0 (Tesla V100-SXM2-16GB, 16160MiB)

hyperparameters: lr0=0.001, lrf=0.1, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=0.05, cls=0.5, cls_pw=1.0, obj=1.0, obj_pw=1.0, iou_t=0.2, anchor_t=4.0, anchors=3, fl_gamma=0.0, hsv_h=0.0, hsv_s=0.0, hsv_v=0.0, degrees=0.0, translate=0.0, scale=0.0, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.0, mosaic=0.0, mixup=0.0, copy_paste=0.0
Weights & Biases: run 'pip install wandb' to automatically track and visualize YOLOv5 🚀 runs (RECOMMENDED)
Overriding model.yaml anchors with anchors=3

             from  n    params  module                                  arguments                     

0 -1 1 8800 models.common.Conv [3, 80, 6, 2, 2]
1 -1 1 115520 models.common.Conv [80, 160, 3, 2]
2 -1 4 309120 models.common.C3 [160, 160, 4]
3 -1 1 461440 models.common.Conv [160, 320, 3, 2]
4 -1 8 2259200 models.common.C3 [320, 320, 8]
5 -1 1 1844480 models.common.Conv [320, 640, 3, 2]
6 -1 12 13125120 models.common.C3 [640, 640, 12]
7 -1 1 5531520 models.common.Conv [640, 960, 3, 2]
8 -1 4 11070720 models.common.C3 [960, 960, 4]
9 -1 1 11061760 models.common.Conv [960, 1280, 3, 2]
10 -1 4 19676160 models.common.C3 [1280, 1280, 4]
11 -1 1 4099840 models.common.SPPF [1280, 1280, 5]
12 -1 1 1230720 models.common.Conv [1280, 960, 1, 1]
13 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
14 [-1, 8] 1 0 models.common.Concat [1]
15 -1 4 11992320 models.common.C3 [1920, 960, 4, False]
16 -1 1 615680 models.common.Conv [960, 640, 1, 1]
17 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
18 [-1, 6] 1 0 models.common.Concat [1]
19 -1 4 5332480 models.common.C3 [1280, 640, 4, False]
20 -1 1 205440 models.common.Conv [640, 320, 1, 1]
21 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
22 [-1, 4] 1 0 models.common.Concat [1]
23 -1 4 1335040 models.common.C3 [640, 320, 4, False]
24 -1 1 922240 models.common.Conv [320, 320, 3, 2]
25 [-1, 20] 1 0 models.common.Concat [1]
26 -1 4 4922880 models.common.C3 [640, 640, 4, False]
27 -1 1 3687680 models.common.Conv [640, 640, 3, 2]
28 [-1, 16] 1 0 models.common.Concat [1]
29 -1 4 11377920 models.common.C3 [1280, 960, 4, False]
30 -1 1 8296320 models.common.Conv [960, 960, 3, 2]
31 [-1, 12] 1 0 models.common.Concat [1]
32 -1 4 20495360 models.common.C3 [1920, 1280, 4, False]
33 [23, 26, 29, 32] 1 67284 models.yolo.Detect [2, [[0, 1, 2, 3, 4, 5], [0, 1, 2, 3, 4, 5], [0, 1, 2, 3, 4, 5], [0, 1, 2, 3, 4, 5]], [320, 640, 960, 1280]]
Model Summary: 733 layers, 140045044 parameters, 140045044 gradients, 208.3 GFLOPs

Transferred 954/963 items from yolov5x6.pt
Scaled weight_decay = 0.0005
optimizer: SGD with parameter groups 159 weight (no decay), 163 weight, 163 bias
train: Scanning '/yolov5/datasets/OP1_test/labels/train.cache' images and labels... 14 found, 0 missing, 0 empty, 0 corrupt: 100%|██████████████████████| 14/14 [00:00<?, ?it/s]
train: Caching images (0.0GB ram): 100%|██████████████████████| 14/14 [00:00<00:00, 131.07it/s]
val: Scanning '/yolov5/datasets/OP1_test/labels/train.cache' images and labels... 14 found, 0 missing, 0 empty, 0 corrupt: 100%|████████████████████████| 14/14 [00:00<?, ?it/s]

AutoAnchor: 1.71 anchors/target, 0.429 Best Possible Recall (BPR). Anchors are a poor fit to dataset ⚠️, attempting to improve...
AutoAnchor: Running kmeans for 12 anchors on 14 points...
AutoAnchor: ERROR: AutoAnchor: ERROR: scipy.cluster.vq.kmeans requested 12 points but returned only 9
Traceback (most recent call last):
  File "train.py", line 638, in <module>
    main(opt)
  File "train.py", line 616, in main
    results = train(hyp.copy(), opt, device, callbacks)
  File "train.py", line 248, in train
    check_anchors(dataset, model=model, thr=hyp['anchor_t'], imgsz=imgsz)
  File "/yolov5/utils/autoanchor.py", line 55, in check_anchors
    new_bpr = metric(anchors)[0]
  File "/yolov5/utils/autoanchor.py", line 36, in metric
    r = wh[:, None] / k[None]
RuntimeError: The size of tensor a (14) must match the size of tensor b (4) at non-singleton dimension 1
Exception in thread Thread-13:
Traceback (most recent call last):
  File "yolo_2/lib/python3.8/threading.py", line 932, in _bootstrap_inner
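The final `RuntimeError` looks like a follow-on failure rather than the root cause. One plausible reading, which is an assumption and not confirmed anywhere in this thread, is that after k-means fails the anchors keep their per-layer shape (4 detection layers x 3 anchors x 2 values) instead of a flat (12, 2) array, so `wh[:, None] / k[None]` can no longer broadcast. The sketch below applies the right-aligned broadcasting rule shared by NumPy and PyTorch to those shapes. `broadcast_shape` is a hypothetical helper written only for this illustration.

```python
def broadcast_shape(a: tuple, b: tuple) -> tuple:
    """Return the broadcast shape of a and b, or raise ValueError on mismatch."""
    ndim = max(len(a), len(b))
    # Left-pad the shorter shape with 1s so both shapes are right-aligned.
    a = (1,) * (ndim - len(a)) + tuple(a)
    b = (1,) * (ndim - len(b)) + tuple(b)
    out = []
    for dim, (x, y) in enumerate(zip(a, b)):
        # Two sizes are compatible when they are equal or one of them is 1.
        if x != y and x != 1 and y != 1:
            raise ValueError(f"size {x} must match size {y} at dimension {dim}")
        out.append(max(x, y))
    return tuple(out)


n_labels = 14  # "14 points" in the log

# Flat anchors, shape (12, 2): wh[:, None] is (14, 1, 2), k[None] is (1, 12, 2).
print(broadcast_shape((n_labels, 1, 2), (1, 12, 2)))

# Assumed un-flattened anchors, shape (4, 3, 2): k[None] is (1, 4, 3, 2).
try:
    broadcast_shape((n_labels, 1, 2), (1, 4, 3, 2))
except ValueError as err:
    print("broadcast error:", err)
```

Under that assumption the mismatch lands on dimension 1 with sizes 14 and 4, the same pair of numbers the traceback reports.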

github-actions[bot] commented 2 years ago

👋 Hello @star4s, thank you for your interest in YOLOv5 🚀! Please visit our ⭐️ Tutorials to get started, where you can find quickstart guides for simple tasks like Custom Data Training all the way to advanced concepts like Hyperparameter Evolution.

If this is a 🐛 Bug Report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we can not help you.

If this is a custom training ❓ Question, please provide as much information as possible, including dataset images, training logs, screenshots, and a public link to online W&B logging if available.

For business inquiries or professional support requests please visit https://ultralytics.com or email support@ultralytics.com.

Requirements

Python>=3.7.0 with all requirements.txt installed including PyTorch>=1.7. To get started:

git clone https://github.com/ultralytics/yolov5  # clone
cd yolov5
pip install -r requirements.txt  # install

Environments

YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

Status

CI CPU testing

If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training (train.py), validation (val.py), inference (detect.py) and export (export.py) on MacOS, Windows, and Ubuntu every 24 hours and on every commit.

glenn-jocher commented 2 years ago

@star4s this doesn't have anything to do with evolution.

AutoAnchor needs data to work, you have almost zero data in your dataset, or all your objects are the exact same size.
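As an illustration of that point: a clustering step can only produce as many useful centres as there are distinct points, so a dataset with a handful of boxes, or many boxes of identical size, cannot fill 12 anchors. A rough pre-flight check along those lines is sketched below. It is not part of YOLOv5; `anchor_precheck` and the sample box sizes are hypothetical.

```python
def anchor_precheck(box_sizes, n_anchors):
    """Return warnings about clustering box_sizes into n_anchors anchors."""
    warnings = []
    distinct = set(box_sizes)
    if len(box_sizes) < n_anchors:
        warnings.append(f"only {len(box_sizes)} boxes for {n_anchors} anchors")
    if len(distinct) < n_anchors:
        warnings.append(f"only {len(distinct)} distinct box sizes for {n_anchors} anchors")
    return warnings


# Hypothetical dataset: 14 boxes (width, height in pixels) but only 3 distinct sizes.
box_sizes = [(169, 169)] * 8 + [(120, 90)] * 4 + [(60, 200)] * 2
for message in anchor_precheck(box_sizes, n_anchors=12):
    print("warning:", message)
```

If a check like this fires, the options are to add more varied labels or to train with "--noautoanchor" so that the anchors from the model YAML are used unchanged.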

glenn-jocher commented 2 years ago

@star4s good news 😃! Your original issue may now be fixed ✅ in PR #6668. To receive this update:

Thank you for spotting this issue and informing us of the problem. Please let us know if this update resolves the issue for you, and feel free to inform us of any other issues you discover or feature requests that come to mind. Happy trainings with YOLOv5 🚀!

star4s commented 2 years ago

**After applying your modified autoanchor.py, I get more errors!

The first several AutoAnchor runs complete without errors,

and then I get an error like this:**

hyperparameters: lr0=0.00095, lrf=0.081, momentum=0.98, weight_decay=0.00051, warmup_epochs=2.58013, warmup_momentum=0.78619, warmup_bias_lr=0.12241, box=0.07791, cls=0.45521, cls_pw=1.32921, obj=1.05062, obj_pw=0.8944, iou_t=0.2, anchor_t=2.79565, anchors=4.49633, fl_gamma=0.0, hsv_h=0.0, hsv_s=0.0, hsv_v=0.0, degrees=0.0, translate=0.0, scale=0.0, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.0, mosaic=0.0, mixup=0.0, copy_paste=0.0
Weights & Biases: run 'pip install wandb' to automatically track and visualize YOLOv5 🚀 runs (RECOMMENDED)
Overriding model.yaml anchors with anchors=4.49633

             from  n    params  module                                  arguments                     

0 -1 1 8800 models.common.Conv [3, 80, 6, 2, 2]
1 -1 1 115520 models.common.Conv [80, 160, 3, 2]
2 -1 4 309120 models.common.C3 [160, 160, 4]
3 -1 1 461440 models.common.Conv [160, 320, 3, 2]
4 -1 8 2259200 models.common.C3 [320, 320, 8]
5 -1 1 1844480 models.common.Conv [320, 640, 3, 2]
6 -1 12 13125120 models.common.C3 [640, 640, 12]
7 -1 1 5531520 models.common.Conv [640, 960, 3, 2]
8 -1 4 11070720 models.common.C3 [960, 960, 4]
9 -1 1 11061760 models.common.Conv [960, 1280, 3, 2]
10 -1 4 19676160 models.common.C3 [1280, 1280, 4]
11 -1 1 4099840 models.common.SPPF [1280, 1280, 5]
12 -1 1 1230720 models.common.Conv [1280, 960, 1, 1]
13 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
14 [-1, 8] 1 0 models.common.Concat [1]
15 -1 4 11992320 models.common.C3 [1920, 960, 4, False]
16 -1 1 615680 models.common.Conv [960, 640, 1, 1]
17 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
18 [-1, 6] 1 0 models.common.Concat [1]
19 -1 4 5332480 models.common.C3 [1280, 640, 4, False]
20 -1 1 205440 models.common.Conv [640, 320, 1, 1]
21 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
22 [-1, 4] 1 0 models.common.Concat [1]
23 -1 4 1335040 models.common.C3 [640, 320, 4, False]
24 -1 1 922240 models.common.Conv [320, 320, 3, 2]
25 [-1, 20] 1 0 models.common.Concat [1]
26 -1 4 4922880 models.common.C3 [640, 640, 4, False]
27 -1 1 3687680 models.common.Conv [640, 640, 3, 2]
28 [-1, 16] 1 0 models.common.Concat [1]
29 -1 4 11377920 models.common.C3 [1280, 960, 4, False]
30 -1 1 8296320 models.common.Conv [960, 960, 3, 2]
31 [-1, 12] 1 0 models.common.Concat [1]
32 -1 4 20495360 models.common.C3 [1920, 1280, 4, False]
33 [23, 26, 29, 32] 1 89712 models.yolo.Detect [2, [[0, 1, 2, 3, 4, 5, 6, 7], [0, 1, 2, 3, 4, 5, 6, 7], [0, 1, 2, 3, 4, 5, 6, 7], [0, 1, 2, 3, 4, 5, 6, 7]], [320, 640, 960, 1280]]
Model Summary: 733 layers, 140067472 parameters, 140067472 gradients, 208.4 GFLOPs

Transferred 954/963 items from yolov5x6.pt
Scaled weight_decay = 0.00051
optimizer: SGD with parameter groups 159 weight (no decay), 163 weight, 163 bias
train: Scanning '/yolov5/datasets/OP1_test/labels/train.cache' images and labels... 14 found, 0 missing, 0 empty, 0 corrupt: 100%|██████████████████████| 14/14 [00:00<?, ?it/s]
train: Caching images (0.1GB ram): 100%|██████████████████████| 14/14 [00:00<00:00, 115.34it/s]
val: Scanning '/yolov5/datasets/OP1_test/labels/train.cache' images and labels... 14 found, 0 missing, 0 empty, 0 corrupt: 100%|████████████████████████| 14/14 [00:00<?, ?it/s]

AutoAnchor: 0.00 anchors/target, 0.000 Best Possible Recall (BPR). Anchors are a poor fit to dataset ⚠️, attempting to improve...
AutoAnchor: Running kmeans for 16 anchors on 14 points...
AutoAnchor: ERROR: Cannot take a larger sample than population when 'replace=False'
Traceback (most recent call last):
  File "train.py", line 638, in <module>
    main(opt)
  File "train.py", line 616, in main
    results = train(hyp.copy(), opt, device, callbacks)
  File "train.py", line 248, in train
    check_anchors(dataset, model=model, thr=hyp['anchor_t'], imgsz=imgsz)
  File "/yolov5/utils/autoanchor.py", line 54, in check_anchors
    new_bpr = metric(anchors)[0]
  File "yolov5/utils/autoanchor.py", line 35, in metric
    r = wh[:, None] / k[None]
RuntimeError: The size of tensor a (14) must match the size of tensor b (4) at non-singleton dimension 1
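The numbers in this log account for the new message. With `anchors=4.49633` the Detect row lists eight values (four width/height pairs) on each of its four outputs, so AutoAnchor asks for 16 anchors, but the dataset has only 14 label points, and 16 points cannot be drawn from 14 without replacement. The standard-library sketch below reproduces the same kind of failure. `random.sample` stands in for the sampling done inside the clustering routine, which is an assumption about where the error originates.

```python
import random

num_layers = 4         # detection outputs listed in the Detect row
anchors_per_layer = 4  # 8 values per output = 4 (width, height) pairs
n_anchors = num_layers * anchors_per_layer  # 16, as in "kmeans for 16 anchors"
n_points = 14                               # "on 14 points"

points = list(range(n_points))
try:
    random.sample(points, n_anchors)  # sampling without replacement
    sampled = True
except ValueError as err:
    sampled = False
    print(f"cannot pick {n_anchors} of {n_points} points:", err)

# With at least as many points as anchors, the same draw succeeds.
enough_points = list(range(100))
print(len(random.sample(enough_points, n_anchors)), "points drawn from", len(enough_points))
```

Because evolution mutates the `anchors` hyperparameter, the requested anchor count can grow between generations, which is consistent with earlier generations passing and a later one failing on the same 14 points.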

glenn-jocher commented 2 years ago

@star4s I already told you:

AutoAnchor needs data to work, you have almost zero data in your dataset, or all your objects are the exact same size.

star4s commented 2 years ago

@glenn-jocher Hi, thank you for your answer and your help.

My custom data:
- image size: 2352 x 1728
- number of classes: 17

My command for my custom data:

python train.py --img 2352 --batch 2 --epochs 5 --data test.yaml --cfg './models/yolov5x6.yaml' --weights yolov5x6.pt --cache --evolve 1000

My yolov5x6.yaml:

# parameters
nc: 17  # number of classes
depth_multiple: 1.33  # model depth multiple
width_multiple: 1.25  # layer channel multiple

# anchors
anchors:

I changed a lot of my data so that AutoAnchor would work, and then I ran Hyperparameter Evolution. The best results for coco128 and for my data are strangely different.

coco128: Best generation: 246, Last generation: 299
metrics/precision, metrics/recall, metrics/mAP_0.5, metrics/mAP_0.5:0.95, val/box_loss
0.62719, 0.79158, 0.79514, 0.52422, 0.034782

My custom data: Best generation: 154, Last generation: 299
metrics/precision, metrics/recall, metrics/mAP_0.5, metrics/mAP_0.5:0.95, val/box_loss
2.945e-06, 0.00092593, 1.4938e-06, 2.9876e-07, 0.031016

Does this mean something is wrong with my custom data? Are my annotations wrong?

Most of the metrics/precision and metrics/recall values for my custom data were 0. Why are they zero in most cases?

Thank you for your attention.

glenn-jocher commented 2 years ago

@star4s evolution should be run from a stable starting scenario. If your starting scenario returns zero mAP, evolution will not help.
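To put numbers on this, the two best-generation rows reported above can be reduced to one score each. The sketch assumes a weighted sum that ignores precision and recall and weights mAP@0.5 by 0.1 and mAP@0.5:0.95 by 0.9. Those weights are an assumption here (they are believed to match YOLOv5's default fitness, but nothing in this thread confirms it), and `fitness` is an illustrative helper.

```python
# Assumed weights for [precision, recall, mAP@0.5, mAP@0.5:0.95].
WEIGHTS = (0.0, 0.0, 0.1, 0.9)


def fitness(metrics):
    """Weighted sum of the four detection metrics."""
    return sum(w * m for w, m in zip(WEIGHTS, metrics))


# Best-generation metrics quoted in the thread (val/box_loss left out).
coco128 = (0.62719, 0.79158, 0.79514, 0.52422)
custom = (2.945e-06, 0.00092593, 1.4938e-06, 2.9876e-07)

print(f"coco128 fitness: {fitness(coco128):.4f}")
print(f"custom fitness:  {fitness(custom):.2e}")
```

Under these assumed weights the coco128 run scores about 0.55, while the custom run scores on the order of 4e-07, which is zero for practical purposes even at its best generation.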

star4s commented 2 years ago

@glenn-jocher Thank you for your answer and your help.

You mentioned a stable starting scenario.

What does a stable starting scenario mean?

Do I need to modify the structure of yolov5x6.yaml, such as the YOLOv5 head and the YOLOv5 backbone?

glenn-jocher commented 2 years ago

@star4s Nothing needs to be modified anywhere. Evolution improves results on your base scenario. If your base scenario is returning zero mAP there's not much for evolution to work with.

Like any nonlinear optimization problem the final result is a function of the initial guess.
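A toy illustration of that last point, under stated assumptions: the objective, step size and generation count below are invented for the example, and the loop is a bare mutate-and-keep-the-best search, not YOLOv5's evolution code. Starting on a plateau where every candidate scores zero gives the search nothing to select on, while starting from a point with a non-zero score lets it climb.

```python
import random


def fitness(x):
    """Toy objective: a peak at x = 5, exactly zero more than 1 away from it."""
    return max(0.0, 1.0 - abs(x - 5.0))


def evolve(x0, generations=300, sigma=0.2, seed=0):
    """Mutate the best point and keep a child only if it is strictly better."""
    rng = random.Random(seed)
    best_x, best_f = x0, fitness(x0)
    for _ in range(generations):
        child = best_x + rng.gauss(0.0, sigma)
        child_f = fitness(child)
        if child_f > best_f:
            best_x, best_f = child, child_f
    return best_x, best_f


print("start on the zero plateau:", evolve(x0=0.0))
print("start near the peak:      ", evolve(x0=4.5))
```

The run that starts at 0.0 ends where it began with a score of zero, because no mutation of this size reaches the region where the score becomes positive. The practical equivalent is to get a baseline training run with non-zero mAP first and only then switch on "--evolve".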