ultralytics / yolov5

YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite
https://docs.ultralytics.com

Poor metrics when copying anchor values #8625

Closed MannyGozzi closed 2 years ago

MannyGozzi commented 2 years ago

Search before asking

Question

[Image: metrics from the best evolution run.] Above is the best run from evolving hyperparameters with the --evolve flag, using a deliberately small number of epochs and evolutions just to demonstrate the point. The command used for this example was:

# if training for evolutions, set epochs to 10, and evolve to 50
# if you want to graph the evolution against the hyperparameters, add the --hyp path and edit the hyperparameter YAML located at opt/ml/input/data/hyp.scratch-low.yaml
!export WANDB_RUN_GROUP="evolution_const_seed" && python ./deepsea-yolov5/yolov5/train.py \
--img=640 \
--data=./deepsea-yolov5/opt/ml/custom_config.yaml  \
--batch=2 \
--weights=yolov5s.pt \
--cfg=./deepsea-yolov5/yolov5/models/yolov5s.yaml \
--project="902005-vaa"\
--cache \
--epochs=7 \
--evolve=3

The output anchor values were placed inside the model.yaml file I was using:

# YOLOv5 🚀 by Ultralytics, GPL-3.0 license

# Parameters
nc: 17  # number of classes
depth_multiple: 0.33  # model depth multiple
width_multiple: 0.50  # layer channel multiple
anchors:
  - [15,13, 25,24, 39,26]  # P3/8
  - [31,42, 51,40, 50,68]  # P4/16
  - [77,61, 120,98, 270,263]  # P5/32

# YOLOv5 v6.0 backbone
backbone:
  # [from, number, module, args]
  [[-1, 1, Conv, [64, 6, 2, 2]],  # 0-P1/2
   [-1, 1, Conv, [128, 3, 2]],  # 1-P2/4
   [-1, 3, C3, [128]],
   [-1, 1, Conv, [256, 3, 2]],  # 3-P3/8
   [-1, 6, C3, [256]],
   [-1, 1, Conv, [512, 3, 2]],  # 5-P4/16
   [-1, 9, C3, [512]],
   [-1, 1, Conv, [1024, 3, 2]],  # 7-P5/32
   [-1, 3, C3, [1024]],
   [-1, 1, SPPF, [1024, 5]],  # 9
  ]

# YOLOv5 v6.0 head
head:
  [[-1, 1, Conv, [512, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 6], 1, Concat, [1]],  # cat backbone P4
   [-1, 3, C3, [512, False]],  # 13

   [-1, 1, Conv, [256, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 4], 1, Concat, [1]],  # cat backbone P3
   [-1, 3, C3, [256, False]],  # 17 (P3/8-small)

   [-1, 1, Conv, [256, 3, 2]],
   [[-1, 14], 1, Concat, [1]],  # cat head P4
   [-1, 3, C3, [512, False]],  # 20 (P4/16-medium)

   [-1, 1, Conv, [512, 3, 2]],
   [[-1, 10], 1, Concat, [1]],  # cat head P5
   [-1, 3, C3, [1024, False]],  # 23 (P5/32-large)

   [[17, 20, 23], 1, Detect, [nc, anchors]],  # Detect(P3, P4, P5)
  ]

I then ran a normal, non-evolution run using these best evolved hyperparameters:

# YOLOv5 Hyperparameter Evolution Results
# Best generation: 1
# Last generation: 2
#    metrics/precision,       metrics/recall,      metrics/mAP_0.5, metrics/mAP_0.5:0.95,         val/box_loss,         val/obj_loss,         val/cls_loss
#              0.31078,              0.38663,              0.26372,              0.13787,             0.040274,             0.025833,             0.050041

lr0: 0.00978
lrf: 0.01005
momentum: 0.93562
weight_decay: 0.00049
warmup_epochs: 3.0193
warmup_momentum: 0.8
warmup_bias_lr: 0.10161
box: 0.04949
cls: 0.51181
cls_pw: 0.98931
obj: 1.0007
obj_pw: 0.96092
iou_t: 0.2
anchor_t: 3.9804
fl_gamma: 0.0
hsv_h: 0.01519
hsv_s: 0.71478
hsv_v: 0.40095
degrees: 0.0
translate: 0.1014
scale: 0.49673
shear: 0.0
perspective: 0.0
flipud: 0.0
fliplr: 0.5
mosaic: 1.0
mixup: 0.0
copy_paste: 0.0
anchors: 3.1

I trained with the following command and found the results shown below:

# Change the contents of deepsea-yolov5/opt/ml/input/data/hyp.scratch-low.yaml
# to use the best hyperparameters found by the evolution runs performed above.
!export WANDB_RUN_GROUP="hyperparam_const_seed" && python ./deepsea-yolov5/yolov5/train.py \
--img=640 \
--data=./deepsea-yolov5/opt/ml/custom_config.yaml  \
--batch=2 \
--weights=yolov5s.pt \
--cfg=./deepsea-yolov5/yolov5/models/yolov5s.yaml \
--hyp=./deepsea-yolov5/opt/ml/input/data/hyp.scratch-low.yaml \
--project="902005-vaa"\
--cache \
--noautoanchor \
--epochs=50

[Image: metrics from the non-evolution run.] Is there any reason why these anchor values produce such poor metrics compared to when they were used during the evolution runs?

Additional

No response

glenn-jocher commented 2 years ago

@MannyGozzi you appear not to understand LR schedulers. There is no point in comparing epoch 6/6 with epoch 6/50; they should not be similar at all.
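A rough sketch of why the two points are not comparable (assumption: at the time, train.py's default LR factor was a linear ramp from 1.0 down to lrf over the total --epochs, with a cosine variant behind --cos-lr; treat the exact form as illustrative):

def lr_at(epoch, total_epochs, lr0=0.01, lrf=0.01):
    # decay factor shrinks from 1.0 toward lrf across the *whole* run,
    # so the LR at a given epoch depends on the total number of epochs
    lf = (1 - epoch / total_epochs) * (1.0 - lrf) + lrf
    return lr0 * lf

print(lr_at(6, 7))    # ~0.0015: last epoch of a 7-epoch run, most of the decay has happened
print(lr_at(6, 50))   # ~0.0088: early in a 50-epoch run, LR is still close to lr0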

MannyGozzi commented 2 years ago

> @MannyGozzi you appear not to understand LR schedulers. There is no point in comparing epoch 6/6 with epoch 6/50; they should not be similar at all.

I'll run the comparison again with the same epoch count, my apologies for not knowing.

MannyGozzi commented 2 years ago

> @MannyGozzi you appear not to understand LR schedulers. There is no point in comparing epoch 6/6 with epoch 6/50; they should not be similar at all.

Alright, the results have been recalculated using the same command, only changing to 7 epochs instead of the previous 50. The output is shown below and remains consistent with my question: why are the metrics so comparatively low when using the custom anchor values output by the best run? [Image: recalculated metrics.]

glenn-jocher commented 2 years ago

@MannyGozzi evolve doesn't output anchors; evolve outputs a set of hyperparameters that are evolved for your scenario. If AutoAnchor was used during evolution then it should likewise be used during training.
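To make the distinction concrete, a minimal sketch (file paths assumed from the commands above; adjust to your layout) contrasting the two places an "anchors" key can appear: the model config carries the actual anchor box sizes, while the hyp file's anchors entry is only an evolved per-layer anchor count.

import yaml

# Model config passed via --cfg: anchors are explicit width,height pairs per detection layer.
with open("./deepsea-yolov5/yolov5/models/yolov5s.yaml") as f:
    model_cfg = yaml.safe_load(f)

# Hyperparameter file passed via --hyp: 'anchors' (if present) is a scalar count, not box sizes.
with open("./deepsea-yolov5/opt/ml/input/data/hyp.scratch-low.yaml") as f:
    hyp = yaml.safe_load(f)

print(model_cfg["anchors"])   # e.g. [[15, 13, 25, 24, 39, 26], ...] as in the file above
print(hyp.get("anchors"))     # e.g. 3.1 in the evolved hyps above; None if commented out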

MannyGozzi commented 2 years ago

I was hoping to have the same anchor values across runs. I altered the autoanchor.py file to check whether the anchor values are being brought in from the model.yaml file, and it seems they aren't. See the attached image.

[Image: debug output from the modified autoanchor.py.]

glenn-jocher commented 2 years ago

@MannyGozzi as AutoAnchor says, if you want to use these for a future run then just put them in your model.yaml file and point to it during training, i.e. --cfg model.yaml
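As a usage sketch (the paths and the custom model filename below are hypothetical, mirroring the commands earlier in this thread), a follow-up run that keeps a fixed set of anchors would point --cfg at the edited model YAML and pass --noautoanchor so AutoAnchor does not recompute them:

# hypothetical follow-up run; yolov5s_custom_anchors.yaml is an assumed copy with the edited anchors
python ./deepsea-yolov5/yolov5/train.py \
  --img=640 \
  --data=./deepsea-yolov5/opt/ml/custom_config.yaml \
  --batch=2 \
  --weights=yolov5s.pt \
  --cfg=./deepsea-yolov5/yolov5/models/yolov5s_custom_anchors.yaml \
  --noautoanchor \
  --epochs=50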

MannyGozzi commented 2 years ago

> @MannyGozzi as AutoAnchor says, if you want to use these for a future run then just put them in your model.yaml file and point to it during training, i.e. --cfg model.yaml

These are the anchors in the model file that I'm pointing to: [Image: anchors section of the model.yaml.]

glenn-jocher commented 2 years ago

@MannyGozzi if you're using default hyps everything should be fine, i.e. https://github.com/ultralytics/yolov5/blob/b367860196a2590a5f44c9b18401dedfc0543077/data/hyps/hyp.scratch-low.yaml#L20
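For context, a hedged sketch of the mechanism that link points at, as I read train.py and models/yolo.py at the time (verify against your checkout): if the --hyp file defines a numeric anchors value, as the evolved hyps above do with anchors: 3.1, train.py passes it into the model constructor and the anchor list written into the --cfg model.yaml is overridden; with the default hyps the key is commented out, so the model.yaml anchors are used as written.

import yaml

# Hypothetical path mirroring the --hyp argument used above.
with open("./deepsea-yolov5/opt/ml/input/data/hyp.scratch-low.yaml") as f:
    hyp = yaml.safe_load(f)

anchors_hyp = hyp.get("anchors")  # None when the line is commented out (default hyps)
if anchors_hyp:
    # e.g. anchors: 3.1 -> model.yaml anchor pairs replaced by a fresh set of round(3.1) anchors per layer
    print(f"model.yaml anchors overridden: {round(anchors_hyp)} anchors per output layer")
else:
    print("model.yaml anchors used as written")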

MannyGozzi commented 2 years ago

> @MannyGozzi if you're using default hyps everything should be fine, i.e.
>
> https://github.com/ultralytics/yolov5/blob/b367860196a2590a5f44c9b18401dedfc0543077/data/hyps/hyp.scratch-low.yaml#L20

I understand, but the fact that it doesn't seem to use the anchors in the model cfg file is concerning. Is that a bug?

glenn-jocher commented 2 years ago

There is no bug. Everything works correctly. If I change the first yolov5s.yaml anchor to 1,1 and point to it, the updated anchors are displayed in the console and used for training:

[Screenshot (2022-07-25): console output showing the modified anchors being loaded and used.]
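For reference, the sanity check described above looks roughly like this (a sketch assuming the stock yolov5s.yaml COCO anchors; only the first P3 pair is edited, then training is pointed at the copy with --cfg):

# in a copy of models/yolov5s.yaml
anchors:
  - [1,1, 16,30, 33,23]  # P3/8  <- first pair deliberately changed to 1,1
  - [30,61, 62,45, 59,119]  # P4/16
  - [116,90, 156,198, 373,326]  # P5/32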