ultralytics / yolov5

YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite
https://docs.ultralytics.com
GNU Affero General Public License v3.0

Hyperparameter Evolution #607

Open glenn-jocher opened 4 years ago

glenn-jocher commented 4 years ago

📚 This guide explains hyperparameter evolution for YOLOv5 🚀. Hyperparameter evolution is a method of hyperparameter optimization that uses a Genetic Algorithm (GA). UPDATED 28 March 2023.

Hyperparameters in ML control various aspects of training, and finding optimal values for them can be a challenge. Traditional methods like grid searches can quickly become intractable due to 1) the high-dimensional search space, 2) unknown correlations among the dimensions, and 3) the expensive nature of evaluating the fitness at each point, making a GA a suitable candidate for hyperparameter searches.

Before You Start

Clone repo and install requirements.txt in a Python>=3.7.0 environment, including PyTorch>=1.7. Models and datasets download automatically from the latest YOLOv5 release.

git clone https://github.com/ultralytics/yolov5  # clone
cd yolov5
pip install -r requirements.txt  # install

1. Initialize Hyperparameters

YOLOv5 has about 30 hyperparameters used for various training settings. These are defined in *.yaml files in the /data directory. Better initial guesses will produce better final results, so it is important to initialize these values properly before evolving. If in doubt, simply use the default values, which are optimized for YOLOv5 COCO training from scratch.

https://github.com/ultralytics/yolov5/blob/2da2466168116a9fa81f4acab744dc9fe8f90cac/data/hyps/hyp.scratch-low.yaml#L2-L34
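If you want to evolve from a different set of initial values, you can pass any hyperparameter file explicitly with the --hyp argument; for example, using the default low-augmentation file linked above:

python train.py --data coco128.yaml --weights yolov5s.pt --hyp data/hyps/hyp.scratch-low.yaml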

2. Define Fitness

Fitness is the value we seek to maximize. In YOLOv5 we define a default fitness function as a weighted combination of metrics: mAP@0.5 contributes 10% of the weight and mAP@0.5:0.95 contributes the remaining 90%, with Precision P and Recall R absent. You may adjust these as you see fit or use the default fitness definition (recommended). https://github.com/ultralytics/yolov5/blob/4103ce9ad0393cc27f6c80457894ad7be0cb1f0d/utils/metrics.py#L12-L16
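For reference, the default fitness can be sketched as a simple weighted sum over the four metrics. This mirrors the weighting described above; see the linked utils/metrics.py lines for the authoritative definition:

import numpy as np

def fitness(x):
    # x: array of shape (n, 4) with columns [P, R, mAP@0.5, mAP@0.5:0.95]
    w = np.array([0.0, 0.0, 0.1, 0.9])  # P and R carry 0% weight, mAP@0.5 10%, mAP@0.5:0.95 90%
    return (x[:, :4] * w).sum(1)        # higher is better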

3. Evolve

Evolution is performed about a base scenario which we seek to improve upon. The base scenario in this example is finetuning COCO128 for 10 epochs using pretrained YOLOv5s. The base scenario training command is:

python train.py --epochs 10 --data coco128.yaml --weights yolov5s.pt --cache

To evolve hyperparameters specific to this scenario, starting from our initial values defined in Section 1., and maximizing the fitness defined in Section 2., append --evolve:

# Single-GPU
python train.py --epochs 10 --data coco128.yaml --weights yolov5s.pt --cache --evolve

# Multi-GPU
for i in 0 1 2 3 4 5 6 7; do
  sleep $(expr 30 \* $i) &&  # 30-second delay (optional)
  echo 'Starting GPU '$i'...' &&
  nohup python train.py --epochs 10 --data coco128.yaml --weights yolov5s.pt --cache --device $i --evolve > evolve_gpu_$i.log &
done

# Multi-GPU bash-while (not recommended)
for i in 0 1 2 3 4 5 6 7; do
  sleep $(expr 30 \* $i) &&  # 30-second delay (optional)
  echo 'Starting GPU '$i'...' &&
  "$(while true; do nohup python train.py... --device $i --evolve 1 > evolve_gpu_$i.log; done)" &
done

The default evolution settings will run the base scenario 300 times, i.e. for 300 generations. You can modify generations via the --evolve argument, i.e. python train.py --evolve 1000. https://github.com/ultralytics/yolov5/blob/6a3ee7cf03efb17fbffde0e68b1a854e80fe3213/train.py#L608

The main genetic operators are crossover and mutation. In this work mutation is used, with an 80% probability and a 0.04 variance, to create new offspring based on a combination of the best parents from all previous generations. Results are logged to runs/evolve/exp/evolve.csv, and the highest-fitness offspring is saved every generation as runs/evolve/hyp_evolved.yaml:

# YOLOv5 Hyperparameter Evolution Results
# Best generation: 287
# Last generation: 300
#    metrics/precision,       metrics/recall,      metrics/mAP_0.5, metrics/mAP_0.5:0.95,         val/box_loss,         val/obj_loss,         val/cls_loss
#              0.54634,              0.55625,              0.58201,              0.33665,             0.056451,             0.042892,             0.013441

lr0: 0.01  # initial learning rate (SGD=1E-2, Adam=1E-3)
lrf: 0.2  # final OneCycleLR learning rate (lr0 * lrf)
momentum: 0.937  # SGD momentum/Adam beta1
weight_decay: 0.0005  # optimizer weight decay 5e-4
warmup_epochs: 3.0  # warmup epochs (fractions ok)
warmup_momentum: 0.8  # warmup initial momentum
warmup_bias_lr: 0.1  # warmup initial bias lr
box: 0.05  # box loss gain
cls: 0.5  # cls loss gain
cls_pw: 1.0  # cls BCELoss positive_weight
obj: 1.0  # obj loss gain (scale with pixels)
obj_pw: 1.0  # obj BCELoss positive_weight
iou_t: 0.20  # IoU training threshold
anchor_t: 4.0  # anchor-multiple threshold
# anchors: 3  # anchors per output layer (0 to ignore)
fl_gamma: 0.0  # focal loss gamma (efficientDet default gamma=1.5)
hsv_h: 0.015  # image HSV-Hue augmentation (fraction)
hsv_s: 0.7  # image HSV-Saturation augmentation (fraction)
hsv_v: 0.4  # image HSV-Value augmentation (fraction)
degrees: 0.0  # image rotation (+/- deg)
translate: 0.1  # image translation (+/- fraction)
scale: 0.5  # image scale (+/- gain)
shear: 0.0  # image shear (+/- deg)
perspective: 0.0  # image perspective (+/- fraction), range 0-0.001
flipud: 0.0  # image flip up-down (probability)
fliplr: 0.5  # image flip left-right (probability)
mosaic: 1.0  # image mosaic (probability)
mixup: 0.0  # image mixup (probability)
copy_paste: 0.0  # segment copy-paste (probability)

We recommend a minimum of 300 generations of evolution for best results. Note that evolution is generally expensive and time consuming, as the base scenario is trained hundreds of times, possibly requiring hundreds or thousands of GPU hours.
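For intuition, the mutation step described above can be sketched roughly as follows. This is a simplified illustration, not the exact train.py code: the meta dictionary format (mutation scale, lower limit, upper limit) and the multiplicative-noise form are assumptions based on the descriptions in this guide, and sigma=0.2 corresponds to the 0.04 variance quoted above (0.2**2 = 0.04).

import numpy as np

def mutate(parent, meta, mutation_p=0.8, sigma=0.2, rng=None):
    """Create one offspring by Gaussian mutation of a parent hyperparameter dict.

    parent: {name: value} hyperparameters taken from the best previous generations
    meta:   {name: (mutation_scale, lower, upper)} per-hyperparameter scale and bounds
    """
    rng = rng or np.random.default_rng()
    child = dict(parent)
    for k, (scale, lo, hi) in meta.items():
        if scale == 0:                      # scale 0 means this hyperparameter never mutates
            continue
        if rng.random() < mutation_p:       # mutate each gene with ~80% probability
            factor = 1.0 + rng.normal(0.0, sigma) * scale            # multiplicative Gaussian noise
            child[k] = float(np.clip(parent[k] * factor, lo, hi))    # keep within bounds
    return child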

4. Visualize

evolve.csv is plotted as evolve.png by utils.plots.plot_evolve() after evolution finishes, with one subplot per hyperparameter showing fitness (y axis) vs hyperparameter values (x axis). Yellow indicates higher concentrations. Vertical distributions indicate that a parameter has been disabled and does not mutate. This is user-selectable in the meta dictionary in train.py, and is useful for fixing parameters and preventing them from evolving.

evolve
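If you need to regenerate the plot manually (for example after an interrupted run), something like the following should work from inside the yolov5 repo. This is a sketch based on the utils.plots.plot_evolve() reference above; the exp path assumes the default run directory:

from utils.plots import plot_evolve

plot_evolve('runs/evolve/exp/evolve.csv')  # re-plots the CSV and saves evolve.png alongside it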

Environments

YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

Status

YOLOv5 CI

If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training, validation, inference, export and benchmarks on macOS, Windows, and Ubuntu every 24 hours and on every commit.

OrjwanZaafarani commented 3 years ago

If I set the generations to 300 and start training for 100 epochs, how many times will it train?

glenn-jocher commented 3 years ago

@OrjwanZaafarani generations indicates how many times your training loop will run. 300 generations means your model will train 300 times.

OrjwanZaafarani commented 3 years ago

@glenn-jocher So I'm running it for 30,000 epochs haha. Thanks.

glenn-jocher commented 3 years ago

@OrjwanZaafarani no not really, you are repeating your base scenario 300 times.

Each scenario can be anything you want, in your case you have a 100 epoch training, but each training is independent, so you are not training a single model for 30000 epochs.

amfogor commented 3 years ago

What number of epochs should I specify?

glenn-jocher commented 3 years ago

@amfogor evolution base scenario is completely up to you.

aficionadoai commented 3 years ago

Is there a known example of applying Weights & Biases sweeps to the hyperparameters?

glenn-jocher commented 3 years ago

@aficionadoai yes: https://wandb.ai/glenn-jocher/COCO128_evolve/sweeps/f9d7fyj2

Sweep

shang0085 commented 3 years ago

I am trying out evolve from a base training which has run 200 epochs (yolov5s, img size 1280, using hyp.scratch).

I have not seen an evolve.txt generated for me yet, but this is the result.txt inside the evolve folder. It looks like the training loss is increasing. Is that normal? If it's not, I would rather stop the trial now.

Copy of a segment of result.txt from the evolve folder:

190/999 3.77G 0.04028 0.1747 0.01328 0.2282 746 1280 0.5444 0.4805 0.4781 0.3003 0.03869 0.263 0.01426
191/999 3.77G 0.04043 0.1773 0.01338 0.2311 831 1280 0.5866 0.4537 0.4782 0.3004 0.03869 0.263 0.01426
192/999 3.77G 0.04025 0.175 0.01342 0.2287 682 1280 0.5439 0.4813 0.4784 0.3004 0.03869 0.263 0.01426
193/999 3.77G 0.04036 0.1753 0.01327 0.2289 751 1280 0.5425 0.4818 0.4783 0.3005 0.03869 0.263 0.01426
194/999 3.77G 0.04027 0.174 0.01322 0.2275 1290 1280 0.585 0.4542 0.4784 0.3005 0.03869 0.263 0.01425
195/999 3.77G 0.04024 0.174 0.01332 0.2276 781 1280 0.5638 0.4661 0.4783 0.3005 0.03869 0.263 0.01425
196/999 3.77G 0.04031 0.1743 0.01337 0.2279 611 1280 0.5864 0.4532 0.4785 0.3006 0.0387 0.263 0.01425
197/999 3.77G 0.04029 0.1786 0.0131 0.232 1069 1280 0.5871 0.4522 0.4786 0.3004 0.0387 0.263 0.01425
198/999 3.77G 0.04019 0.1763 0.01329 0.2298 769 1280 0.5884 0.4525 0.4786 0.3005 0.0387 0.263 0.01425
199/999 3.77G 0.04023 0.1759 0.01328 0.2294 990 1280 0.5868 0.453 0.4785 0.3004 0.0387 0.263 0.01425
200/999 3.77G 0.04026 0.1742 0.01322 0.2277 847 1280 0.5831 0.4539 0.4785 0.3005 0.0387 0.263 0.01425
201/999 3.77G 0.04016 0.1773 0.01332 0.2308 726 1280 0.585 0.4537 0.4785 0.3006 0.0387 0.263 0.01425
....
660/999 4.37G 0.05093 0.2059 0.01411 0.271 940 1280 0 0 0 0 0 0 0
661/999 4.37G 0.05132 0.2052 0.0141 0.2706 1018 1280 0 0 0 0 0 0 0
662/999 4.37G 0.05131 0.2082 0.01401 0.2736 947 1280 0 0 0 0 0 0 0
663/999 4.37G 0.05148 0.206 0.01391 0.2713 950 1280 0 0 0 0 0 0 0
664/999 4.37G 0.05125 0.2093 0.01401 0.2745 946 1280 0 0 0 0 0 0 0
665/999 4.37G 0.05129 0.2075 0.01394 0.2728 1075 1280 0 0 0 0 0 0 0
666/999 4.37G 0.05147 0.2081 0.01396 0.2735 857 1280 0 0 0 0 0 0 0
667/999 4.37G 0.05124 0.2089 0.01399 0.2742 1112 1280 0 0 0 0 0 0 0
668/999 4.37G 0.05125 0.2081 0.01401 0.2733 849 1280 0 0 0 0 0 0 0
669/999 4.37G 0.05126 0.2059 0.01387 0.2711 989 1280 0 0 0 0 0 0 0
670/999 4.37G 0.05124 0.2085 0.01392 0.2736 963 1280 0 0 0 0 0 0 0
671/999 4.37G 0.05125 0.2057 0.01394 0.2709 706 1280 0 0 0 0 0 0 0
672/999 4.37G 0.05114 0.2065 0.01378 0.2714 823 1280 0 0 0 0 0 0 0
673/999 4.37G 0.05146 0.2087 0.01396 0.2741 843 1280 0 0 0 0 0 0 0
674/999 4.37G 0.0515 0.2088 0.01403 0.2743 834 1280 0 0 0 0 0 0 0

glenn-jocher commented 3 years ago

@shang0085 evolve.txt will be generated once the first generation has completed.

shang0085 commented 3 years ago

@shang0085 evolve.txt will be generated once the first generation has completed.

Yeah, my understanding is 300 generations as a default. I have seen that my evolve has run 300 epochs, starting off from a base of 200 epochs. So in my case 300 generations would not mean 300 epochs? What would happen if it reaches the end of the base training number, which in my case I set to 1000 epochs? Would it continue to evolve and ignore the 1000 limit?

glenn-jocher commented 3 years ago

@shang0085 a generation is 1 training. A training is whatever you decide.

255isWhite commented 3 years ago

@glenn-jocher Hello, how can I go back to an unfinished evolve process? For example, a full 300-generation evolution that unfortunately ended at the 200th generation due to various reasons.

glenn-jocher commented 3 years ago

@billie7 to resume evolution you simply re-run the same command, and evolution will start from an evolve.txt if it finds it.

255isWhite commented 3 years ago

@glenn-jocher I changed some parsers in train.py, including "--weights", "--cfg", "--data", "--hyp" and "--epochs", and my command is python train.py --evolve. Every time I just re-run this command to resume, but the generations went to 350 and it seemed not to stop, even though the default number of generations is 300, which I didn't change. I can find the evolve.txt, and it has 350 lines XD

glenn-jocher commented 3 years ago

@billie7 default generations is 300, which you can modify as you see fit, i.e. python train.py --evolve 100

255isWhite commented 3 years ago

@glenn-jocher Yes, I did not modify the default generations, but this evolution process did not stop at its 300 generations. Should I link the hyp.yaml in train.py to runs/train/evolve/hyp_evolved.yaml?

glenn-jocher commented 3 years ago

@billie7 yes that's because the evolution command will run 300 generations by default. If it finds an evolve.txt it will start from there.

255isWhite commented 3 years ago

Thanks a lot, a stupid mistake I've made.

besbesmany commented 3 years ago

Where can I find yolov5/evolve.png? I can't find it after evolve. Also, where can I find the visualization images?

How do I change crossover and mutation?

glenn-jocher commented 3 years ago

@besbesmany evolve.txt is plotted as evolve.png after evolution completes. The console printout is very clear I would say:


All evolution code is inside train.py

Bellk17 commented 3 years ago

I have a fairly large dataset (900+ classes) and evolution is a bit out of my price range. I was curious if anyone had luck evolving on a subset of data? I know many model parameters won't transfer as they are dependent on the training set size, but it seems certain parameters, such as image augmentation, may work across models.

If we can identify such hyperparameters, would it be of value to train multiple subsets and utilize an ensemble to further generalize outcomes?

glenn-jocher commented 3 years ago

@Bellk17 that's an interesting idea! I think most people take shortcuts in the epochs dimension rather than the dataset dimension, i.e. evolving COCO on < 300 epochs rather than 300 epochs, but using a subset of the dataset might work better.

The compute-saving test would be if evolving on 10% of your dataset converges faster and/or correlates better with full-dataset results than the same with 10% of epochs. I'm optimistic the answer might be yes, especially for large datasets.

There's a term for statistical subsampling that eludes me right now but I agree with your second point as well. The RANSAC method uses a similar random subsampling approach https://en.wikipedia.org/wiki/Random_sample_consensus

Definitely follow up on this thread if you have more information or results.

Bellk17 commented 3 years ago

It should be trivial to test the assumptions. Unfortunately, the gains should be more pronounced on larger datasets, for which I don't have the computing resources / time to run the full benchmark (single RTX 8000).

However, if there is a good open-source "large" dataset that people have already evolved, given the training set, initial and final hyper-parameters, we could run both 10% approaches and compare for POC. If it shows promise, we would want to test multiple variations / dataset sizes to properly model accuracy of each approach should resources become available.

Knowing when to switch to a sub-sampling approach (should it work) would be amazing when optimizing large models on a budget.

Zegorax commented 3 years ago

@glenn-jocher Do you have any example of WandB sweep YAML for YOLOv5 ? I'm confused about which method to use (--evolve or Sweep)

glenn-jocher commented 3 years ago

@Zegorax see https://wandb.ai/glenn-jocher/COCO128_evolve. This was a 300-generation evolution I ran normally, i.e. python train.py --evolve, not using the sweeps function.

ya-stack commented 2 years ago

How can I find the number of generations left in the hyperparameter tuning process?

glenn-jocher commented 2 years ago

@ya-stack you can monitor evolution progress by viewing your evolve.csv file. One row is added to this file per generation.
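A quick way to check progress programmatically is to count the rows of evolve.csv, one per completed generation. This is a small sketch; the exp path and the 300-generation total are assumptions for a default run:

import pandas as pd

df = pd.read_csv('runs/evolve/exp/evolve.csv')
done = len(df)  # one row per completed generation
print(f'{done} generations completed, {300 - done} remaining of the default 300')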

pranathlcp commented 2 years ago

Hi,

Thanks for the wonderful effort in developing and maintaining the YOLOv5 repository.

I ran the following code with the intention of finding the optimal hyperparameters for a custom dataset (via Google Colab).

!python train.py --epochs 10 --img 416 --data gtsdb.yaml --weights runs/train/exp/weights/best.pt --cache --evolve

Based on what I have understood, the above command will run for 300 generations with 10 epochs per generation (3,000 epochs in total).

In cases where the training gets interrupted due to limitations of Google Colab, could I please know the exact command which is required for resuming the hyperparameter evolution process?

I checked this issue as well, where you instructed to keep evolve.txt in the yolov5 directory. Is that the evolve.csv that you mentioned?

I'm a little confused on what exactly needs to be done to resume the hyperparameter evolution process. Thank you again!

glenn-jocher commented 2 years ago

@pranathlcp 👋 Hello! Thanks for asking about resuming evolution.

Resuming YOLOv5 🚀 evolution is a bit different than resuming a normal training run with python train.py --resume. If you started an evolution run which was interrupted, or finished normally, and you would like to continue for additional generations where you left off, then you pass --resume and specify the --name of the evolution you want to resume, i.e.:

Start Evolution

Assume you evolve YOLOv5s on COCO128 for 10 epochs for 3 generations:

python train.py --epochs 10 --data coco128.yaml --weights yolov5s.pt --evolve 3

If this is your first evolution a new directory runs/evolve/exp will be created to save your results.

# ├── yolov5
#     └── runs
#         └── evolve
#             └── exp  ← evolution saved here

Start a Second Evolution

Now assume you want to start a completely separate evolution: YOLOv5s on VOC for 5 epochs for 3 generations. You simply start evolving, and your new evolution will again be logged to a new directory runs/evolve/exp2:

python train.py --epochs 5 --data VOC.yaml --weights yolov5s.pt --evolve 3

You will now have two evolution runs saved:

# ├── yolov5
#     └── runs
#         └── evolve
#             ├── exp  ← first evolution (COCO128)
#             └── exp2  ← second evolution (VOC)

Notebook example: Open In Colab Open In Kaggle


Resume an Evolution

If you want to resume the first evolution (COCO128 saved to runs/evolve/exp), then you use the same exact command you started with plus --resume --name exp, passing the additional number of generations you want, i.e. --evolve 30 for 30 more generations:

python train.py --epochs 10 --data coco128.yaml --weights yolov5s.pt --evolve 30 --resume --name exp

Evolution will run for an additional 30 generations and all new results will be added to the existing runs/evolve/exp/evolve.csv.

Good luck and let us know if you have any other questions!

myasser63 commented 2 years ago

@glenn-jocher To perform hyperparameter evolution, should I train the model first and use the trained weights (best.pt) to perform the evolution?

glenn-jocher commented 2 years ago

@myasser63 you can evolve any scenario you want. Only you know what scenario you are interested in, I can't tell you that.

myasser63 commented 2 years ago

@glenn-jocher Can I know the difference between --evolve and a hyperparameter sweep? Is the sweep done on the runs of the evolution?

May you add more instructions for W&B sweeps?

glenn-jocher commented 2 years ago

@myasser63 the two are very different, especially in regards to the Genetic Evolution algorithm they employ. I wrote the YOLOv5 hyperparameter evolution algorithm; W&B sweeps is a more general tool developed by W&B. @AyushExel @myasser63 is requesting we add additional content or links to this tutorial for W&B sweeps. Can you review the W&B content above and see if it needs updating?

pranathlcp commented 2 years ago

[quoting @glenn-jocher's resume-evolution instructions from the comment above]

Thank you very much for the detailed reply. I actually waited until the completion of the evolution to respond with my results. The resuming approach which you mentioned, worked perfectly and I could finally have a completed evolution. I have two questions though.

  1. I have got hyp_evolve.yaml and hyp.yaml, but both of them have the same hyperparameter values except the following commented-out lines.
# YOLOv5 Hyperparameter Evolution Results
# Best generation: 251
# Last generation: 301
#    metrics/precision,       metrics/recall,      metrics/mAP_0.5, metrics/mAP_0.5:0.95,         val/box_loss,         val/obj_loss,         val/cls_loss
#              0.73573,              0.54952,              0.69578,              0.58686,             0.022727,            0.0037956,             0.050243

What exactly is the difference between hyp_evolve.yaml and hyp.yaml ?

At the end of the evolution run, it is instructed to use hyp_evolve.yaml though.

hyp hyp_evolve

  2. The other question is: in the final evolve.png plots, the values given for the hyperparameters are different from the values given in hyp_evolve.yaml. I was under the impression that evolve.png provides the best set of hyperparameter values based on the evolution run.

Should we use the hyperparameter values from hyp_evolve.yaml or hyp.yaml or evolve.png?

(In my case though, the values of both hyp_evolve.yaml and hyp.yaml are the same.)

evolve

AyushExel commented 2 years ago

@myasser63 Responded to your other issue with more links. @glenn-jocher the sweeps tutorial is up-to-date. In the second point, the path utils/wandb_logging/sweep.yaml needs to be changed to utils/logging/wandb/sweep.yaml

glenn-jocher commented 2 years ago

@AyushExel thanks, I've updated second point now to correct path!

@pranathlcp if you believe you have a reproducible bug please raise a new bug report issue, thank you!

yizweithree commented 2 years ago

If the evolution process is interrupted, how do I continue to evolve?

glenn-jocher commented 2 years ago

@yizweithree 👋 Hello! Thanks for asking about resuming evolution.

Resuming YOLOv5 🚀 evolution is a bit different than resuming a normal training run with python train.py --resume. If you started an evolution run which was interrupted, or finished normally, and you would like to continue for additional generations where you left off, then you pass --resume and specify the --name of the evolution you want to resume, i.e.:

Start Evolution

Assume you evolve YOLOv5s on COCO128 for 10 epochs for 3 generations:

python train.py --epochs 10 --data coco128.yaml --weights yolov5s.pt --evolve 3

If this is your first evolution a new directory runs/evolve/exp will be created to save your results.

# ├── yolov5
#     └── runs
#         └── evolve
#             └── exp  ← evolution saved here

Start a Second Evolution

Now assume you want to start a completely separate evolution: YOLOv5s on VOC for 5 epochs for 3 generations. You simply start evolving, and your new evolution will again be logged to a new directory runs/evolve/exp2:

python train.py --epochs 5 --data VOC.yaml --weights yolov5s.pt --evolve 3

You will now have two evolution runs saved:

# ├── yolov5
#     └── runs
#         └── evolve
#             ├── exp  ← first evolution (COCO128)
#             └── exp2  ← second evolution (VOC)

Notebook example: Open In Colab Open In Kaggle


Resume an Evolution

If you want to resume the first evolution (COCO128 saved to runs/evolve/exp), then you use the same exact command you started with plus --resume --name exp, passing the additional number of generations you want, i.e. --evolve 30 for 30 more generations:

python train.py --epochs 10 --data coco128.yaml --weights yolov5s.pt --evolve 30 --resume --name exp

Evolution will run for an additional 30 generations and all new results will be added to the existing runs/evolve/exp/evolve.csv.

Good luck and let us know if you have any other questions!

thsnhtung commented 2 years ago

What is the specific name of the genetic algorithm? Is it differential evolution, the cross-entropy method, or something else?

glenn-jocher commented 2 years ago

@thsnhtung we use GA with gaussian mutation and elitism, no crossover, population size 1. I wrote it myself but I didn't name it. This same algorithm is also applied in AutoAnchor for anchor evolution. The details are here: https://github.com/ultralytics/yolov5/blob/7473f0f95dbc9ef9dd1706274906c99eac2ee2f9/train.py#L570-L606
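Schematically, the elitism / parent-selection step described here (fitness-weighted choice among the best previous generations) can be pictured like the sketch below. It is a rough illustration under assumed names and data layout, not the exact train.py code:

import numpy as np

def select_parent(history, n=5, rng=None):
    """Pick one parent row from the top-n previous generations, weighted by fitness.

    history: array where the first 4 columns are [P, R, mAP@0.5, mAP@0.5:0.95]
             and the remaining columns are that generation's hyperparameters.
    """
    rng = rng or np.random.default_rng()
    f = (history[:, :4] * [0.0, 0.0, 0.1, 0.9]).sum(1)  # same fitness weighting as Section 2
    top = np.argsort(-f)[: min(n, len(f))]               # indices of the best generations so far
    w = f[top] - f[top].min() + 1e-6                      # positive selection weights
    return history[rng.choice(top, p=w / w.sum())]        # chosen generation's row (metrics + hyps)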

thsnhtung commented 2 years ago

@thsnhtung we use GA with gaussian mutation and elitism, no crossover, population size 1. I wrote it myself but I didn't name it. This same algorithm is also applied in AutoAnchor for anchor evolution. The details are here:

https://github.com/ultralytics/yolov5/blob/7473f0f95dbc9ef9dd1706274906c99eac2ee2f9/train.py#L570-L606

Thanks for your reply, but I have a little problem with how to disable a hyperparameter. I used hyp.scratch.yaml but it automatically disables shear, perspective, flipud...

glenn-jocher commented 2 years ago

@thsnhtung you can prevent a hyperparameter from evolving during hyperparameter evolution by updating its key in the meta dictionary in train.py: https://github.com/ultralytics/yolov5/blob/540ef0dd30be9bcf6882c9625c49f61c5c764f52/train.py#L529-L559

mayukhberkeley commented 2 years ago

I am using the following python train.py --img 512 --batch 32 --epochs 10 --data {yolov5_data}/data.yaml --cfg {yolov5_model}/models/custom_yolov5s.yaml --weights yolov5s.pt --name yolov5s_results

As per the documentation, it should use the 'data/hyps/hyp.finetune.yaml' file for the hyperparameters. I however noticed another hyp.yaml file in the runs/train/yolov5s_results folder which has totally different values from the hyp.finetune.yaml file. Is the model using the 'hyp.yaml' in the results folder?

glenn-jocher commented 2 years ago

@mayukhberkeley --hyp argument is here: https://github.com/ultralytics/yolov5/blob/c2523be634a94da2b1b2a43c11b25827a0de990d/train.py#L445

thsnhtung commented 2 years ago

@thsnhtung you can prevent a hyperparameter from evolving during hyperparameter evolution by updating it's key in the meta dictionary in train.py:

https://github.com/ultralytics/yolov5/blob/540ef0dd30be9bcf6882c9625c49f61c5c764f52/train.py#L529-L559

I know we need to change the meta dictionary in train.py; I just wonder how to do so, e.g. by setting the upper limit equal to the lower limit...

glenn-jocher commented 2 years ago

@thsnhtung setting mutation scale to 0 prevents a value from changing.
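As an abbreviated, illustrative example of the meta dictionary format, where each entry is (mutation scale, lower limit, upper limit) and a mutation scale of 0 freezes the value (entries and bounds shown are examples, not the full dictionary from train.py):

meta = {
    'lr0':   (1, 1e-5, 1e-1),   # evolves normally within [1e-5, 1e-1]
    'lrf':   (1, 0.01, 1.0),    # evolves normally
    'shear': (0, 0.0, 10.0),    # mutation scale 0 -> shear stays fixed during evolution
}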

mayukhberkeley commented 2 years ago

@mayukhberkeley --hyp argument is here:

https://github.com/ultralytics/yolov5/blob/c2523be634a94da2b1b2a43c11b25827a0de990d/train.py#L445

@glenn-jocher your message here https://docs.ultralytics.com/yolov5/tutorials/hyperparameter_evolution#issuecomment-680685682 says that

"data/hyp.finetune.yaml will be automatically used by python train.py --weights yolov5s.pt"

My question was since I was using --weights yolov5s.pt, should it not have used data/hyp.finetune.yaml ?

glenn-jocher commented 2 years ago

@mayukhberkeley https://docs.ultralytics.com/yolov5/tutorials/hyperparameter_evolution#issuecomment-680685682 was out of date, have updated now.

Marco-Nguyen commented 2 years ago

Hi, I am trying to use evolve for my custom dataset on colab with this line of code: !python train.py --img 416 --batch 16 --epochs 10 --data {dataset.location}/data.yaml --weights yolov5s.pt --cache --evolve 5

And it gives the error on the first epoch: 0% 0/1410 [00:00<?, ?it/s]src/tcmalloc.cc:283] Attempt to free invalid pointer 0x3d5436903d7c68ec

How can I fix this?