ultralytics / yolov3

YOLOv3 in PyTorch > ONNX > CoreML > TFLite
https://docs.ultralytics.com
GNU Affero General Public License v3.0

HYPERPARAMETER EVOLUTION #392

Closed glenn-jocher closed 3 years ago

glenn-jocher commented 5 years ago

Training hyperparameters in this repo are defined in train.py, including augmentation settings: https://github.com/ultralytics/yolov3/blob/df4f25e610bc31af3ba458dce4e569bb49174745/train.py#L35-L54

We began with darknet defaults before evolving the values using the result of our hyp evolution code:

python3 train.py --data data/coco.data --weights '' --img-size 320 --epochs 1 --batch-size 64 --accumulate 1 --evolve

The process is simple: for each new generation, the prior generation with the highest fitness (out of all previous generations) is selected for mutation. All parameters are mutated simultaneously based on a normal distribution with about 20% 1-sigma: https://github.com/ultralytics/yolov3/blob/df4f25e610bc31af3ba458dce4e569bb49174745/train.py#L390-L396
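
The mutation step described above can be sketched as multiplicative Gaussian noise applied to every hyperparameter at once. This is a simplified illustration of the idea, not the repo's exact code, and `mutate` is a hypothetical helper:

```python
import numpy as np

def mutate(hyp, sigma=0.2):
    # Mutate all hyperparameters simultaneously with ~20% 1-sigma
    # multiplicative Gaussian noise, as described above.
    keys = sorted(hyp)
    gains = np.random.randn(len(keys)) * sigma + 1.0  # centered on 1.0
    return {k: hyp[k] * float(g) for k, g in zip(keys, gains)}

parent = {'lr0': 0.01, 'momentum': 0.937, 'hsv_s': 0.678}
child = mutate(parent)  # same keys, perturbed values
```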

Fitness is defined as a weighted mAP and F1 combination at the end of epoch 0, under the assumption that better epoch 0 results correlate to better final results, which may or may not be true. https://github.com/ultralytics/yolov3/blob/bd924576048af29de0a48d4bb55bbe24e09537a6/utils/utils.py#L605-L608
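
A sketch of such a weighted fitness scalar follows; the real weights live in utils/utils.py at the link above, and the numbers here are purely illustrative:

```python
import numpy as np

def fitness(results):
    # results assumed to be [P, R, mAP@0.5, F1]; weight mAP and F1 only.
    w = np.array([0.0, 0.0, 0.8, 0.2])  # illustrative weights, not the repo's
    return float((np.asarray(results) * w).sum())

score = fitness([0.60, 0.55, 0.50, 0.57])  # 0.8*mAP + 0.2*F1
```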

An example snapshot of the results is shown below, with fitness on the y axis (higher is better). The plots can be reproduced with from utils.utils import *; plot_evolution_results(hyp)

glenn-jocher commented 4 years ago

@logic03 you're probably going to want a larger dataset. There's not much point evolving hyperparameters to a tiny dataset.

logic03 commented 4 years ago

@logic03 you're probably going to want a larger dataset. There's not much point evolving hyperparameters to a tiny dataset.

But I only have one category: pedestrians. At first I used the INRIA pedestrian dataset (only 614 images in the training set) and found the computed mAP could reach 0.9. However, when I ran detection on other pedestrian videos there were many misses, which I attributed to the dataset being too small for the model to generalize well. Later I used the Hong Kong Chinese pedestrian dataset; the mAP was only 0.8, but it worked well on my own pedestrian videos. So I tried putting the two datasets together, and the mAP was 0.85, yet video detection was not as good as in the second case. I wonder if this has something to do with the hyperparameters, because in all my previous trainings the hyperparameters were the same default initial values. Have you ever encountered this situation?

glenn-jocher commented 4 years ago

@logic03 well, for generalizability, the larger and more varied the dataset the better. On the plus side, if your dataset is small, training and evolving will be faster.

logic03 commented 4 years ago

So roughly how many generations should I run when using the hyperparameter evolution mechanism?

logic03 commented 4 years ago

@logic03 well, for generalizability, the larger and more varied the dataset the better. On the plus side, if your dataset is small, training and evolving will be faster.

May I ask whether this hyperparameter evolution mechanism is based on the strategy proposed in this paper? "Population Based Training of Neural Networks", Max Jaderberg, Valentin Dalibard, Simon Osindero, Wojciech M. Czarnecki, Jeff Donahue, Ali Razavi, Oriol Vinyals, Tim Green, Iain Dunning, Karen Simonyan, Chrisantha Fernando, Koray Kavukcuoglu (submitted 27 Nov 2017 (v1), last revised 28 Nov 2017 (this version, v2))

glenn-jocher commented 4 years ago

@logic03 it's a simple mutation-based genetic algorithm: https://en.wikipedia.org/wiki/Genetic_algorithm

300 generations is a rough number to try to run, though of course if you see no changes after a certain point it's likely settled into a minimum.
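
The loop described here, mutating the best parent each generation and keeping the child only when fitness improves, can be sketched with toy stand-ins. Both mutate and fitness below are illustrative placeholders, not repo code; the real fitness comes from training an epoch and scoring mAP/F1:

```python
import random

def mutate(hyp, sigma=0.2):
    # ~20% 1-sigma multiplicative Gaussian noise on every hyperparameter.
    return {k: v * random.gauss(1.0, sigma) for k, v in hyp.items()}

def fitness(hyp):
    # Toy objective standing in for "train and score": prefer lr0 near 0.01.
    return -abs(hyp['lr0'] - 0.01)

def evolve(hyp, generations=300):
    best, best_fit = dict(hyp), fitness(hyp)
    for _ in range(generations):
        child = mutate(best)
        f = fitness(child)
        if f > best_fit:            # greedy: keep only improvements
            best, best_fit = child, f
    return best

random.seed(0)
result = evolve({'lr0': 0.05})      # drifts toward the toy optimum
```

If fitness plateaus for many generations, the search has likely settled into a minimum, which matches the stopping intuition above.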

aclapes commented 4 years ago

Hi, I wanted to evolve the hyperparameters after getting rid of the hsv_* entries in the hyp dictionary (they don't make sense for my non-RGB data). I was looking into train.py and found this line:

hyp[k] = x[i + 7] * v[i] # mutate

Is this 7 a magic number so that the hyp and x elements match up?

EDIT: Never mind, I'm guessing now that it's because x is a line read from evolve.txt and its first 6 values correspond to extra meta-info other than the hyperparameters.

glenn-jocher commented 4 years ago

@aclapes yes that's correct! There are 7 values (3 losses and 4 metrics) followed by the hyperparameter vector in each row of evolve.txt.
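
So a row of evolve.txt can be split like this. A sketch with made-up numbers; the key names are hypothetical:

```python
import numpy as np

# Each evolve.txt row: 7 result values (3 losses + 4 metrics) followed
# by the hyperparameter vector, hence the x[i + 7] offset in train.py.
hyp_keys = ['lr0', 'momentum', 'weight_decay']          # hypothetical keys
row = np.array([0.05, 0.02, 0.01,                       # 3 losses
                0.60, 0.55, 0.50, 0.57,                 # 4 metrics
                0.01, 0.937, 0.0005])                   # hyperparameters
n_meta = 7
hyp = dict(zip(hyp_keys, row[n_meta:]))                 # hyps start at column 7
```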

PR's are welcome, thank you!

joey1616 commented 4 years ago

Hello, I wanted to evolve my hyps. In python3 train.py --data data/coco.data --weights '' --img-size 320 --epochs 1 --batch-size 64 --accumulate 1 --evolve, which --epochs should I set?

joey1616 commented 4 years ago

And how can I use the xy and wh losses?

glenn-jocher commented 4 years ago

@joey1616 the more --epochs you use, the better your results will correlate with full training results, but of course the longer everything will take. There is no right answer, but we've been using about 10% of full training as a good proxy for full training results.

If you look at most training results you will see a corner in the mAP. Before this corner the results can be pretty uncertain and don't mean quite as much, but in general once mAP turns this corner it's a fairly straight slope for the rest of training, which means your results there are a very good indicator of your final results. For the COCO trainings below this happens around 50-60 epochs.

glenn-jocher commented 4 years ago

@joey1616 oh and individual xywh losses are deprecated. GIoU is a single loss that regresses the bounding boxes.

glenn-jocher commented 4 years ago

I noticed there is another type of corner also, which shows up earlier, at 8 or 9 epochs. Perhaps for COCO we could get away with evolving only to this level. That would save a lot of time compared to 30 or 60 epochs...

joey1616 commented 4 years ago

Thank you very much. My cls loss is very small, about 2.19e-07 during evolution when mAP@0.5 = 0.175. Can I use this cls loss in my training?

glenn-jocher commented 4 years ago

@joey1616 I don't understand your question. cls loss should not normally be that small.

joey1616 commented 4 years ago

@joey1616 I don't understand your question. cls loss should not normally be that small.

Sorry, I mean that my cls loss gain is about 2.19e-07, and it was generated during evolution. It's so small that I don't know if this parameter should be used.

glenn-jocher commented 4 years ago

If it produces a high mAP then use it; if not, don't.

TaoXieSZ commented 4 years ago

Hi, I set the epochs to 20 and evolved for 10 generations.

And then I got these results. I wonder how to pick the best hyperparameters. Thank you very much.

glenn-jocher commented 4 years ago

@ChristopherSTAN blue dot is your best result. It will be the first row in evolve.txt.
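
This works because evolve.txt is kept sorted by fitness, best generation first. A sketch of that selection; the column layout and fitness weights here are illustrative, not the repo's exact code:

```python
import numpy as np

def sort_by_fitness(x):
    # Assume columns 2 and 3 hold mAP and F1 (illustrative layout).
    f = x[:, 2] * 0.8 + x[:, 3] * 0.2   # hypothetical fitness weights
    return x[np.argsort(-f)]            # highest fitness first

gens = np.array([[0.0, 0.0, 0.40, 0.45],
                 [0.0, 0.0, 0.55, 0.50],
                 [0.0, 0.0, 0.50, 0.52]])
best = sort_by_fitness(gens)[0]         # the best-scoring generation
```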

zarraozaga commented 4 years ago

Hi, I know this question is repetitive, but I need some assistance. How do these parameters in the darknet framework map to the ultralytics hyperparameters? From the above, this is the information I gathered; please correct me. I need help with the parameters marked "?". Thank you so much!

darknet: ultralytics

batch: batch-size(input_param)
subdivisions: ?
width: img-size(input_param)
height: img-size(input_param)
channels: ?
momentum: momentum
decay: weight_decay
angle: degrees
saturation: hsv_s
exposure: ?
hue: hsv_h

learning_rate: lr_0
burn_in: (use lr_1)
max_batches: ?
policy: ?
steps: ?
scales: scale


taylan24 commented 4 years ago

Hello, which 'evolve.txt' file should we use to get higher mAP? can we create our own file?

glenn-jocher commented 4 years ago

@taylan24 evolve.txt files are created during hyperparameter evolution, they store the results of each generation.

himikox commented 4 years ago

Hello, are the values of the best result from the same combination?

ankitahumne commented 4 years ago

I am trying to tune the hyperparameters for my custom dataset. Using the --evolve option, I get the following results. However, the fitness is too small and the precision and recall are always zero. Can you suggest any way to increase the fitness (and mAP)?

glenn-jocher commented 4 years ago

@ankitahumne evolution always starts from a beginning state. You should ensure your beginning state shows acceptable performance before starting the process. In other words, garbage in, garbage out.

ankitahumne commented 4 years ago

@glenn-jocher Thanks, that would save a lot of time. But to arrive at decent hyperparameters, is trial and error the only way? Also, my precision and recall are always zero, which doesn't seem normal. Any hints to improve that?

glenn-jocher commented 4 years ago

@ankitahumne these are two separate topics. Before doing any evolution you need to train normally with default settings and make sure you see at least a minimum level of performance. If not you have a problem.

ankitahumne commented 4 years ago

@glenn-jocher I did try with the default settings and that was pretty bad, so I am using --evolve to tune the hyperparameters. The results are slightly better in terms of P,R,mAP and F1. Looks like that's the way to proceed then - keep tuning hyperparameters. I tried a mix of changing hyp, multiscale, conf-thres, iou-thresh .

leviethung2103 commented 4 years ago

@glenn-jocher:

I would like to ask you about the purpose of this topic.

First of all, the default hyperparameters are set inside the train.py file.

If we want to tune the hyperparameters efficiently for our custom dataset, we need to specify the --evolve option? And then the hyperparameters are automatically adjusted during the training? Correct me if I am wrong. Thank you so much.

glenn-jocher commented 4 years ago

@leviethung2103 hyps are static during a single training. --evolve will evolve them over many trainings.

patagona-snayyer commented 4 years ago

This is the result of training on custom data with --evolve. The values I get at the end of evolution are the same as at the start. Is this normal? It doesn't look so to me :/ @glenn-jocher

github-actions[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

watertianyi commented 4 years ago

@glenn-jocher With two 16 GB GPUs, epochs set to 300, and evolve iterations set to 200, the run exceeds GPU memory. How can I solve this?

watertianyi commented 4 years ago

@glenn-jocher With epochs set to 30 and evolve iterations set to 200, I get the following error:

glenn-jocher commented 4 years ago

Ultralytics has open-sourced YOLOv5 at https://github.com/ultralytics/yolov5, featuring faster, lighter and more accurate object detection. YOLOv5 is recommended for all new projects.
GPU Speed measures end-to-end time per image averaged over 5000 COCO val2017 images using a V100 GPU with batch size 32, and includes image preprocessing, PyTorch FP16 inference, postprocessing and NMS. EfficientDet data from google/automl at batch size 8.

Pretrained Checkpoints

Model APval APtest AP50 SpeedGPU FPSGPU params FLOPS
YOLOv5s 37.0 37.0 56.2 2.4ms 416 7.5M 13.2B
YOLOv5m 44.3 44.3 63.2 3.4ms 294 21.8M 39.4B
YOLOv5l 47.7 47.7 66.5 4.4ms 227 47.8M 88.1B
YOLOv5x 49.2 49.2 67.7 6.9ms 145 89.0M 166.4B
YOLOv5x + TTA 50.8 50.8 68.9 25.5ms 39 89.0M 354.3B
YOLOv3-SPP 45.6 45.5 65.2 4.5ms 222 63.0M 118.0B

APtest denotes COCO test-dev2017 server results, all other AP results in the table denote val2017 accuracy.
All AP numbers are for single-model single-scale without ensemble or test-time augmentation. Reproduce by python test.py --data coco.yaml --img 640 --conf 0.001
SpeedGPU measures end-to-end time per image averaged over 5000 COCO val2017 images using a GCP n1-standard-16 instance with one V100 GPU, and includes image preprocessing, PyTorch FP16 image inference at --batch-size 32 --img-size 640, postprocessing and NMS. Average NMS time included in this chart is 1-2ms/img. Reproduce by python test.py --data coco.yaml --img 640 --conf 0.1
All checkpoints are trained to 300 epochs with default settings and hyperparameters (no autoaugmentation). Test Time Augmentation (TTA) runs at 3 image sizes. Reproduce by python test.py --data coco.yaml --img 832 --augment

For more information and to get started with YOLOv5 please visit https://github.com/ultralytics/yolov5. Thank you!

watertianyi commented 4 years ago

@glenn-jocher How to use the parameters of evolve.txt in training?

glenn-jocher commented 4 years ago

@hande6688 see the yolov5 tutorials. A new hyp evolution yaml is created that you can point to when training yolov5: python train.py --hyp hyp.evolve.yaml
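
For reference, such a YAML is just a flat mapping of hyperparameter names to evolved values. A hypothetical excerpt; the names and numbers below are illustrative, not evolved results:

```yaml
# hyp.evolve.yaml (hypothetical excerpt)
lr0: 0.0103          # initial learning rate
momentum: 0.941
weight_decay: 0.00047
hsv_h: 0.0138        # image HSV-Hue augmentation fraction
```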



nanhui69 commented 3 years ago

@glenn-jocher Why are my evolve results like those of user nayyersaahil28 above, even though I modified the code as shown in figures 1 and 2?

glenn-jocher commented 3 years ago

@nanhui69 your plots look like this because you have only evolved 1 generation. For the best performing and most recent evolution I would recommend the YOLOv5 hyperparameter evolution tutorial, where the default code evolves 300 generations:

YOLOv5 Tutorials

nanhui69 commented 3 years ago

@glenn-jocher But I have changed the code to "for _ in range(265): # generations to evolve". What's the problem?

nanhui69 commented 3 years ago

@glenn-jocher For python3 train.py --epochs 30 --cache-images --evolve, should I specify my own custom weights rather than the pretrained weights?

glenn-jocher commented 3 years ago

@nanhui69 you can evolve any base scenario, including starting from any pretrained weights. There are no constraints.

nanhui69 commented 3 years ago

@nanhui69 you can evolve any base scenario, including starting from any pretrained weights. There are no constraints.

I only want to solve the problem shown in the evolve png above (only one point). What should I do?

glenn-jocher commented 3 years ago

@nanhui69 your plots look like this because you have only evolved 1 generation. For the best performing and most recent evolution I would recommend the YOLOv5 hyperparameter evolution tutorial, where the default code evolves 300 generations:

YOLOv5 Tutorials

nanhui69 commented 3 years ago

@nanhui69 your plots look like this because you have only evolved 1 generation. For the best performing and most recent evolution I would recommend the YOLOv5 hyperparameter evolution tutorial, where the default code evolves 300 generations:

YOLOv5 Tutorials

Oh, I will check it again...

nanhui69 commented 3 years ago

@nanhui69 your plots look like this because you have only evolved 1 generation. For the best performing and most recent evolution I would recommend the YOLOv5 hyperparameter evolution tutorial, where the default code evolves 300 generations:

YOLOv5 Tutorials

When I follow the YOLOv5 steps, evolve.txt has only one line. Why is that?

glenn-jocher commented 3 years ago

@nanhui69 YOLOv5 evolution will create an evolve.txt file with 300 lines, one for each generation.

nanhui69 commented 3 years ago

@nanhui69 YOLOv5 evolution will create an evolve.txt file with 300 lines, one for each generation.

So, what's wrong in my case? Also, when evolve.txt already exists and I run evolve again, this error appears:

hyp[k] = x[i + 7] * v[i] # mutate
IndexError: index 18 is out of bounds for axis 0 with size 18
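
That IndexError is consistent with a size mismatch between the current hyp dict and an older evolve.txt: if hyp has more keys than the stored row has hyperparameter columns, x[i + 7] eventually indexes past the end of the row. A minimal reproduction of the mechanism; the key names and counts here are made up:

```python
import numpy as np

n_meta = 7                                   # 3 losses + 4 metrics columns
row = np.zeros(18)                           # 7 meta values + 11 stored hyps
hyp = {f'h{i}': 1.0 for i in range(12)}      # 12 keys: one more than stored
err = None
try:
    for i, k in enumerate(hyp):
        hyp[k] = row[i + n_meta]             # i = 11 -> row[18]: out of range
except IndexError as ex:
    err = str(ex)
print(err)
```

One likely resolution, inferred from the offset logic rather than an official instruction, is to regenerate or delete the stale evolve.txt so its rows match the current set of hyp keys.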