ultralytics / yolov3

YOLOv3 in PyTorch > ONNX > CoreML > TFLite
https://docs.ultralytics.com
GNU Affero General Public License v3.0

COCO AutoAugment Implementation #988

Closed tianfengyijiu closed 4 years ago

tianfengyijiu commented 4 years ago

Thanks for your work! I trained my custom dataset in Darknet with yolov3-voc.cfg, modifying only the learning rate, and got AP50 = 69%. But when I use this project with the default hyp, I get AP50 = 59%, which is too low. What am I doing wrong? I converted the 69% Darknet weights to .pt format and tested them in this project; the result is 69% there too.

glenn-jocher commented 4 years ago

@tianfengyijiu the repos are different, so if darknet works better for your dataset then you should do your training there.

For best results here I recommend you use all default settings and yolov3-spp.cfg.

tianfengyijiu commented 4 years ago

@glenn-jocher Thanks! Can you expand on the reason? How can I adjust the hyp to match Darknet exactly, and would I then get the same results as Darknet?

glenn-jocher commented 4 years ago

@tianfengyijiu there are many small differences, so there's no simple change you can make. Like I said though, the best results here will come from training with all default settings, starting from the default pretrained weights. You can add --multi-scale as well, as this is how we trained COCO. See https://github.com/ultralytics/yolov3#reproduce-our-results and use this exact training command for your dataset.

tianfengyijiu commented 4 years ago

OK, I will try.

glenn-jocher commented 4 years ago

@tianfengyijiu one option, if your dataset is small, is to use the pretrained --weights yolov3-spp-ultralytics.pt and set the batchnorm momentum to 0.1 in models.py.

tianfengyijiu commented 4 years ago

@glenn-jocher Hi, thanks for your help. I used the pre-trained weights to train my custom dataset and got mAP50 = 66%, which is better than 59%. But I want to reach 69%, just like Darknet. I will: change the anchors via k-means, change the iou_t in train.py, and try again.
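The k-means anchor step mentioned above can be sketched as follows (a minimal sketch using plain Lloyd's k-means over box width/height pairs; the box dimensions here are made up, and the repo's own anchor utility is not reproduced):

```python
import numpy as np

def kmeans_anchors(wh, k=9, iters=30, seed=0):
    """Cluster (width, height) pairs into k anchors with plain Lloyd's k-means.
    wh: array of shape (n, 2) holding box widths and heights in pixels."""
    rng = np.random.default_rng(seed)
    centers = wh[rng.choice(len(wh), k, replace=False)].astype(float)
    for _ in range(iters):
        # assign each box to its nearest center (squared Euclidean distance)
        d = ((wh[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        for i in range(k):
            if (labels == i).any():
                centers[i] = wh[labels == i].mean(0)
    # sort anchors by area, small to large, as YOLO cfgs list them
    return centers[centers.prod(1).argsort()]

# toy example: 200 random box sizes
wh = np.abs(np.random.default_rng(1).normal(100, 40, (200, 2)))
anchors = kmeans_anchors(wh, k=9)
print(anchors.shape)  # (9, 2)
```

In practice you would build `wh` from your own label files before pasting the resulting anchors into the cfg.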

glenn-jocher commented 4 years ago

@tianfengyijiu ah interesting. One major difference is that darknet has multi-scale on by default. Here you need to use the --multi-scale flag to enable it. See here for the commands to reproduce our training results: https://github.com/ultralytics/yolov3#reproduce-our-results

tianfengyijiu commented 4 years ago

@glenn-jocher Great! After these steps, I got a better result:
1. Pull your latest repo.
2. Get new anchors using k-means.
3. Use the pre-trained weights yolov3-spp-ultralytics.pt.
4. Set iou_t lower, to 0.1.
5. Set the BN momentum higher, to 0.1.
Results: mAP@0.5 = 0.701, which is better than Darknet's mAP@0.5 = 0.694. Thanks very much! Can you tell me which change in your latest repo significantly improved the result?

tianfengyijiu commented 4 years ago

And one more thing: why is best.pt (501.8 MB) bigger than last.pt (251.0 MB)? What should I do to convert it to Darknet format?

glenn-jocher commented 4 years ago

@tianfengyijiu ah great! That looks pretty good, but it would probably be better if you trained longer. The LR scheduler is a cosine scheduler so it adapts to the --epochs, and the EMA may need more than 50 epochs to really integrate properly.

Lowering the iou_t may or may not help mAP; it's hard to say. You might want to try it both at 0.1 and at the default value.
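The cosine schedule being described can be sketched as follows (a minimal sketch; the lr0/lrf values are illustrative, not the repo's actual hyp values), which shows why the schedule stretches or compresses with --epochs:

```python
import math

def cosine_lr(epoch, epochs, lr0=0.01, lrf=0.0005):
    """Cosine-anneal the learning rate from lr0 down to lrf over `epochs`.
    Because the curve is parameterized by epoch/epochs, changing --epochs
    rescales the whole schedule rather than truncating it."""
    return lrf + 0.5 * (lr0 - lrf) * (1 + math.cos(math.pi * epoch / epochs))

print(cosine_lr(0, 50))   # lr0 at the start
print(cosine_lr(25, 50))  # midpoint of the cosine
print(cosine_lr(50, 50))  # lrf at the end
```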

glenn-jocher commented 4 years ago

Oh, and best.pt is larger than last.pt because it still has the optimizer included. You can strip the optimizer from the checkpoint by using:

from utils.utils import *; strip_optimizer('weights/best.pt')
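In spirit, stripping the optimizer just drops the optimizer state from the checkpoint dict and re-saves it, which roughly halves the file. A minimal sketch with plain dicts standing in for torch.load/torch.save:

```python
def strip_optimizer_dict(ckpt):
    """Return a copy of a checkpoint dict with the optimizer state removed.
    In the real repo the checkpoint is loaded with torch.load and re-saved
    with torch.save; plain dicts stand in for that here."""
    slim = dict(ckpt)
    slim['optimizer'] = None  # drop the (large) optimizer state
    return slim

ckpt = {'model': {'w': [1, 2, 3]},
        'optimizer': {'state': list(range(1000))},  # stand-in for Adam/SGD buffers
        'epoch': 49}
slim = strip_optimizer_dict(ckpt)
print(slim['optimizer'])  # None
```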

glenn-jocher commented 4 years ago

Perhaps we should have a script to strip the optimizer automatically from best.pt when training finishes...

glenn-jocher commented 4 years ago

Ok this should strip the optimizer from best.pt after training now: 6e19245dc8dd9a16d8e48a9b9493f53384b8bbd1

You can git pull to get the update :)

tianfengyijiu commented 4 years ago

@glenn-jocher Thanks, I will try more epochs. What is EMA?

tianfengyijiu commented 4 years ago

@glenn-jocher I will implement and verify "Learning Data Augmentation Strategies for Object Detection" (https://arxiv.org/abs/1906.11172) on my custom dataset. If it works well, I will show you the results.

glenn-jocher commented 4 years ago

Ah yes, AutoAugment. My understanding was that the AutoAugment search takes many thousands of GPU hours though. Is that correct?

EMA is the exponential moving average of the model weights. The EMA is updated at every optimizer update with a decay of 0.9999, so it takes at least 10,000 optimizer updates for the EMA to mature.

COCO trains for 500,000 iterations, for example, which is 300 epochs at batch size 64.
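The maturation effect described above can be sketched with scalar weights (a minimal sketch; the repo's EMA wraps a full torch model rather than plain floats):

```python
class ModelEMA:
    """Minimal sketch of an exponential moving average over model weights.
    Plain floats stand in for tensors."""
    def __init__(self, params, decay=0.9999):
        self.decay = decay
        self.shadow = dict(params)  # start from the current weights

    def update(self, params):
        d = self.decay
        for k, v in params.items():
            # shadow <- d * shadow + (1 - d) * current
            self.shadow[k] = d * self.shadow[k] + (1.0 - d) * v

# pretend the raw weight jumps from 0.0 to 1.0 and stays there
fast = ModelEMA({'w': 0.0}, decay=0.9)
slow = ModelEMA({'w': 0.0}, decay=0.9999)
for step in range(100):
    fast.update({'w': 1.0})
    slow.update({'w': 1.0})

# after 100 updates the 0.9 EMA has essentially converged to 1.0,
# while the 0.9999 EMA has barely moved - it needs ~10,000 updates
print(fast.shadow['w'], slow.shadow['w'])
```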


tianfengyijiu commented 4 years ago

Yes, AutoAugment. But what I mean is using the policy already found on the COCO dataset in that paper, rather than searching for the policy again, which is too expensive for me.

glenn-jocher commented 4 years ago

@tianfengyijiu ah, ok, I understand! Hmm, I should look at the paper then and try to implement the same policy for COCO training.

tianfengyijiu commented 4 years ago

https://github.com/tensorflow/tpu/blob/master/models/official/detection/utils/autoaugment_utils.py This is the TensorFlow code; I have implemented it in NumPy and am training on my custom dataset now.
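One of the geometric ops in that policy family, an x-translation that moves the boxes with the image, can be sketched in NumPy as follows (a minimal sketch with made-up magnitudes; not the paper's exact op set or magnitude schedule):

```python
import numpy as np

def translate_x(img, boxes, pixels):
    """Shift an HxWxC image right by `pixels` (zero fill) and shift the
    [x1, y1, x2, y2] boxes with it, clipping to the image border."""
    h, w = img.shape[:2]
    out = np.zeros_like(img)
    if pixels > 0:
        out[:, pixels:] = img[:, :w - pixels]
    elif pixels < 0:
        out[:, :w + pixels] = img[:, -pixels:]
    else:
        out = img.copy()
    boxes = boxes.astype(float)
    boxes[:, [0, 2]] = np.clip(boxes[:, [0, 2]] + pixels, 0, w)
    return out, boxes

img = np.ones((4, 8, 3), dtype=np.uint8)
boxes = np.array([[1, 0, 3, 2]])
img2, boxes2 = translate_x(img, boxes, 2)
print(boxes2[0].tolist())  # [3.0, 0.0, 5.0, 2.0]
```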

glenn-jocher commented 4 years ago

Thanks! Do you know where they say exactly what the optimal augment policy they found for COCO is? I see their mAP improvement in the paper but I couldn’t find details on their specific best policy.

tianfengyijiu commented 4 years ago

Yes. The policy they found on the COCO dataset is listed in the paper (Appendix A) and in the code linked above.

glenn-jocher commented 4 years ago

@tianfengyijiu great, thanks! Did they use all of these subpolicies at the same time, or pick one of the 5 randomly per batch? Also, did they never apply more than 2 operations at a time to a batch?

The easiest way to implement this would be to translate these values into the augmentation hyperparameters we use here, which are the last 7 values in the dictionary. We also have a cutout flag in the dataloader (hardcoded to False); unfortunately I've always seen worse mAPs when using it.

The last 4 augmentation hyps are also zeroed out here because I was not able to produce better mAPs with them on.

https://github.com/ultralytics/yolov3/blob/b98ce11d3a1d5905dcacb5d7cf28c5746ed5d967/train.py#L25-L43
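For reference, cutout amounts to zeroing a random patch of the input image; a minimal NumPy sketch (the patch size and zero fill are illustrative, not the dataloader's exact implementation):

```python
import numpy as np

def cutout(img, size=0.25, rng=None):
    """Zero out one random square patch whose side is `size` times the
    shorter image side (minimal sketch of the cutout augmentation)."""
    rng = rng if rng is not None else np.random.default_rng()
    h, w = img.shape[:2]
    s = int(min(h, w) * size)
    y = rng.integers(0, h - s + 1)
    x = rng.integers(0, w - s + 1)
    img = img.copy()
    img[y:y + s, x:x + s] = 0  # erase the patch
    return img

img = np.full((16, 16, 3), 255, dtype=np.uint8)
out = cutout(img, size=0.25, rng=np.random.default_rng(0))
print((out == 0).any(), (out == 255).any())  # patch zeroed, rest intact
```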

ChrisLiiiii commented 4 years ago

@glenn-jocher Thanks for your repo! I have some confusion about the EMA. Have you tested its influence? In my opinion, the EMA is similar to computing the average of the weights over the last 10,000 updates; is that right? At the end of training, the point we reach wobbles around the actual optimum, so I think the effect of the EMA is to average the weights and thereby improve robustness. Is that correct?

ChrisLiiiii commented 4 years ago

@glenn-jocher Could you please give me some advice about it? I'd really appreciate it!

glenn-jocher commented 4 years ago

@ChrisLiiiii yes, the EMA averages the previous weights according to a decay function. It helps a lot early in training, and a bit later on as well. See https://github.com/rwightman/pytorch-image-models/issues/102#issuecomment-601424476

ChrisLiiiii commented 4 years ago

@glenn-jocher Thanks for your reply!

github-actions[bot] commented 4 years ago

This issue is stale because it has been open 30 days with no activity. Remove Stale label or comment or this will be closed in 5 days.

glenn-jocher commented 1 year ago

@ChrisLiiiii you're welcome! If you have any further questions or need more assistance, feel free to ask. Good luck with your training!