@tianfengyijiu the repos are different, so if darknet works better for your dataset then you should do your training there.
For best results here I recommend you use all default settings and yolov3-spp.cfg.
@glenn-jocher Thanks! Can you expand on the reason? How can I adjust the hyps to match darknet, and will I then get the same results as darknet?
@tianfengyijiu there are many small differences, so there's no simple change you can make. Like I said though, the best results here will come from training with all default settings, starting from the default pretrained weights. You can add --multi-scale as well, as this is how we trained COCO. See https://github.com/ultralytics/yolov3#reproduce-our-results and use that exact training command for your dataset.
OK, I will try.
@tianfengyijiu one option, if your dataset is small, is to use pretrained weights with --weights yolov3-spp-ultralytics.pt, and to set the batchnorm momentum to 0.1 in models.py.
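For reference, a minimal sketch of one way to apply that momentum change after model creation (the intended edit is directly in models.py where the BN layers are constructed; the Darknet class and cfg path below are assumed from this repo, the loop is standard PyTorch):

```python
# Hedged sketch: force BatchNorm momentum to 0.1 on a loaded model.
import torch.nn as nn
from models import Darknet

model = Darknet('cfg/yolov3-spp.cfg')
for m in model.modules():
    if isinstance(m, nn.BatchNorm2d):
        m.momentum = 0.1  # 0.1 is also the PyTorch default; the repo default is lower
```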
@glenn-jocher Hi, thanks for your help. I used the pre-trained weights to train my custom dataset and got mAP@0.5 of 66%, which is better than 59%, but I want to reach 69%, same as Darknet. I will: change the anchors with k-means, change the iou_t in train.py, and try again.
@tianfengyijiu ah interesting. One major difference is that darknet has multi-scale on by default. Here you need to use the --multi-scale flag to enable it. See here for the commands to reproduce our training results: https://github.com/ultralytics/yolov3#reproduce-our-results
@glenn-jocher Great! After these steps I got a better result:
1. pull your latest repo
2. get new anchors using k-means (a rough sketch of this step is below)
3. use the pre-trained weights yolov3-spp-ultralytics.pt
4. set iou_t lower, to 0.1
5. set the BN momentum higher, to 0.1
Results: mAP@0.5 = 0.701, which is better than darknet's mAP@0.5 = 0.694. Thanks very much! Can you tell me which change in your latest repo significantly improves the result?
And one more thing: why is best.pt (501.8 MB) bigger than last.pt (251.0 MB), and what steps should I take to convert it to Darknet format?
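A rough sketch of the anchor k-means step mentioned above, using scikit-learn rather than the repo's own anchor utility in utils/utils.py (the label file of width/height pairs is hypothetical):

```python
# Hedged sketch of recomputing anchors with k-means on the label box sizes.
import numpy as np
from sklearn.cluster import KMeans

wh = np.loadtxt('label_wh.txt', delimiter=',')  # hypothetical (N, 2) file of box w,h in pixels
km = KMeans(n_clusters=9, n_init=10, random_state=0).fit(wh)
anchors = km.cluster_centers_[np.argsort(km.cluster_centers_.prod(axis=1))]  # sort by area
print(anchors.round(1))  # paste into the anchors= lines of the [yolo] layers in the .cfg
```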
@tianfengyijiu ah great! That looks pretty good, but it would probably be better if you trained longer. The LR scheduler is a cosine scheduler so it adapts to the --epochs, and the EMA may need more than 50 epochs to really integrate properly.
Lowering the iou_t may or may not help mAP, it's hard to say, you might want to try it both at 0.1 and at the default value.
Oh, and best.pt is larger than last.pt because it still has the optimizer included. You can strip the optimizer from the checkpoint by using:
from utils.utils import *; strip_optimizer('weights/best.pt')
Perhaps we should have a script to strip the optimizer automatically from best.pt when training finishes...
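What the stripping amounts to is roughly this (a sketch, assuming the checkpoint dict stores the optimizer state under an 'optimizer' key as in this repo's saved checkpoints):

```python
# Hedged sketch of why best.pt shrinks: the checkpoint carries the optimizer state
# alongside the model weights, roughly doubling the file size until it is dropped.
import torch

ckpt = torch.load('weights/best.pt', map_location='cpu')
ckpt['optimizer'] = None             # drop the optimizer state
torch.save(ckpt, 'weights/best.pt')  # ~251 MB instead of ~502 MB for yolov3-spp
```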
Ok this should strip the optimizer from best.pt after training now: 6e19245dc8dd9a16d8e48a9b9493f53384b8bbd1
You can git pull to get the update :=)
@glenn-jocher Thanks, I will try more epochs. What is EMA?
@glenn-jocher I will implement and verify "Learning Data Augmentation Strategies for Object Detection" (https://arxiv.org/abs/1906.11172) on my custom dataset. If it works well, I will show you the results.
Ah yes, autoaugment. My understanding was that autoaugment takes many thousands of GPU hours though. Is that correct?
EMA is the exponential moving average of the model. The EMA is updated every optimizer update, the decay is 0.9999, so it takes at least 10000 optimizer updates to mature the EMA.
COCO trains for 500,000 iterations, for example, or 300 epochs at batch size 64.
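A minimal sketch of the EMA idea (not the repo's exact ModelEMA implementation): a shadow copy of the model is nudged toward the live weights after every optimizer step, so with decay 0.9999 it needs on the order of 10,000 updates to reflect the trained model.

```python
import copy
import torch

def update_ema(ema_model, model, decay=0.9999):
    # Move each EMA parameter a tiny step (1 - decay) toward the current model parameter.
    with torch.no_grad():
        ema_params = dict(ema_model.named_parameters())
        for name, p in model.named_parameters():
            ema_params[name].mul_(decay).add_(p.detach(), alpha=1.0 - decay)

# ema_model = copy.deepcopy(model)   # created once at the start of training
# update_ema(ema_model, model)       # called after every optimizer.step()
```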
Yes, AutoAugment. But I mean using the policy already found on the COCO dataset according to this paper, rather than searching for the policy again, which is too expensive for me.
@tianfengyijiu ah ok I understand! Hmm, I should look at the paper then and try to implement the same policy for coco training.
https://github.com/tensorflow/tpu/blob/master/models/official/detection/utils/autoaugment_utils.py This is the code in TensorFlow; I am implementing it in NumPy and training on my custom dataset now.
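To make the mechanism concrete, a hedged sketch of how such a policy is sampled per image (the sub-policies and op names here are illustrative placeholders, not the published COCO policy, and the ops dict is a user-supplied mapping to NumPy image/box transforms):

```python
import random

POLICY = [  # illustrative sub-policies: (op_name, probability, magnitude)
    [('translate_x', 0.6, 4), ('equalize', 0.8, 10)],
    [('color_jitter', 0.0, 6), ('cutout', 0.8, 8)],
]

def apply_policy(image, boxes, ops):
    # One sub-policy is drawn per image; each of its ops fires with its own probability.
    for name, prob, magnitude in random.choice(POLICY):
        if random.random() < prob:
            image, boxes = ops[name](image, boxes, magnitude)
    return image, boxes
```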
Thanks! Do you know where they say exactly what the optimal augment policy they found for COCO is? I see their mAP improvement in the paper but I couldn’t find details on their specific best policy.
Yes. The policy they found on the COCO dataset is given in the paper (Appendix A) and in the code linked above.
@tianfengyijiu great thanks! Did they use all of these subpolicies at the same time, or pick one of the 5 randomly per batch? Also, they never used more than 2 operations at a time on a batch?
The easiest way to implement this would be to translate these values into the augmentation hyperparameters we use here, which are the last 7 values in the dictionary. We also have a cutout flag in the dataloader (hardcoded to False); I've always seen worse mAPs when using it, unfortunately.
The last 4 augmentation hyps are also zeroed out here because I was not able to produce better mAPs with them on.
https://github.com/ultralytics/yolov3/blob/b98ce11d3a1d5905dcacb5d7cf28c5746ed5d967/train.py#L25-L43
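For orientation, the augmentation portion of that dictionary looks roughly like this (the values here are illustrative, not the tuned ones at the link; the last four are the zeroed geometric augmentations mentioned above):

```python
hyp = {
    # ... loss and LR hyperparameters come first in train.py ...
    'hsv_h': 0.014,    # image HSV hue augmentation (fraction)
    'hsv_s': 0.68,     # image HSV saturation augmentation (fraction)
    'hsv_v': 0.36,     # image HSV value augmentation (fraction)
    'degrees': 0.0,    # image rotation (+/- deg)
    'translate': 0.0,  # image translation (+/- fraction)
    'scale': 0.0,      # image scale (+/- gain)
    'shear': 0.0,      # image shear (+/- deg)
}
```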
@glenn-jocher Thanks for your repo! I have some confusion about the EMA. Have you tested the influence of the EMA? In my opinion, the EMA is similar to computing the average of the weights over the last 10000 updates, is that right? At the end of training, the weights we get are wobbling around the actual optimum. I think the effect of the EMA is to average the weights in order to enhance robustness, isn't it?
@glenn-jocher Could you please give me some advice about it? I'd really appreciate it!
@ChrisLiiiii yes, the EMA averages the previous weights based on a decay function. It helps a lot early in training, and a bit later on as well. See https://github.com/rwightman/pytorch-image-models/issues/102#issuecomment-601424476
@glenn-jocher Thanks for your reply!
This issue is stale because it has been open 30 days with no activity. Remove Stale label or comment or this will be closed in 5 days.
@ChrisLiiiii you're welcome! If you have any further questions or need more assistance, feel free to ask. Good luck with your training!
Thanks for your work! I trained my custom dataset in darknet with yolov3-voc.cfg, only modifying the learning rate, and got AP50 = 69%. But when I use this project with the default hyps, I get AP50 = 59%, which is too low. What am I doing wrong? I converted the 69% darknet weights to .pt format and tested them in this project, and the result is 69% there too.
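A rough sketch of that .weights-to-.pt conversion (the Darknet class and load_darknet_weights helper are assumed from this repo's models.py; verify the exact names and signatures in your checkout):

```python
# Hedged sketch: load Darknet-format weights into the PyTorch model, then save a minimal checkpoint.
import torch
from models import Darknet, load_darknet_weights  # assumed helpers from models.py

model = Darknet('cfg/yolov3-voc.cfg')
load_darknet_weights(model, 'backup/yolov3-voc_final.weights')  # hypothetical darknet weights path
torch.save({'model': model.state_dict()}, 'weights/converted.pt')  # usable by test.py / detect.py
```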