Closed yang-jin-hai closed 5 years ago
@WannaSeaU why would you stop training at 40 epochs if you don't like your results then? Full training takes 273 epochs with default settings.
See https://github.com/ultralytics/yolov3/issues/214 for mAP thresholds.
@WannaSeaU do you have an update on this or can we close the issue?
You can close the issue, thanks!
I’m trying to improve the confidence by training for more epochs with a correspondingly modified lr_scheduler. However, although the train loss and confidence drop with epochs, the test loss doesn’t improve after about 40 epochs. It’s overfitting.
The P, R, mAP, F1-score are quite good though, I’m trying to fine tune the hyps.
Well I would say as long as the mAP and F1 continue to improve you should continue training, because the mAP and F1 are computed on the test set, not the training set.
We wrote new code to tune the hyperparameters using a genetic evolution algorithm. You can access this from train.py by using the --evolve argparser argument. It will run your scenario 50 times, evolving the hyperparameters from their initial setting each generation, creating mutation from successful offspring, saving the results in evolve.txt.
@WannaSeaU also beware that before, the LR scheduler was hardcoded to step down at epochs 218 and 245, whereas now the inverse exp curve is tuned to the final epoch, so the curve shape stays the same and it stretches or shrinks depending on how many epochs you called for. This means that before, if you trained to 40, your final lr was 0.001, too high, but now if you train to 40 your final lr will be produced by hyp['lrf'].
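To make the difference between the two schedules concrete, here is a minimal sketch in plain Python. It is not the code from train.py: the decay factor of 0.1 per step, the initial rate of 0.001, and the reading of `lrf` as the log10 of the final-to-initial LR ratio are all assumptions made for illustration.

```python
import math

def step_lr(epoch, lr0=0.001, milestones=(218, 245), gamma=0.1):
    """Hardcoded step schedule: multiply by gamma at every milestone already reached.

    gamma=0.1 is an assumption; the thread only gives the milestone epochs.
    """
    passed = sum(1 for m in milestones if epoch >= m)
    return lr0 * gamma ** passed

def scaled_exp_lr(epoch, epochs, lr0=0.001, lrf=-2.0):
    """Exponential decay stretched over the whole run.

    lrf is treated here as log10(final_lr / lr0), a hypothetical reading of
    hyp['lrf'], so the final LR is the same for any run length.
    """
    return lr0 * 10 ** (lrf * epoch / epochs)

for total in (40, 273):
    print(total, step_lr(total), scaled_exp_lr(total, total))
```

With the step schedule a 40-epoch run never reaches a milestone, so it ends at the initial rate, while the scaled curve ends at the same final rate whether the run is 40 or 273 epochs long.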
All in all though I don't think this LR scheduling makes a huge difference. BUT, in all of the examples I've seen you really need to train to almost 300 epochs to get the best results. 40 is much too early. Can you post your training results here? Use from utils import utils; utils.plot_results()
I'm sorry that I forgot to point out that I have trained for the whole 273 epochs, and even more: I trained for 500 epochs for the purpose of improving confidence.
Besides, I noticed the recent changes to your code. The code I am using was cloned on 4.28, which includes the inverse exp curve LR schedule.
I also noticed you have added code to tune the hyperparameters using a genetic evolution algorithm, but I'm afraid I don't have enough computing power now. (XD)
In total, the phenomenon I find weird is: confidence isn't high enough (in your examples it can be nearly 1), while the P/R/mAP/F1 is not bad. In yolo, confidence = Pr(Object) * IOU, which I would expect to be high if P/R/mAP/F1 reach a nice value. The training is actually overfitting; the results after 273 epochs are shown as follows:
Some sample inference pics are shown below:
It suddenly occurs to me that, maybe the problem is in the dataset?
Haha, interesting. It looks like it's working really well! But yes, it seems 40-50 epochs is enough for your application.
You should check the class confidence coming out of the inference model, as it currently shows the obj conf times the class conf. Since you have one class you are really misusing the yolov3 structure, so I don't know what to expect (ideally you'd merge conf with cls_conf). Try replacing this line in detect.py, I'm assuming these should all be 1.0:
# Add bbox to the image
label = '%s %.2f' % (classes[int(cls)], conf)
with this:
# Add bbox to the image
label = '%s %.2f' % (classes[int(cls)], cls_conf)
I have modified the related configs as you wrote in the tutorial for single class training.
It turns out that cls_conf is not 1.0:
With --conf-thres 0.001 --weights weights/latest.pt:
Before replacing:
After replacing:
What's more, the inference performance of 'best.pt' is shown as follows:
With --conf-thres 0.001 --weights weights/best.pt:
(Uhmm……in a mess)
With --conf-thres 0.01 --weights weights/best.pt:
With --conf-thres 0.1 --weights weights/best.pt:
With --conf-thres 0.5 --weights weights/best.pt:
This is the default setting in detect.py, which suffers from low confidence in my application.
And this is a sample inference picture from the CityPersons dataset, where the confidence can be nearly 1.0. This was done with an earlier version of this repo.
@WannaSeaU Ok hold on, you are bringing up a few different topics here. The first thing to remember is that you don't want to detect at 0.001 conf_thres, this is only for mAP computation, with detection typically set around 0.1 to 0.9 depending on your preference for Recall vs Precision.
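As a small illustration of why the threshold matters so much, the sketch below filters one made-up list of detections at the three thresholds mentioned in this thread. The detection scores are hypothetical and `filter_by_conf` is just an illustrative helper, not a function from this repo.

```python
# Hypothetical (label, confidence) pairs for one image; the values are made up.
detections = [
    ("person", 0.97),
    ("person", 0.62),
    ("person", 0.31),
    ("person", 0.04),
    ("person", 0.002),
]

def filter_by_conf(dets, conf_thres):
    """Keep only detections whose confidence reaches the threshold."""
    return [d for d in dets if d[1] >= conf_thres]

for conf_thres in (0.001, 0.1, 0.5):
    kept = filter_by_conf(detections, conf_thres)
    print(conf_thres, len(kept))
```

A very low threshold keeps nearly everything, which is useful for sweeping the precision/recall curve during mAP computation but produces cluttered images, while a high threshold favors precision over recall.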
The test seems to show that cls_conf is around 0.5 for all of your targets, and obj_conf would then be 1.0. I'm not sure why cls_conf would be 0.5 since there is only one class, but this means that all of the people are being detected with an object_confidence of about 0.99. The confidences shown on the images are the two multiplied together.
@glenn-jocher I'll try to solve it myself, since I believe it's my own stuff. Thanks for your kindness.
@glenn-jocher
Hi bro, I've got something new to tell you. I have figured out why the cls_conf is low. In #239, the mAP is high while the cls_conf is also low; I think many people who train on a single class don't realize this issue.
The activation function is sigmoid:
https://github.com/ultralytics/yolov3/blob/6316171f3351aa33235b0f9cda71da88dcfc5275/models.py#L165
For cls loss, the metric function is CrossEntropy(), which includes a log softmax function.
https://github.com/ultralytics/yolov3/blob/6316171f3351aa33235b0f9cda71da88dcfc5275/utils/utils.py#L280
Anyway, it's not even a problem in the multi-class case. But for one-class training, the cls loss calculated by CE will always be 0, which means a bad classification score will not influence the total loss. Therefore the network doesn't learn to give a high classification score when a person is detected.
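The zero loss can be checked by hand. Below is a minimal sketch of softmax cross-entropy for one sample in plain Python (a hand-written stand-in, not the PyTorch loss used in utils.py): with a single class the softmax output is always 1, so both the loss and its gradient vanish whatever the logit is.

```python
import math

def cross_entropy(logits, target):
    """Softmax cross-entropy for one sample.

    Returns (loss, gradient of the loss with respect to each logit).
    """
    m = max(logits)                                # subtract max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    loss = math.log(total) - (logits[target] - m)  # -log(softmax[target])
    grad = [e / total - (1.0 if i == target else 0.0) for i, e in enumerate(exps)]
    return loss, grad

# Single class: loss and gradient are zero for any logit value.
for logit in (-3.0, 0.0, 5.0):
    print(logit, cross_entropy([logit], target=0))

# Two classes, for contrast: an undecided prediction is penalized.
print(cross_entropy([0.0, 0.0], target=0))
```

Because the gradient is zero, no update ever reaches the class output in the single-class case, which is consistent with the class score staying wherever it started.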
In fact, I think sigmoid may need negative samples for better performance, which are not available in the single-class scenario.
My solution for this is to use softmax instead of sigmoid for inference, or else simply remove this line in nms:
https://github.com/ultralytics/yolov3/blob/6316171f3351aa33235b0f9cda71da88dcfc5275/utils/utils.py#L362
You may consider updating the tutorial for this problem. In addition, I recalculated the anchors for my custom dataset and got a performance boost. You could consider adding this to the tutorials too.
@WannaSeaU I think you are right!!! This might be a major bug you discovered. I'll look at this right now.
@WannaSeaU I confirmed you should be getting much better confidences by using our iDetection app to image this issue thread. I think you are absolutely right though that CELoss should use logits input (like BCEWithLogitsLoss)
@WannaSeaU ah no, everything is ok for training with CELoss, because the section you viewed only applies torch.sigmoid() when in inference mode, not in training mode. When in training mode the network outputs go straight to the loss function: https://github.com/ultralytics/yolov3/blob/6316171f3351aa33235b0f9cda71da88dcfc5275/models.py#L129-L133
And in the loss function only the xy outputs are sigmoided:
https://github.com/ultralytics/yolov3/blob/6316171f3351aa33235b0f9cda71da88dcfc5275/utils/utils.py#L272-L285
@WannaSeaU Ok, I understand what's going on now. If you just have a single class, CELoss() notices this, and outputs a zero loss. Since the loss is always zero, the cls output neurons never adjust one way or another; they simply stay the same as their random initialisations. A randomly initialized model will output a mean around zero with a small standard deviation, which turns into 0.5 after passing through torch.sigmoid(). So what we need to do is add an if statement to catch single-class inference, and ignore the cls term. I've added this in lines 168-169 in commit 09ee7b6f115d96a377558492e05f7ef703d1c390.
This makes logical sense as well, as the cls confidence is redundant in single class systems and should be ignored. All of the detection work is carried out instead by the object confidence output conf.
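A quick numerical check of this explanation, as a sketch in plain Python: the standard deviation of 0.05 for the untrained class logits and the objectness of 0.99 are assumed values chosen only to mirror the numbers discussed above.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

random.seed(0)

# Untrained class logits: mean zero, small spread (0.05 is an assumed value).
cls_logits = [random.gauss(0.0, 0.05) for _ in range(1000)]
cls_conf = [sigmoid(z) for z in cls_logits]
mean_cls_conf = sum(cls_conf) / len(cls_conf)

obj_conf = 0.99                       # a confidently detected object
displayed = obj_conf * mean_cls_conf  # what gets drawn when both are multiplied
single_class = obj_conf               # ignoring the cls term for one class

print(round(mean_cls_conf, 3), round(displayed, 3), single_class)
```

The product lands near 0.5 even though the object itself is detected with near certainty, and dropping the class term for single-class models restores the expected confidence.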
@glenn-jocher Yes, that's exactly what I wanted to tell you. Also, thank you for explaining why the cls_conf is around 0.5, which I didn't realize.
By the way, I'm using the pre-trained weights yolov3-spp.pt. Maybe the neurons are not randomly initialized in this case? Why is the output also about 0.5?
@WannaSeaU I don't understand. You used yolov3-spp.pt to start training, but you had only 1 class? This checkpoint has 80 classes.
@glenn-jocher Sorry I got it wrong. It's darknet53.conv.74. Is this randomly initialized?
@WannaSeaU the pytorch models are always randomly initialized, and then certain layers may be optionally replaced later, such as when loading checkpoints. darknet53.conv.74 is a pretrained backbone for the first 75 layers of YOLOv3. Layers above 75 stay randomly initialized.
@WannaSeaU
I use the VOC+COCO datasets to train on the person category, which can only reach 0.74 (mAP). I would like to ask which datasets you used for training?
It was done on a custom dataset, where the images are from surveillance videos.
@www12345678 the dataset has undergone significant change recently, current training result is within 1% of darknet, so if you don't have the latest you may want to git pull and retrain. See https://github.com/ultralytics/yolov3/issues/310.
@glenn-jocher I used the latest version to train, and this is my visualized result. (I'm training on the VOC+COCO datasets for one class: person.) There were 70,584 training images and 4,790 validation images, so I made some changes in yolov3.cfg and train.py. I set batch=64, subdivisions=8, max_batches = 50200, steps=40000,45000 in yolov3.cfg. I also set epochs=46 and batch_size = 8 in train.py. No other parameters were changed. I have a question about the results: around epoch 37 the learning rate decreases by 10 times and the mAP improves greatly. How can I adjust my learning rate to get the optimal result?
@www12345678 looks good! This is the LR scheduler: https://github.com/ultralytics/yolov3/blob/321bd957647ff68a676a6ac2e37d50ff96ece4ef/train.py#L155
Note that we do not read any training settings from the cfg file, we only read in the model architecture.
@www12345678 BTW, batch-size 8 is too small. You'll likely get better performance if you stick with the default settings, which are batch-size 32 accumulate 2, which give you an effective batch-size of 64.
If you run out of memory the equivalent is batch-size 16 accumulate 4, which also gives you effective batch-size 64.
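The arithmetic behind the effective batch size can be sketched on a toy problem. The example below uses a one-parameter least-squares model in plain Python and assumes the accumulated gradients are averaged over equally sized mini-batches; how train.py scales its accumulated loss is not shown in this thread, so treat that as an assumption.

```python
import math

def grad_mse(w, xs, ys):
    """Gradient of mean((w * x - y) ** 2) with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

# Toy data: 64 samples of y = 3x, evaluated at w = 1.
xs = [float(i) for i in range(64)]
ys = [3.0 * x for x in xs]
w = 1.0

def accumulated_grad(batch_size, accumulate):
    """Average the gradients of `accumulate` consecutive mini-batches."""
    total = 0.0
    for k in range(accumulate):
        part = slice(k * batch_size, (k + 1) * batch_size)
        total += grad_mse(w, xs[part], ys[part])
    return total / accumulate

full = grad_mse(w, xs, ys)  # one real batch of 64
for batch_size, accumulate in ((32, 2), (16, 4)):
    g = accumulated_grad(batch_size, accumulate)
    print(batch_size * accumulate, math.isclose(g, full))
```

Both settings see 64 samples per optimizer step and produce the same gradient as a single batch of 64, whereas batch-size 8 with no accumulation would step on a much noisier estimate.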
@glenn-jocher
Thank you very much. I will continue to try
@glenn-jocher @WannaSeaU I am going through something similar, training yolov2 from scratch on a two-class dataset (face and non-face). Though the training loss goes down to 0.06, when I test the model on the validation set the results are not that good: it doesn't detect the face half the time. The problem is with the combined confidence score. The class score is always 1.0, but the confidence score is far too low (0.1-0.2) for many instances with just one face in them. The confidence loss is calculated by taking the sigmoid of the confidence and then MSE. For the class loss, cross-entropy loss is used. Note: the dataset I am using has only 4 instances of the non-face class; the other 4996 instances have at least one face in them. Do you think the problem might be that I don't have enough non-face instances? Thanks.
yolov2? To answer the second part that should be a significant problem.
@FranciscoReveriano thanks for the reply. Yes, I have built yolov2 from scratch and I am training it on a dataset for a face and non-face object detection problem. Just to clarify, by the second part do you mean the small number of non-face instances in the dataset?
@agarwalyogeesh Hello, and thank you for your interest in our work! After reviewing your question we believe that this issue falls outside of the scope of this repository, which is limited to YOLOv3 PyTorch and ONNX model training, inference and deployment.
We suggest you raise the issue directly under the package or source causing the problem.
@glenn-jocher okay, thanks for your time. I will ask on other platforms.
Hello @glenn-jocher thank you for your great work. I have dealt with a similar problem to the one shown here, and in the meantime I have also found the cause. I'm not sure if the problem is still present in the latest version of this repo, but I'd still like to share my experience; maybe it helps someone. I downloaded your repo at the beginning of February 2020 and have since adapted it to my needs, so I have not updated to the latest version. Your idea, which you have explained here, is absolutely correct, but the problem is that if we use the sigmoid function in non_maximum_suppression, we calculate the sigmoid of 1, which is about 0.73106. Just print "pred[..., 5:]" after this line:
`torch.sigmoid(pred[..., 5:])`
Then we compute:
`pred[..., 5:] *= pred[..., 4:5]`
but due to the sigmoid, we get 0.73106 * conf, and not 1 * conf (as expected). I fixed this by computing the number of classes in non_maximum_suppression with:
num_classes = len(pred[-1]) - 5
and if num_classes == 1, we copy the confidence values to the class score entries. Which means:
if len(pred[-1]) - 5 == 1:
    pred[..., 5] = pred[..., 4]
else:
    torch.sigmoid_(pred[..., 5:])
    pred[..., 5:] *= pred[..., 4:5]  # conf = obj_conf * cls_conf
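The effect described above can be reproduced without tensors. The following is a minimal sketch for a single box in plain Python; `combine_confidence` is a hypothetical helper that mirrors the branch logic of the fix, not a function from this repo.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def combine_confidence(obj_conf, cls_scores):
    """Per-class confidence for one box.

    obj_conf:   objectness, already in [0, 1]
    cls_scores: class entries as they arrive at NMS (one per class)
    """
    if len(cls_scores) == 1:
        # Single class: the class score carries no information, use objectness alone.
        return [obj_conf]
    # Multi-class: conf = obj_conf * cls_conf
    return [obj_conf * sigmoid(s) for s in cls_scores]

obj_conf = 0.9
naive = obj_conf * sigmoid(1.0)           # sigmoid applied to a class entry of 1
fixed = combine_confidence(obj_conf, [1.0])[0]

print(round(sigmoid(1.0), 5), round(naive, 3), fixed)
```

Without the single-class branch the reported confidence is scaled by about 0.731, so a box detected with objectness 0.9 is displayed at roughly 0.66.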
Hi @glenn-jocher, I am seeing the same low-confidence problem on the master branch, but I can't find where to fix the code. Can someone help me? Thx!
@a227799770055 YOLOv3 is due for an update soon. In the meantime I would recommend YOLOv5 for all new projects: https://github.com/ultralytics/yolov5
Same issue here and my solution: https://github.com/ultralytics/yolov3/issues/781#issuecomment-1274633020
@GulerEnes thank you for sharing your solution! We appreciate your contribution to the YOLO community. If you have any further questions or need assistance, feel free to ask.
I'm training on my own dataset for one class, and seem to get good reported results after only 40 epochs. The printed information during the training phase is:
However, with default arguments, the results of test.py don't look this good, and detect.py only shows a few detections.
I noticed that this phenomenon may be caused by the different conf-thres settings in the three scripts: 0.1 in train.py, 0.001 in test.py, and 0.5 in detect.py.
This difference may cause:
Besides, I also noticed that inference with my model has low confidence. What can I do to improve this? Is simply resuming training of this model okay? Or should I modify the loss hyps?
BTW, I also noticed that the 1st batch in either the train phase or the detect phase takes far more time than the others. Is this normal?