ultralytics / yolov3

YOLOv3 in PyTorch > ONNX > CoreML > TFLite
https://docs.ultralytics.com
GNU Affero General Public License v3.0

Low confidence and something about conf_thres #235

Closed yang-jin-hai closed 5 years ago

yang-jin-hai commented 5 years ago

I'm training on my own single-class dataset, and I seem to get good reported results after only 40 epochs. The information printed during the training phase is:

              Class    Images   Targets         P         R       mAP        F1
                 all  2.40e+03   6.9e+03     0.893     0.863      0.84     0.878

However, with the default arguments, the results from test.py don't look this good, and detect.py only shows a few detections.

I noticed that this phenomenon may be caused by the different conf-thres settings in the three scripts: 0.1 in train.py, 0.001 in test.py, and 0.5 in detect.py.

This difference may cause the three scripts to report inconsistent results for the same weights.

Besides, I also noticed that inference with my model produces low confidences. What can I do to improve this? Is simply resuming training of this model okay, or should I modify the loss hyps?

BTW, I also noticed that the first batch in either the training phase or the detection phase takes far longer than the others. Is this normal?

glenn-jocher commented 5 years ago

@WannaSeaU why would you stop training at 40 epochs if you don't like your results then? Full training takes 273 epochs with default settings.

See https://github.com/ultralytics/yolov3/issues/214 for mAP thresholds.

glenn-jocher commented 5 years ago

@WannaSeaU do you have an update on this or can we close the issue?

yang-jin-hai commented 5 years ago

You can close the issue, thanks!

I'm trying to improve the confidence by training for more epochs with a correspondingly modified lr_scheduler. However, although the training loss and confidence improve over the epochs, the test loss doesn't improve after about 40 epochs. It's overfitting.

The P, R, mAP, and F1 scores are quite good though. I'm trying to fine-tune the hyps.

glenn-jocher commented 5 years ago

Well, I would say that as long as the mAP and F1 continue to improve you should continue training, because the mAP and F1 are computed on the test set, not the training set.

We wrote new code to tune the hyperparameters using a genetic evolution algorithm. You can access this from train.py with the --evolve argparse argument. It will run your scenario 50 times, evolving the hyperparameters from their initial settings each generation, creating mutations from successful offspring and saving the results in evolve.txt.
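A toy sketch of that evolve loop, for intuition only (this is not the repo's implementation; train_and_eval is a hypothetical stand-in for a full train-plus-test run):

import random

hyp = {'lr0': 0.001, 'lrf': -4.0, 'momentum': 0.9}  # example initial hyperparameters

def mutate(parent, sigma=0.2):
    # perturb each hyperparameter multiplicatively around the parent values
    return {k: v * (1 + sigma * random.gauss(0, 1)) for k, v in parent.items()}

def train_and_eval(h):
    # hypothetical stand-in: a real run would train the model and return e.g. mAP
    return random.random()

best_hyp, best_fitness = hyp, train_and_eval(hyp)
for generation in range(50):            # --evolve runs the scenario 50 times
    candidate = mutate(best_hyp)
    fitness = train_and_eval(candidate)
    if fitness > best_fitness:          # keep mutations from successful offspring
        best_hyp, best_fitness = candidate, fitness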

glenn-jocher commented 5 years ago

@WannaSeaU also beware that the LR scheduler used to be hardcoded to step down at epochs 218 and 245, whereas now the inverse exponential curve is tuned to the final epoch, so the curve keeps its shape and stretches or compresses depending on how many epochs you ask for. This means that before, if you trained to 40 epochs your final LR was 0.001, too high, but now if you train to 40 your final LR will be set by hyp['lrf'].
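For intuition, here is a minimal sketch of an exponential schedule tuned to the final epoch, assuming an initial rate lr0 and a final-rate exponent hyp['lrf'] (the exact formula in train.py may differ):

import torch

model = torch.nn.Linear(10, 1)
lr0, lrf, epochs = 0.001, -4.0, 40   # assumed values for illustration
optimizer = torch.optim.SGD(model.parameters(), lr=lr0)

# decays from lr0 at epoch 0 to lr0 * 10**lrf at the final epoch,
# so the curve shape is the same no matter how many epochs you ask for
lf = lambda x: 10 ** (lrf * x / epochs)
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lf)

for epoch in range(epochs):
    optimizer.step()   # stand-in for one epoch of training
    scheduler.step()   # lr = lr0 * lf(epoch + 1)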

All in all though I don't think this LR scheduling makes a huge difference. BUT, in all of the examples I've seen you really need to train to almost 300 epochs to get the best results. 40 is much too early. Can you post your training results here? Use `from utils import utils; utils.plot_results()`.

yang-jin-hai commented 5 years ago

I'm sorry that I forgot to point out that I have trained for the whole 273 epochs; in fact, I trained for 500 epochs in hopes of improving the confidence.

Besides, I noticed the recent changes to your code; the version I'm using was cloned on 4/28, which already includes the inverse exponential LR schedule.

I also noticed you have added code to tune the hyperparameters using a genetic evolution algorithm, but I'm afraid I don't have enough computing power for that now. (XD)

In summary, the phenomenon I find weird is this:

The training is actually overfitting; the results after 273 epochs are shown as follows: [results plot: results_273]

Some sample inference pics are shown below: [five sample inference images]

It suddenly occurs to me that maybe the problem is in the dataset?

glenn-jocher commented 5 years ago

Haha, interesting. It looks like it's working really well! But yes, it seems 40-50 epochs is enough for your application.

You should check the class confidence coming out of the inference model, as the label currently shows the obj conf times the class conf. Since you have one class you are really misusing the yolov3 structure, so I don't know what to expect (ideally you'd merge conf with cls_conf). Try replacing this line in detect.py; I'm assuming these should all be 1.0:

# Add bbox to the image
label = '%s %.2f' % (classes[int(cls)], conf)

with this:

# Add bbox to the image
label = '%s %.2f' % (classes[int(cls)], cls_conf)
yang-jin-hai commented 5 years ago

I have modified the related configs as you wrote in the tutorial for single class training.

It turns out that cls_conf is not 1.0:

With --conf-thres 0.001 --weights weights/latest.pt:

Before replacing: [sample inference image]

After replacing: [sample inference image]

What's more, the inference performance of 'best.pt' is shown as follows:

With --conf-thres 0.001 --weights weights/best.pt: [sample inference image] (Uhmm... in a mess)

With --conf-thres 0.01 --weights weights/best.pt: [sample inference image]

With --conf-thres 0.1 --weights weights/best.pt: [sample inference image]

With --conf-thres 0.5 --weights weights/best.pt: [sample inference image] (0.5 is the default in detect.py, which suffers from low confidence in my application.)

And here is a sample inference picture from the CityPersons dataset, where the confidence can be nearly 1.0. This was done with an earlier version of this repo: [sample inference image]

glenn-jocher commented 5 years ago

@WannaSeaU OK, hold on, you are bringing up a few different topics here. The first thing to remember is that you don't want to detect at 0.001 conf_thres; that value is only for mAP computation. For detection the threshold is typically set around 0.1 to 0.9 depending on your preference for Recall vs Precision.
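To illustrate the Recall vs Precision tradeoff, a toy filtering example (the boxes and scores below are made up):

import torch

# toy predictions: rows of [x, y, w, h, conf]
pred = torch.tensor([[10., 10., 5., 5., 0.95],
                     [20., 20., 5., 5., 0.30],
                     [30., 30., 5., 5., 0.02]])
for conf_thres in (0.001, 0.1, 0.5):  # mAP evaluation vs typical detection settings
    kept = pred[pred[:, 4] > conf_thres]
    print(conf_thres, len(kept))      # lower threshold -> more (noisier) detections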

The test seems to show that cls_conf is around 0.5 for all of your targets, and obj_conf would then be near 1.0. I'm not sure why cls_conf would be 0.5 since there is only one class, but this means that all of the people are being detected with an object confidence of about 0.99. The image confidences show the two multiplied together.
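For reference, a quick check of what that product looks like with assumed values:

import torch

obj_conf = torch.tensor(0.99)                # near-certain objectness
cls_conf = torch.sigmoid(torch.tensor(0.0))  # an untrained logit near 0 -> 0.5
print(obj_conf * cls_conf)                   # ~0.495: roughly what the labels display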

yang-jin-hai commented 5 years ago

@glenn-jocher I'll try to solve it myself, since I believe it's an issue on my side. Thanks for your kindness.

yang-jin-hai commented 5 years ago

@glenn-jocher Hi bro, I got something new to tell you. I have figured out why the cls_conf is low. In #239 the mAP is high while the cls_conf is also low; I think many people who train on a single class don't realize this issue.

The activation function is sigmoid:
https://github.com/ultralytics/yolov3/blob/6316171f3351aa33235b0f9cda71da88dcfc5275/models.py#L165

For the cls loss, the metric function is CrossEntropy(), which includes a log softmax:
https://github.com/ultralytics/yolov3/blob/6316171f3351aa33235b0f9cda71da88dcfc5275/utils/utils.py#L280

This isn't even a problem for the multi-class situation. But for one-class training, the cls loss calculated by CE will always be 0, which means a bad classification score will not influence the total loss. Therefore the network doesn't learn to give a high classification score when a person is detected. In fact, I think sigmoid may need negative samples for better performance, which isn't satisfied in the single-class scene.

My solution is to use softmax instead of sigmoid for inference. Or else, simply remove this line in the NMS:
https://github.com/ultralytics/yolov3/blob/6316171f3351aa33235b0f9cda71da88dcfc5275/utils/utils.py#L362

You may consider updating the tutorial for this problem. In addition, I recalculated the anchors for my custom dataset and got a performance boost. You could consider adding this to the tutorials too.
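A minimal check of the single-class CE behaviour described above:

import torch
import torch.nn as nn

ce = nn.CrossEntropyLoss()
logits = torch.randn(8, 1)                  # one class -> a single logit per prediction
targets = torch.zeros(8, dtype=torch.long)  # the only possible class index is 0
print(ce(logits, targets))                  # tensor(0.): softmax over one logit is always 1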

glenn-jocher commented 5 years ago

@WannaSeaU I think you are right!!! This might be a major bug you discovered. I'll look at this right now.

glenn-jocher commented 5 years ago

@WannaSeaU I confirmed you should be getting much better confidences by using our iDetection app to image this issue thread. I think you are absolutely right though that CELoss should use logits input (like BCEWithLogitsLoss).

glenn-jocher commented 5 years ago

@WannaSeaU ah no, everything is ok for training with CELoss, because the section you viewed only applies torch.sigmoid() when in inference mode, not in training mode. When in training mode the network outputs go straight to the loss function: https://github.com/ultralytics/yolov3/blob/6316171f3351aa33235b0f9cda71da88dcfc5275/models.py#L129-L133

And in the loss function only the xy outputs are sigmoided: https://github.com/ultralytics/yolov3/blob/6316171f3351aa33235b0f9cda71da88dcfc5275/utils/utils.py#L272-L285

glenn-jocher commented 5 years ago

@WannaSeaU Ok, I understand what's going on now. If you have just a single class, CELoss() notices this and outputs a zero loss. Since the loss is always zero, the cls output neurons never adjust one way or the other; they simply stay at their random initialisations. Randomly initialized, the model outputs a mean around zero with a small standard deviation, which turns into 0.5 after passing through torch.sigmoid(). So what we need to do is add an if statement to catch single-class inference and ignore the cls term. I've added this in lines 168-169 in commit 09ee7b6f115d96a377558492e05f7ef703d1c390.

https://github.com/ultralytics/yolov3/blob/dd2d713484c2b907604f906595fa7f72e4d8c82e/models.py#L160-L170

This makes logical sense as well, as the cls confidence is redundant in single-class systems and should be ignored. All of the detection work is instead carried out by the object confidence output conf.
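Sketched on a toy tensor (this is not the exact committed lines; the columns are assumed to be x, y, w, h, obj, cls):

import torch

nc = 1                                    # number of classes
io = torch.randn(4, 5 + nc)               # rows of [x, y, w, h, obj, cls]
io[..., 5:] = torch.sigmoid(io[..., 5:])  # class confidences from raw logits
if nc == 1:
    io[..., 5] = 1.0                      # single class: cls conf is redundant; rely on obj conf
print(io[..., 5])                         # all 1.0 in the single-class case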

yang-jin-hai commented 5 years ago

@glenn-jocher Yes, that's exactly what I wanted to tell you. Also, thank you for explaining why the cls_conf is around 0.5, which I hadn't realized.

By the way, I'm using the pre-trained weights yolov3-spp.pt. Maybe the neurons are not randomly initialized in this case? Why is the output also about 0.5?

glenn-jocher commented 5 years ago

@WannaSeaU I don't understand. You used yolov3-spp.pt to start training, but you had only 1 class? This checkpoint has 80 classes.

yang-jin-hai commented 5 years ago

@glenn-jocher Sorry, I got it wrong. It's darknet53.conv.74. Is this randomly initialized?

glenn-jocher commented 5 years ago

@WannaSeaU the pytorch models are always randomly initialized, and then certain layers may optionally be replaced later, such as when loading checkpoints. darknet53.conv.74 is a pretrained backbone for the first 75 layers of YOLOv3. Layers above 75 stay randomly initialized.

H-YunHui commented 5 years ago

@WannaSeaU
I used the VOC+COCO datasets to train on the person class, but can only reach 0.74 mAP. May I ask what dataset you used for training?

yang-jin-hai commented 5 years ago

It was done on a custom dataset, where the images come from surveillance videos.


glenn-jocher commented 5 years ago

@www12345678 the dataset has undergone significant change recently, and the current training result is within 1% of darknet, so if you don't have the latest you may want to git pull and retrain. See https://github.com/ultralytics/yolov3/issues/310.

H-YunHui commented 5 years ago

@glenn-jocher I used the latest version to train, and this is my visualized result (I'm training on the VOC+COCO datasets for one class: person): [results plot]

There were 70,584 training images and 4,790 validation images, so I made some changes in yolov3.cfg and train.py. I set batch=64, subdivisions=8, max_batches=50200, steps=40000,45000 in yolov3.cfg, and epochs=46, batch_size=8 in train.py. No other parameters were changed.

I have a question about the visualized results: when the epoch is around 37, the learning rate decreases by 10x and the mAP greatly improves. How can I adjust my learning rate to get the optimal result?

glenn-jocher commented 5 years ago

@www12345678 looks good! This is the LR scheduler: https://github.com/ultralytics/yolov3/blob/321bd957647ff68a676a6ac2e37d50ff96ece4ef/train.py#L155

Note that we do not read any training settings from the cfg file; we only read the model architecture from it.

glenn-jocher commented 5 years ago

@www12345678 BTW, batch-size 8 is too small. You'll likely get better performance if you stick with the default settings, which are batch-size 32 with accumulate 2, giving you an effective batch-size of 64.

If you run out of memory the equivalent is batch-size 16 accumulate 4, which also gives you effective batch-size 64.
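A minimal sketch of how accumulation produces the effective batch-size (the toy model and data here are assumed for illustration):

import torch

model = torch.nn.Linear(10, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.01)
accumulate = 4                             # e.g. batch-size 16 * accumulate 4 -> 64

opt.zero_grad()
for i in range(16):
    x, y = torch.randn(16, 10), torch.randn(16, 1)    # one mini-batch of 16
    loss = torch.nn.functional.mse_loss(model(x), y)
    loss.backward()                        # gradients accumulate across mini-batches
    if (i + 1) % accumulate == 0:          # step once per `accumulate` mini-batches
        opt.step()
        opt.zero_grad()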

H-YunHui commented 5 years ago

@glenn-jocher
Thank you very much. I will continue to try.

ghost commented 4 years ago

@glenn-jocher @WannaSeaU I am going through something similar, training YOLOv2 (from scratch) on a two-class dataset (face and non-face). Though the training loss goes down to 0.06, when I test the model on the validation set the results are not that good; it doesn't detect faces half the time.

The problem is with the combined confidence score: the class score is always 1.0, but the confidence score is way too low (0.1-0.2) for many instances with just one face in them. The loss is calculated by taking the sigmoid of the confidence and then MSE; for the class loss, cross-entropy loss is used.

Note: the dataset I am using has only 4 instances of the non-face class, while the other 4,996 instances have at least one face in them. Do you think the problem might be that I don't have enough non-face instances? Thanks.

FranciscoReveriano commented 4 years ago

YOLOv2? To answer the second part: yes, that should be a significant problem.

ghost commented 4 years ago

@FranciscoReveriano thanks for the reply. Yes, I have built YOLOv2 from scratch and I am training it on a dataset for a face/non-face object detection problem. Just to clarify, by "second part" do you mean the low number of non-face instances in the dataset?

glenn-jocher commented 4 years ago

@agarwalyogeesh Hello, and thank you for your interest in our work! After reviewing your question we believe that this issue falls outside of the scope of this repository, which is limited to YOLOv3 PyTorch and ONNX model training, inference and deployment.

We suggest you raise the issue directly under the package or source causing the problem.

ghost commented 4 years ago

@glenn-jocher okay, thanks for your time. I will ask on other platforms.

matthias-tschoepe commented 4 years ago

Hello @glenn-jocher, thank you for your great work. I have dealt with a similar problem to the one shown here, and in the meantime I have found the cause. I'm not sure if the problem is still present in the latest version of this repo, but I'd still like to share my experience; maybe it helps someone. I downloaded your repo at the beginning of February 2020 and have since adapted it to my needs, so I have not updated to the latest version.

Your idea, which you explained here, is absolutely correct, but the problem is that if we then apply the sigmoid function in non_max_suppression, we compute sigmoid of 1, which is about 0.73106 (just print pred[..., 5:] after the line torch.sigmoid_(pred[..., 5:])). Then we compute pred[..., 5:] *= pred[..., 4:5], but due to the sigmoid we get 0.73106 * conf, not 1 * conf (as expected).

I fixed this by computing the number of classes in non_max_suppression as num_classes = len(pred[-1]) - 5, and if num_classes == 1, copying the confidence values into the class score entries. Which means:

num_classes = len(pred[-1]) - 5      # columns: x, y, w, h, obj_conf, then class scores
if num_classes == 1:
    pred[..., 5] = pred[..., 4]      # single class: reuse obj_conf, avoid sigmoid(1) ~ 0.731
else:
    torch.sigmoid_(pred[..., 5:])    # multi-class: squash class logits
    pred[..., 5:] *= pred[..., 4:5]  # conf = obj_conf * cls_conf
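A one-line check of the value mentioned above:

import torch
print(torch.sigmoid(torch.tensor(1.0)))  # tensor(0.7311): why 1 * conf became ~0.73 * conf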
a227799770055 commented 2 years ago

Hi @glenn-jocher, I'm running into the same low-confidence problem on the master branch, but I can't find where to fix the code. Can someone help me? Thanks!

glenn-jocher commented 2 years ago

@a227799770055 YOLOv3 is due for an update soon. In the meantime I would recommend YOLOv5 for all new projects: https://github.com/ultralytics/yolov5

GulerEnes commented 1 year ago

Same issue here and my solution: https://github.com/ultralytics/yolov3/issues/781#issuecomment-1274633020

glenn-jocher commented 10 months ago

@GulerEnes thank you for sharing your solution! We appreciate your contribution to the YOLO community. If you have any further questions or need assistance, feel free to ask.