ultralytics / yolov5

YOLOv5 πŸš€ in PyTorch > ONNX > CoreML > TFLite
https://docs.ultralytics.com

I found Objectness in ComputeLoss which might need improvement #3605

Closed · XHBrain closed this 3 years ago

XHBrain commented 3 years ago

Hi, sorry for disturbing you. I found one part in your code that might need improvement. Several GTs may match the same anchor when ComputeLoss is calculated in scenes with dense targets.

# Objectness
tobj[b, a, gj, gi] = (1.0 - self.gr) + self.gr * iou.detach().clamp(0).type(tobj.dtype) # iou ratio

tobj is filled at the locations [b, a, gj, gi], and when the same location appears more than once, later writes overwrite earlier ones, so the final tobj value comes from whichever GT build_targets emitted last (the bottom-right one). That may not be reasonable. Matching each anchor cell to the GT with the largest IoU can be done with the following modification.

The source line

tobj[b, a, gj, gi] = (1.0 - self.gr) + self.gr * iou.detach().clamp(0).type(tobj.dtype)  # iou ratio

should be replaced with:

score_iou = iou.detach().clamp(0).type(tobj.dtype)
sort_id = torch.argsort(score_iou)  # ascending, so the highest IoU is written last per cell
b, a, gj, gi, score_iou = b[sort_id], a[sort_id], gj[sort_id], gi[sort_id], score_iou[sort_id]
tobj[b, a, gj, gi] = (1.0 - self.gr) + self.gr * score_iou  # iou ratio

I tried this on a private dataset containing scenes with very dense objects, and performance improved with the above change.
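A minimal sketch (an editorial illustration, not from the thread) of the behavior the fix relies on: with duplicate indices, the last write is the one that survives in practice (PyTorch does not formally guarantee write order for duplicate-index assignment), so sorting by IoU ascending makes the highest IoU win.

import torch

# Two GTs match the same cell (index 2); with duplicate indices the value
# written last survives, which here is the lower IoU.
tobj = torch.zeros(5)
idx = torch.tensor([2, 2, 4])
iou = torch.tensor([0.9, 0.3, 0.7])
tobj[idx] = iou  # tobj[2] ends up 0.3; the better 0.9 is lost

# Sorting by IoU ascending makes the highest IoU the last write, so it wins.
order = torch.argsort(iou)
tobj[idx[order]] = iou[order]  # tobj[2] ends up 0.9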

github-actions[bot] commented 3 years ago

πŸ‘‹ Hello @XHBrain, thank you for your interest in πŸš€ YOLOv5! Please visit our ⭐️ Tutorials to get started, where you can find quickstart guides for simple tasks like Custom Data Training all the way to advanced concepts like Hyperparameter Evolution.

If this is a πŸ› Bug Report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we can not help you.

If this is a custom training ❓ Question, please provide as much information as possible, including dataset images, training logs, screenshots, and a public link to online W&B logging if available.

For business inquiries or professional support requests please visit https://www.ultralytics.com or email Glenn Jocher at glenn.jocher@ultralytics.com.

Requirements

Python 3.8 or later with all requirements.txt dependencies installed, including torch>=1.7. To install run:

$ pip install -r requirements.txt

Environments

YOLOv5 may be run in any up-to-date verified environment, with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled.

Status

If the CI badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training (train.py), testing (test.py), inference (detect.py) and export (export.py) on macOS, Windows, and Ubuntu every 24 hours and on every commit.

glenn-jocher commented 3 years ago

@XHBrain hi, thanks for the loss function idea! The best way to test your change would be to train two models on a public dataset like VOC from scratch, one with your proposed change, one with the default code:

python train.py --batch 64 --weights '' --cfg yolov5s.yaml --data voc.yaml --epochs 300 --cache --img 512 --hyp hyp.finetune.yaml --project VOC

If you could do this and report back both results that would be great!

glenn-jocher commented 3 years ago

@XHBrain also can you submit a PR please with your changes? Then we can checkout the branch and test the change ourselves, and if it works then we can merge into master for everyone to use. Thanks!

XHBrain commented 3 years ago

> @XHBrain also can you submit a PR please with your changes? Then we can checkout the branch and test the change ourselves, and if it works then we can merge into master for everyone to use. Thanks!

Hi @glenn-jocher, thanks for your response. I committed the change for code review, please check it: https://github.com/XHBrain/yolov5/tree/max_iou_objectLoss. You could choose a dataset with dense objects to verify this change.

glenn-jocher commented 3 years ago

@XHBrain thanks! I've created PR #3610 from your branch, I'll check it out and experiment on it a bit.

glenn-jocher commented 3 years ago

@XHBrain I have an IoU-sort experiment ongoing at https://wandb.ai/glenn-jocher/study-mixup-coco

It was underperforming around epoch 50, but around epoch 80 it seems to be slightly outperforming the baseline. This is running on an EC2 spot instance though, which unfortunately forces restarts and resumes too often, so I don't know if the results are reliable.

XHBrain commented 3 years ago

Hi @glenn-jocher, many thanks for your verification. From your training result, I see that my change has almost no benefit on that dataset. It is curious that the change worked on my dense dataset. Anyway, I will do more experiments and analysis on the change. Thanks again.

glenn-jocher commented 3 years ago

@XHBrain unfortunately the training has been restarted so many times due to spot restarts that the training results are no longer reliable. I'm going to need to secure some persistent hardware to rerun the tests. This is a bit of a problem for us at the moment...

791136190 commented 3 years ago

I also found this problem, and I also think we should add a de-duplication operation for the box regression.

glenn-jocher commented 3 years ago

@791136190 yes, right now one anchor cell fits all matching targets for the cls and box losses, but the objectness target is only applied once per anchor cell. This is a bit odd for the objectness loss, but it seems to work well currently.

EDIT: If you have any other ideas please let us know, or even better try them out so we have some quantitative before and after numbers we can compare.
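As a sketch of the de-duplication idea raised above (a hypothetical helper, not part of YOLOv5), one could keep only the highest-IoU match per (image, anchor, gj, gi) cell before computing the box/cls losses; here b, a, gj, gi, iou are assumed to be the matched indices and IoUs produced by build_targets and bbox_iou.

import torch

# Hypothetical helper: keep only the highest-IoU target per (b, a, gj, gi) cell.
def dedup_matches(b, a, gj, gi, iou):
    # Pack each index tuple into a single integer key; 4096 is an assumed
    # upper bound on each index's range, adjust to your batch/grid sizes.
    key = ((b.long() * 4096 + a.long()) * 4096 + gj.long()) * 4096 + gi.long()
    order = torch.argsort(iou)  # ascending IoU
    key_sorted, perm = torch.sort(key[order], stable=True)  # group duplicate keys, IoU order kept
    is_last = torch.ones_like(key_sorted, dtype=torch.bool)
    is_last[:-1] = key_sorted[:-1] != key_sorted[1:]  # last entry of each run has the max IoU
    keep = order[perm[is_last]]
    return b[keep], a[keep], gj[keep], gi[keep], iou[keep]

Unlike the objectness sort above, this variant does not depend on duplicate-index write order; whether it actually helps would need the same before/after training comparison discussed earlier in the thread.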

isJunCheng commented 1 year ago

I used the sorting fix on a custom dataset and the result decreased instead. An anchor matched by multiple samples should keep the higher IoU, not be overwritten by a lower one. The theory seems correct, but the experimental result is the opposite, and I am puzzled. Can someone help me with this, please?

glenn-jocher commented 1 year ago

@isJunCheng this could depend on the specifics of your dataset, the model architecture, and training hyperparameters. Your observations are insightful and important. It seems like an issue that warrants further investigation and perhaps a deeper understanding of the trade-offs involved. We greatly appreciate your input.