Some questions about code implementation

meituan / YOLOv6

YOLOv6: a single-stage object detection framework dedicated to industrial applications.

GNU General Public License v3.0

Before Asking

[X] I have read the README carefully.
[ ] I want to train my custom dataset, and I have read the tutorials for training your custom data carefully and organized my dataset correctly. (FYI: We recommend using the xx_finetune.py config files.)
[X] I have pulled the latest code of the main branch and run it again, and the problem still exists.

Search before asking

[X] I have searched the YOLOv6 issues and found no similar questions.

Question

1. In yolov6/assigners/tal_assigner.py, align_metric already includes overlaps (https://github.com/meituan/YOLOv6/blob/4364f29bf3244f2e73d0c42a103cd7a9cbb16ca9/yolov6/assigners/tal_assigner.py#L131), but it is multiplied by the overlaps again here: https://github.com/meituan/YOLOv6/blob/4364f29bf3244f2e73d0c42a103cd7a9cbb16ca9/yolov6/assigners/tal_assigner.py#L80. Why?
2. Is VarifocalLoss the same as the varifocal_loss described in VarifocalNet? If so, I believe pred_score should be replaced with (pred_score - gt_score).abs(). Would this change improve performance? https://github.com/meituan/YOLOv6/blob/4364f29bf3244f2e73d0c42a103cd7a9cbb16ca9/yolov6/models/losses/loss.py#L200
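
To make the second question concrete, here is a minimal NumPy sketch of the two weightings being compared. It is not the repository's code: the function names, the toy values, and the defaults alpha=0.75 and gamma=2.0 are assumptions for illustration, and the first function only follows my reading of the linked loss.py line. Under the further assumption that gt_score is zero wherever label is zero, the two weightings give the same values, because |pred_score - 0| = pred_score on negatives and the negative term is masked out on positives.

```python
import numpy as np


def vfl_weight_current(pred_score, gt_score, label, alpha=0.75, gamma=2.0):
    """Negatives weighted by alpha * p^gamma, positives weighted by gt_score."""
    return alpha * pred_score ** gamma * (1.0 - label) + gt_score * label


def vfl_weight_proposed(pred_score, gt_score, label, alpha=0.75, gamma=2.0):
    """Variant from the question: |p - q| replaces p in the negative term."""
    diff = np.abs(pred_score - gt_score)
    return alpha * diff ** gamma * (1.0 - label) + gt_score * label


# Toy batch (hypothetical values): 2 anchors x 3 classes.
# Assumption: gt_score is nonzero only where label == 1.
pred_score = np.array([[0.9, 0.2, 0.1],
                       [0.3, 0.6, 0.05]])
label = np.array([[1.0, 0.0, 0.0],
                  [0.0, 0.0, 0.0]])
gt_score = np.array([[0.7, 0.0, 0.0],
                     [0.0, 0.0, 0.0]])

w_current = vfl_weight_current(pred_score, gt_score, label)
w_proposed = vfl_weight_proposed(pred_score, gt_score, label)
print(np.allclose(w_current, w_proposed))
```

If gt_score could be nonzero on entries with label=0, the two functions would differ, so the comparison depends on how the targets are built.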

Additional

No response

Thanks for using!

1. align_metric is a matrix in which each element is the alignment metric between a predicted bounding box and a ground-truth bounding box. pos_overlaps is a vector in which each element is the maximum IoU between a ground-truth bounding box and all predicted bounding boxes. pos_align_metrics is a vector in which each element is the maximum alignment metric between a ground-truth bounding box and all predicted bounding boxes.

We want to combine the alignment metric and the IoU to assess more accurately how well the predicted bounding boxes match the ground truth. To do this, we multiply align_metric by pos_overlaps, so that the alignment metric of each predicted bounding box is scaled by the maximum IoU of the ground-truth bounding box it is compared against. This penalizes predicted bounding boxes that overlap less with the ground-truth bounding boxes, since they may be false detections or missed detections.

We then divide the result by pos_align_metrics plus a small constant, self.eps, which avoids division by zero. The division normalizes the alignment metric of each predicted bounding box to the [0, 1] range, so alignment metrics can be compared across different ground-truth bounding boxes without being influenced by each box's maximum alignment metric.

2. Both ways of computing the weights aim to distinguish positive samples from negative samples. The method we use is the weighting proposed in the original Varifocal Loss paper.

The formula means that negative samples (label=0) are weighted by their predicted scores, so the model pays more attention to negative samples that are difficult to classify, while positive samples (label=1) are weighted by their true scores, so the model pays more attention to important positive samples. We have not compared the other weighting method; we will compare and verify different weighting methods when we have time.
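
The normalization described in point 1 can be sketched in NumPy as follows. This is an illustration, not the repository's implementation: the function name, the toy inputs, and the exponents alpha=1 and beta=6 used to build align_metric are assumptions, and the formula follows my reading of the linked tal_assigner.py lines. It shows that the second multiplication does not square the IoU: each ground truth's metric is divided by its own maximum and rescaled so that its best anchor receives that ground truth's largest IoU.

```python
import numpy as np


def normalize_align_metric(align_metric, overlaps, mask_pos, eps=1e-9):
    """All inputs have shape (num_gt, num_anchors); returns shape (num_anchors,)."""
    align_metric = align_metric * mask_pos  # keep positive candidates only
    # Per ground truth: largest alignment metric and largest IoU among positives.
    pos_align_metrics = align_metric.max(axis=-1, keepdims=True)
    pos_overlaps = (overlaps * mask_pos).max(axis=-1, keepdims=True)
    # Divide by the per-GT maximum, then rescale by the per-GT maximum IoU.
    scaled = align_metric * pos_overlaps / (pos_align_metrics + eps)
    # Each anchor keeps the largest value over all ground truths.
    return scaled.max(axis=0)


# Toy example (hypothetical values): 2 ground truths, 3 anchors.
scores = np.array([[0.8, 0.5, 0.1],
                   [0.2, 0.6, 0.9]])
overlaps = np.array([[0.9, 0.6, 0.1],
                     [0.1, 0.5, 0.8]])
mask_pos = np.array([[1.0, 1.0, 0.0],
                     [0.0, 0.0, 1.0]])

# Assumed form of the alignment metric: score^alpha * IoU^beta.
alpha, beta = 1.0, 6.0
align_metric = scores ** alpha * overlaps ** beta

norm = normalize_align_metric(align_metric, overlaps, mask_pos)
print(norm)
```

In this toy case the best anchor of the first ground truth ends up with a value close to 0.9 and the best anchor of the second with a value close to 0.8, which are their respective maximum IoUs; weaker anchors get proportionally smaller values.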
