zylo117 / Yet-Another-EfficientDet-Pytorch

The PyTorch re-implementation of the official EfficientDet with SOTA performance in real time and pretrained weights.
GNU Lesser General Public License v3.0

a question about loss function #75

Open tingnit opened 4 years ago

tingnit commented 4 years ago

The code just finds the best gt_box with over 0.5 IoU to assign to each anchor; if some gt_boxes can't be matched to any anchor with over 0.5 IoU, those gt_boxes may never be trained. Briefly, is bipartite matching necessary?

zylo117 commented 4 years ago

You are right. If the anchors' shapes are completely different from the gt boxes' shapes, most of the gt boxes will be filtered out, so the loss will be deceptively low.
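A minimal sketch of the failure mode described above (the `box_iou` helper and the example boxes are illustrative, not the repo's `calc_iou`): a gt box much smaller than every anchor never reaches 0.5 IoU with any of them, so it silently contributes nothing to the loss.

```python
import torch

def box_iou(anchors, gt):
    """Pairwise IoU between [N,4] anchors and [M,4] gt boxes (x1,y1,x2,y2)."""
    area_a = (anchors[:, 2] - anchors[:, 0]) * (anchors[:, 3] - anchors[:, 1])
    area_g = (gt[:, 2] - gt[:, 0]) * (gt[:, 3] - gt[:, 1])
    lt = torch.max(anchors[:, None, :2], gt[None, :, :2])  # [N,M,2] top-left
    rb = torch.min(anchors[:, None, 2:], gt[None, :, 2:])  # [N,M,2] bottom-right
    wh = (rb - lt).clamp(min=0)
    inter = wh[..., 0] * wh[..., 1]
    return inter / (area_a[:, None] + area_g[None, :] - inter)

# one large anchor vs. one small gt box: the gt never crosses 0.5 IoU
anchors = torch.tensor([[0., 0., 64., 64.]])
gt = torch.tensor([[10., 10., 18., 18.]])
iou = box_iou(anchors, gt)
unmatched = iou.max(dim=0).values < 0.5  # gt boxes no anchor can claim
print(unmatched)  # tensor([True]) -> this gt box would never be trained
```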

tingnit commented 4 years ago

> You are right. If the anchors' shapes are completely different from the gt boxes' shapes, most of the gt boxes will be filtered out, so the loss will be deceptively low.

Yeah, so do you have a plan to add bipartite matching to the loss function? It would help increase recall, but I guess it may sometimes make the training stage unstable.

rvandeghen commented 4 years ago

I had this problem where my gt boxes were very small even though I was already using the smallest anchors. I solved it by reducing the IoU threshold.

glenn-jocher commented 4 years ago

@zylo117 about the anchors, how many anchors are there at each level P3-P7, and how are their shapes defined? Thank you!

zylo117 commented 4 years ago

> @zylo117 about the anchors, how many anchors are there at each level P3-P7, and how are their shapes defined? Thank you!

The anchor config is here: projects/*.yml

glenn-jocher commented 4 years ago

@zylo117 ah of course, thank you! Since there are 3 scales and 3 ratios, does this mean the model has 3x3=9 anchors per layer, or 3x3 x 5 = 45 anchors total?

zylo117 commented 4 years ago

> @zylo117 ah of course, thank you! Since there are 3 scales and 3 ratios, does this mean the model has 3x3=9 anchors per layer, or 3x3 x 5 = 45 anchors total?

Yes, there are 3 anchor_scales x 3 anchor_ratios x 5 pyramid_levels = 45 types of anchors.
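A rough sketch of how those 45 anchor shapes get enumerated. The scale and ratio values below follow the common EfficientDet defaults and may differ from a given project's .yml; the stride-based base size is an assumption for illustration.

```python
# Illustrative enumeration of anchor shapes; scale/ratio values are the
# usual EfficientDet defaults and may differ from your project's .yml.
scales = [2 ** 0, 2 ** (1.0 / 3.0), 2 ** (2.0 / 3.0)]  # 3 anchor scales
ratios = [(1.0, 1.0), (1.4, 0.7), (0.7, 1.4)]          # 3 (w, h) ratios
strides = [8, 16, 32, 64, 128]                          # P3-P7

anchor_shapes = []
for stride in strides:
    for scale in scales:
        for rw, rh in ratios:
            # base size assumed proportional to the level's stride
            anchor_shapes.append((stride * scale * rw, stride * scale * rh))

print(len(anchor_shapes))  # 3 scales x 3 ratios x 5 levels = 45
```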

glenn-jocher commented 4 years ago

@zylo117 wow this is a significant number of anchors, but I suppose the high iou threshold cuts down on the number of detections. For reference in ultralytics/yolov3 we have 9 anchors and an iou threshold of about 0.2.

The yaml files are very nice and clean, I might try and adopt this standard for our repo. I'm assuming the units of the anchor scales is strides correct? i.e. at P3 this would be a stride of 8 pixels, so a smallest anchor box of 8x8 pixels at P3, and then 16x16 at P4 etc?

zylo117 commented 4 years ago

> @zylo117 wow this is a significant number of anchors, but I suppose the high IoU threshold cuts down on the number of detections. For reference, in ultralytics/yolov3 we have 9 anchors and an IoU threshold of about 0.2.
>
> The yaml files are very nice and clean, I might try and adopt this standard for our repo. I'm assuming the units of the anchor scales are strides, correct? i.e. at P3 this would be a stride of 8 pixels, so a smallest anchor box of 8x8 pixels at P3, and then 16x16 at P4, etc?

Yes. EffDet has more anchors, so it would be slower than YOLOv3 at the anchor transform step.

And I was wondering how YOLOv3-SPP-ultralytics gets 6 more mAP than YOLOv3-SPP. They have the same network architecture, right?

Exactly how fast can YOLOv3-SPP-ultralytics run at batch_size 1, including post-processing? It says 19.3ms at batch_size 16, so is that about 50 FPS?

In that case, I think YOLOv3-SPP-ultralytics would be a better replacement for EffDet D2 and D3, memory usage and storage space aside.

glenn-jocher commented 4 years ago

@zylo117 yes yolov3-spp-ultralytics uses the same exact backbone and head/anchors as the original, the only advances are in the training and loss function. It's a very interesting result because it shows that while architecture is important, the training method itself can improve the mAP by 20% or more (37 in yolov3 paper to 43 in ultralytics/yolov3).

When I first started training from scratch results were very poor, about 30 mAP. The main advances are GIoU, EMA, Mosaic dataloader, and Merge NMS to reach 43mAP single-scale / 45.6 mAP multi-scale.

The mAP is substantially less than efficientdet's D4-D7, but the speed is very fast. At batch-size 1 the V100 speed is around 14ms/608x608 img @43mAP. At the most extreme speed we can do about 3ms/256x256 img @33mAP at batch-size 128, or about 300+ FPS (inference and NMS included).

DecentMakeover commented 4 years ago

@rvandeghen did you reduce the IoU threshold during training or testing? Thanks

rvandeghen commented 4 years ago

@DecentMakeover during training, because some of my anchors could not fit any gt boxes (small objects) even though my anchors were already the smallest possible. The only way I could detect those objects was to reduce the IoU threshold during training. If you have the same distribution for training and test, you can also reduce it for testing, or some objects might not get caught, which will drop your recall.

DecentMakeover commented 4 years ago

Thanks for the reply,

I was wondering where you made the change specifically. In loss.py, on line 69, I found this:

        IoU = calc_iou(anchor[:, :], bbox_annotation[:, :4])
        IoU_max, IoU_argmax = torch.max(IoU, dim=1)

There is no specific iou_threshold variable, or am I missing something?

Thanks

rvandeghen commented 4 years ago

@DecentMakeover it is a few lines further down:

        targets[torch.lt(IoU_max, 0.4), :] = 0
        positive_indices = torch.ge(IoU_max, 0.5)

The first line means that anchors with IoU lower than 0.4 are considered background, while the second records the indices where IoU is greater than or equal to 0.5 and thus correspond to objects. You need to lower these two values.

You can determine the value for the positive indices empirically or try with a value of your choice.
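A small sketch of how those two thresholds partition the anchors into background / ignored / positive (the `IoU_max` values and the lowered thresholds below are made up for illustration):

```python
import torch

# Hypothetical best-IoU-with-any-gt values for 6 anchors
IoU_max = torch.tensor([0.10, 0.25, 0.35, 0.45, 0.55, 0.80])

# Repo defaults are 0.4 (background) and 0.5 (positive); here we use
# lowered values as suggested above, so more anchors become positives.
neg_thr, pos_thr = 0.3, 0.4
background = torch.lt(IoU_max, neg_thr)          # ignored-as-background
positive_indices = torch.ge(IoU_max, pos_thr)    # matched to objects
ignored = ~(background | positive_indices)       # band between the two

print(background.sum().item(),
      positive_indices.sum().item(),
      ignored.sum().item())  # -> 2 3 1
```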

I did a lot of debugging with anchors, and I recommend you do the same, as the key to good performance is having anchors that fit your boxes.
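One way to do that kind of anchor debugging (purely illustrative numbers and a centered-anchor approximation, not the repo's code): for each gt (w, h), compute the best IoU any anchor shape could achieve if perfectly centered on the box. Boxes whose best achievable IoU stays below the positive threshold can never be matched at all.

```python
# Illustrative anchor-fit check: gt sizes and anchor sizes are made up.
gt_wh = [(6, 7), (12, 40), (90, 60)]      # example gt (w, h) in pixels
anchor_wh = [(8, 8), (16, 16), (32, 32)]  # e.g. smallest anchor per level

def centered_iou(gw, gh, aw, ah):
    # IoU of a gt box and an anchor sharing the same center (upper bound)
    inter = min(gw, aw) * min(gh, ah)
    return inter / (gw * gh + aw * ah - inter)

best_ious = [max(centered_iou(gw, gh, aw, ah) for aw, ah in anchor_wh)
             for gw, gh in gt_wh]
print([round(b, 2) for b in best_ious])  # -> [0.66, 0.35, 0.19]
# boxes whose best IoU < 0.5 can never become positives at that threshold
```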

Hope this will help you.

DecentMakeover commented 4 years ago

thanks for the input,

Yeah, even I was looking for a good way to change the anchor scales and ratios.