WindVChen closed this issue 3 years ago.
I think you should use the default anchor_scale, but also modify the code to use a higher resolution.
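For illustration, a rough sketch of what "use a higher resolution" could mean in this repo: keep the default anchor_scales but feed D0 larger inputs, so 15×15 objects cover more pixels. The `input_sizes` list and the `Resizer` transform mentioned below are recalled from this repo's train.py and may not match the exact names; treat this as a hypothetical sketch, not verified code.

```python
# Hypothetical sketch: train D0 at a larger input size with default anchor_scales.
# The per-coefficient input sizes below mirror the list used in this repo's
# train.py; double-check the real variable names before relying on this.
input_sizes = [512, 640, 768, 896, 1024, 1280, 1280, 1536, 1536]

compound_coef = 0
img_size = 768  # override the default input_sizes[compound_coef] == 512

# The dataset transform would then resize/pad every image to img_size
# (e.g. something like Resizer(img_size=img_size) in the training pipeline);
# anchors follow the feature-map strides, so they scale with the input.
print(f"Training D{compound_coef} at {img_size}x{img_size} "
      f"instead of {input_sizes[compound_coef]}x{input_sizes[compound_coef]}")
```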
@zylo117 Thanks for your reply. I did try the default anchor_scale, which got less than 20 mAP, so I don't think it suits my object size. About "use higher resolution", do you mean changing the D0 input from 512 to a higher resolution, for example? Since my images are already 512×512, I think the default resolution setting is reasonable.
Is there any other suggestion? Or can you give me some possible reasons why EfficientDet is weaker than RetinaNet on small objects? Thanks a lot.
@zylo117 I tried training with pretrained weights again, and the result seems back on track, about 72 mAP for D0. It seems pretrained weights are important when training on custom datasets, even though my data is totally different from natural images.
So you were training without pretrained weights? You shouldn't have done that unless you have a better and larger dataset like COCO.
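For reference, a minimal sketch of fine-tuning from COCO-pretrained weights rather than from scratch. The `EfficientDetBackbone` import path, constructor arguments, and checkpoint filename are assumptions based on this repo's layout (train.py's `--load_weights` flag does essentially the same thing); verify them against the actual code.

```python
# Assumed import path and constructor kwargs from this repo's backbone.py.
import torch
from backbone import EfficientDetBackbone

model = EfficientDetBackbone(compound_coef=0, num_classes=1,
                             ratios=[(1.0, 1.0), (1.4, 0.7), (0.7, 1.4)],
                             scales=[2 ** -4, 2 ** -2, 2 ** -1])

# strict=False lets the backbone/BiFPN weights load even though the
# classification head shape differs from COCO's class count.
state_dict = torch.load('weights/efficientdet-d0.pth', map_location='cpu')
model.load_state_dict(state_dict, strict=False)
```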
@zylo117 Yes, I will pay attention next time. Another question: the detection heads in the paper are weight-shared, but those in your code seem independent of each other. Will these two approaches make a big difference in the final result?
Mine is also weight-shared. By shared weights, they mean the BN layers for every FPN output and the shared conv layers for every regression layer. https://github.com/zylo117/Yet-Another-EfficientDet-Pytorch/blob/15403b5371a64defb2a7c74e162c6e880a7f462c/efficientdet/model.py#L363-L364
I think the performance would be a little better without shared weights, but the model would be larger.
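For readers following along, here is a simplified sketch of the weight-sharing pattern linked above: one set of conv layers reused on every pyramid level, with a separate BatchNorm per level. It is not the repo's exact code, just an illustration of the idea.

```python
# Simplified illustration: conv weights shared across pyramid levels,
# BN kept per level so each level tracks its own statistics.
import torch
import torch.nn as nn

class SharedHead(nn.Module):
    def __init__(self, channels=64, num_layers=3, num_levels=5):
        super().__init__()
        # conv weights shared across all FPN outputs
        self.convs = nn.ModuleList(
            [nn.Conv2d(channels, channels, 3, padding=1) for _ in range(num_layers)])
        # one BN per (pyramid level, layer) pair, i.e. not shared across levels
        self.bns = nn.ModuleList(
            [nn.ModuleList([nn.BatchNorm2d(channels) for _ in range(num_layers)])
             for _ in range(num_levels)])

    def forward(self, feats):  # feats: list of feature maps, e.g. P3..P7
        outs = []
        for level, x in enumerate(feats):
            for conv, bn in zip(self.convs, self.bns[level]):
                x = torch.relu(bn(conv(x)))
            outs.append(x)
        return outs
```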
Sorry, I may have read it wrong. I will close this issue. Thank you again for your reply.
This work is amazing! But I have run into some problems with detecting small objects.
I have read almost all of the issues about detecting small objects, and they do not resolve my doubts.
My objects are about 15×15 to 20×20 pixels, smaller than COCOTiny, so I changed 'anchor_scales' to `[2 ** -4, 2 ** -2, 2 ** -1]`. Since the anchor size is calculated as `scale * 2 ** pyramid_level * base_scale`, the anchor sizes range from 2 to 256, which I think is enough to cover my object size. Considering the default ratio setting may affect the matching between ground-truth boxes and anchor boxes, I also changed the IoU threshold to 0.2.
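As a quick sanity check of that range, here is a short sketch that computes the anchor sizes from the formula above, assuming `base_scale = 4` (the repo's default anchor_scale) and pyramid levels P3 through P7; it reproduces the 2-to-256 span mentioned above.

```python
# Back-of-the-envelope check of anchor sizes, assuming
# anchor size = scale * 2**pyramid_level * base_scale, base_scale = 4.
base_scale = 4.0                      # default anchor_scale in this repo
scales = [2 ** -4, 2 ** -2, 2 ** -1]  # custom scales from above
pyramid_levels = [3, 4, 5, 6, 7]      # P3-P7, strides 8-128

for level in pyramid_levels:
    stride = 2 ** level
    sizes = [round(scale * stride * base_scale, 1) for scale in scales]
    print(f"P{level} (stride {stride}): anchor sizes {sizes}")
# P3 (stride 8):   anchor sizes [2.0, 8.0, 16.0]
# ...
# P7 (stride 128): anchor sizes [32.0, 128.0, 256.0]
```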
I have tried D0/D2/D4, training each for 1000 epochs (yes, 1000). Every 10 epochs I evaluate on my validation set, but the best mAP is low: 43/56/55 respectively (D4 is worse than D2). The best result appears at around 150 epochs.
I think the network has converged, since the training loss falls to about 0.3 for D0/D2 and about 0.005 for D4.
I also tried RetinaNet and got 73 mAP. Since EfficientDet and RetinaNet both use focal loss and FPN/BiFPN, I think EfficientDet should not be weaker than RetinaNet. I'm very confused about this.
I would appreciate any help and advice!