I scanned through his code and found that the threshold is 0.4 by default, which is higher than the recommended 0.05, so the mAP is lower, of course. But I also saw a lot of zeros, which makes no sense; I think your implementation changes something slightly.
Here's a suggestion: you can debug the evaluation on the same image with the different scripts and see whether their results are consistent.
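To illustrate the threshold point (a minimal sketch with my own illustrative names, not the repo's actual evaluation API): mAP integrates precision over the full recall range, so low-confidence detections must be kept before evaluation, and a 0.4 cutoff silently truncates the PR curve.

```python
# Minimal sketch: why the pre-evaluation score cutoff matters.
# The `detections` structure and function name are illustrative only.
def filter_for_eval(detections, score_threshold=0.05):
    """Keep detections above a deliberately low evaluation threshold;
    raising it to 0.4 drops the low-confidence tail that mAP needs."""
    return [d for d in detections if d['score'] >= score_threshold]
```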
From what I see, the default value you are referring to is not wired to anything, and the values are as you mention (here). Also, I made substantial changes to make it work with your code.
It still raises something important to understand about your code: you set the number of classes (the FC output size) to `len(obj_list)` (which is 90 labels), yet `CocoDataset` does a coco_label -> label mapping, i.e., the label values fed to the model are 0-79. That means there are additional dummy units there, 80-89.
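To make the mismatch concrete, here is my reading of it as a short sketch (using pycocotools' standard API; the file paths and variable names are illustrative):

```python
import yaml
from pycocotools.coco import COCO

# Sketch of the label-range mismatch as I read it (paths illustrative).
obj_list = yaml.safe_load(open('projects/coco.yml'))['obj_list']
coco = COCO('annotations/instances_val2017.json')

# COCO category ids run 1..90 with gaps; only 80 classes exist, and
# CocoDataset remaps them to contiguous model labels 0..79:
coco_ids = sorted(coco.getCatIds())                           # 80 ids
coco_label_to_label = {c: i for i, c in enumerate(coco_ids)}  # -> 0..79

# But the classification head is sized from obj_list, which (with the
# empty placeholder names) has 90 entries, so units 80..89 are dummies:
print(len(obj_list), len(coco_label_to_label))  # 90 vs. 80
```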
On the other hand, if this were the case, then finetuning your coco checkpoint as I did would not have changed anything. But as posted above, it did! So it seems as if the published checkpoint does not fit the current training code; otherwise, why the drastic change when finetuning with the same dataset?
Supporting this, check out these validation plots from finetuning the coco checkpoint (on CocoDataset): they show a brief improvement in classification (and not in regression). I think this might correspond exactly to the correction of the labeling range.
I'm confused here.
Finetuning is a way to adjust the weights to a new labeling of a dataset. Clearly, if I finetune on corrupted labeling, it will be reflected in the evaluation. In the same way, if I fix the labeling and finetune, I can fix a poor evaluation.
Here I performed the naive experiment of finetuning on the same coco dataset (without changing the labeling) and expected to see no impact on evaluation - but this wasn't the case (see above). Why doesn't the training code fit the pretrained weights? Wasn't it used to create them?
Also, I found your new fix for the coco category id mismatch - could this be related?
You are right on point (4), but I am not considering his model, only yours (I am not using his code as is):
```python
model = EfficientDetBackbone(compound_coef=compound_coef, num_classes=len(obj_list),
                             ratios=eval(params['anchors_ratios']),
                             scales=eval(params['anchors_scales']))
```
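If the dummy-unit reading above is right, a hypothetical correction (my own sketch, not something from the repo) would be to size the head from the non-empty names only:

```python
# Hypothetical fix (my sketch): drop the empty '' placeholder names in
# obj_list so the head matches the 0..79 labels CocoDataset produces.
real_classes = [name for name in obj_list if name]
model = EfficientDetBackbone(compound_coef=compound_coef,
                             num_classes=len(real_classes),  # 80, not 90
                             ratios=eval(params['anchors_ratios']),
                             scales=eval(params['anchors_scales']))
```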
Yes, the coco class id mismatch is fixed, but it won't affect anything except training on coco.
Great - so now I can expect the training code to fit the pretrained weights? Namely, finetuning on coco from the pretrained weights should not significantly alter the evaluation, correct?
yes, if the lr is low enough.
This was only fixed when I set `annotation[0, 4] = a['category_id'] - 1` in
https://github.com/zylo117/Yet-Another-EfficientDet-Pytorch/blob/3716ff4a133ea36359ef7ec088fd0968335fd9a7/efficientdet/dataset.py#L76
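For context, the surrounding logic looks roughly like this (paraphrased from dataset.py, not a verbatim copy; only the label line is the actual change):

```python
import numpy as np

# Paraphrased context (not verbatim): each COCO annotation dict `a`
# becomes one row [x, y, w, h, label] of the targets array.
annotations = np.zeros((0, 5))
for a in coco_annotations:  # annotations for one image, from pycocotools
    annotation = np.zeros((1, 5))
    annotation[0, :4] = a['bbox']
    annotation[0, 4] = a['category_id'] - 1  # the fix: 1-based id -> 0-based label
    annotations = np.append(annotations, annotation, axis=0)
```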
fixed, thanks
I implemented a VOC-style mAP evaluation (following toandaominh1997/EfficientDet.Pytorch). It then appeared that the empty class names in `coco.yml` mix things up and corrupt the AP for many classes when using the published checkpoint. I fixed this easily by finetuning from that checkpoint just a little bit and evaluating again.

I think this suggests that the published checkpoint does not predict labels in the range 0-79 (which is the case for `CocoDataset`), but in the wider range given by the length of `obj_list` in `coco.yml`. Could this be clarified?
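For reference, the VOC-style AP I computed boils down to the standard all-point interpolation; here is a self-contained sketch (my own simplification, not toandaominh1997's exact code):

```python
import numpy as np

def voc_ap(recall, precision):
    """All-point interpolated VOC AP, given recall/precision arrays
    accumulated over detections sorted by descending confidence.
    A simplified sketch, not the exact code from either repo."""
    mrec = np.concatenate(([0.0], recall, [1.0]))
    mpre = np.concatenate(([0.0], precision, [0.0]))
    # Precision envelope: make precision monotonically non-increasing.
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    # Sum rectangle areas wherever recall changes.
    idx = np.where(mrec[1:] != mrec[:-1])[0]
    return float(np.sum((mrec[idx + 1] - mrec[idx]) * mpre[idx + 1]))
```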