yuhangzang / OV-DETR

[Under preparation] Code repo for "Open-Vocabulary DETR with Conditional Matching" (ECCV 2022)

Unseen classes in COCO #18

Dopamine0717 opened this issue 1 year ago

Dopamine0717 commented 1 year ago

Hi, I have found that the ground truth of unseen classes in COCO has been used in training, and I wonder if I have misunderstood something.

ids of unseen classes in COCO (screenshot)

Output of the model; as you can see, 'selected_id' contains '36', which is one of the unseen classes in COCO (screenshot)

During bipartite matching, the ground truth of label 36 is used in computing the cost (screenshot)
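For reference, this is the mechanism being described: in a DETR-style matcher, the classification cost is built directly from the ground-truth class ids, so any unseen id present in the targets influences matching. A minimal sketch, simplified from the standard DETR matcher rather than this repository's exact code:

```python
import torch

# Simplified DETR-style classification cost (a sketch, not OV-DETR's exact matcher).
# out_prob: [num_queries, num_classes] softmax scores from the decoder.
# tgt_ids:  ground-truth class indices for one image, e.g. tensor([3, 36, 50]).
def classification_cost(out_prob: torch.Tensor, tgt_ids: torch.Tensor) -> torch.Tensor:
    # Column j of the cost matrix is -P(class tgt_ids[j]) for every query,
    # so an unseen class id in tgt_ids (e.g. 36) contributes to the matching.
    return -out_prob[:, tgt_ids]

out_prob = torch.rand(100, 65).softmax(-1)
tgt_ids = torch.tensor([3, 36, 50])
cost = classification_cost(out_prob, tgt_ids)  # shape [100, 3]
```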

xiaohuihui52309 commented 1 year ago

+1

Hzzone commented 11 months ago

+1 I have not found where the unseen classes are excluded during training. I downloaded the annotation file instances_train2017_seen_2_proposal.json and found that the unseen classes exist in its annotations. Please refer to https://github.com/alirezazareian/ovr-cnn/blob/master/ipynb/003.ipynb, which does exclude the unseen classes.
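This is easy to reproduce. A minimal sketch, assuming a local copy of the file; the unseen id set below is a placeholder, substitute the repo's actual unseen COCO category ids:

```python
import json

# Placeholder unseen ids; replace with the actual unseen COCO category ids.
UNSEEN_CAT_IDS = {5, 6, 17, 18, 21}

with open("instances_train2017_seen_2_proposal.json") as f:
    anno = json.load(f)

# Collect every category id that appears in the annotations and intersect
# it with the unseen set to see whether unseen classes are present.
present = {a["category_id"] for a in anno["annotations"]}
print("unseen category ids found in annotations:", sorted(present & UNSEEN_CAT_IDS))
```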

huzhangcs commented 11 months ago

I also found the same problem, but there is no explanation for now.

yechenzhi commented 11 months ago

https://github.com/yuhangzang/OV-DETR/issues/5 — you can check this issue to see how the author generated the instances_train2017_seen_2_proposal.json file. I also visualized some images using instances_train2017_seen_2_proposal and coco_train_anno_all separately. For example, for image_id = 176179, in instances_train2017_all_2: (image); in instances_train2017_seen_2_proposal: (image)
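For anyone who wants to repeat this comparison, a sketch using pycocotools and matplotlib; the file paths are assumptions, point them at your local copies:

```python
import matplotlib.pyplot as plt
import matplotlib.patches as patches
import skimage.io as io
from pycocotools.coco import COCO

# Sketch: compare the boxes for one image_id across two annotation files.
def show_boxes(anno_file: str, img_dir: str, image_id: int) -> None:
    coco = COCO(anno_file)
    img_info = coco.loadImgs(image_id)[0]
    anns = coco.loadAnns(coco.getAnnIds(imgIds=image_id))
    fig, ax = plt.subplots()
    ax.imshow(io.imread(f"{img_dir}/{img_info['file_name']}"))
    for ann in anns:
        x, y, w, h = ann["bbox"]
        ax.add_patch(patches.Rectangle((x, y), w, h, fill=False, edgecolor="red"))
    ax.set_title(anno_file)
    plt.show()

show_boxes("coco_train_anno_all.json", "train2017", 176179)
show_boxes("instances_train2017_seen_2_proposal.json", "train2017", 176179)
```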

Solacex commented 11 months ago

This question is simple: the issue raiser compares label indexes before and after the label mapping. The mentioned list self.cat_ids_unseen stores the unseen label ids before label mapping, while the output class 36 is an id after mapping.
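To make the distinction concrete, a small illustrative sketch of this kind of remapping; the ids are placeholders, not the repo's exact code:

```python
# Raw COCO category ids are sparse; training uses contiguous indices 0..N-1.
raw_cat_ids = [1, 2, 3, 4, 5, 7, 8, 9, 15, 16, 44]  # placeholder subset
cat2label = {cat_id: i for i, cat_id in enumerate(sorted(raw_cat_ids))}
label2cat = {v: k for k, v in cat2label.items()}

# So an unseen id stored in self.cat_ids_unseen (pre-mapping) and a class
# index like 36 in the model output (post-mapping) are not directly
# comparable until one applies cat2label or its inverse.
```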

huzhangcs commented 11 months ago

Then what about self.all_ids here: https://github.com/yuhangzang/OV-DETR/blob/main/ovdetr/models/model.py#L290? self.all_ids is range(0, 64), which is exactly seen_classes + unseen_classes after mapping. No difference is observed in how they process seen and unseen classes: they just randomly select classes from self.all_ids and randomly select embeddings from the pkl file based on the selected classes. The labels of unseen classes are leaked in this process, as far as I can see.
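A simplified paraphrase of the sampling behavior described above; a sketch only, not the repository's exact code, and the tensor shapes are placeholders:

```python
import torch

# Seen + unseen class indices after mapping, as described above.
all_ids = list(range(64))
# Stand-in for the precomputed CLIP text embeddings loaded from the pkl file.
zeroshot_w = torch.randn(64, 512)

# Random class selection over ALL ids, seen and unseen alike, followed by
# an embedding lookup: unseen-class embeddings enter training whenever an
# unseen index is drawn.
selected = torch.randperm(len(all_ids))[:16]
query_embeds = zeroshot_w[selected]
```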

yechenzhi commented 11 months ago

> Then what about self.all_ids here: https://github.com/yuhangzang/OV-DETR/blob/main/ovdetr/models/model.py#L290? […] The labels of unseen classes are leaked in this process, as far as I can see.

No, the annotations used in OV-DETR for unseen classes are not real annotations; they are generated by a class-agnostic detector and then classified by CLIP.
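Roughly, the pseudo-labeling pipeline being referred to looks like this; a hedged sketch using the OpenAI CLIP package, where the class list and crop-based classification are illustrative rather than the author's exact procedure:

```python
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Text embeddings for the class names (illustrative subset).
class_names = ["airplane", "bus", "cat"]
text = clip.tokenize([f"a photo of a {c}" for c in class_names]).to(device)
with torch.no_grad():
    text_feat = model.encode_text(text)
    text_feat /= text_feat.norm(dim=-1, keepdim=True)

def classify_proposal(image: Image.Image, box) -> int:
    """Crop a class-agnostic proposal and assign it a CLIP label (sketch)."""
    crop = preprocess(image.crop(box)).unsqueeze(0).to(device)
    with torch.no_grad():
        img_feat = model.encode_image(crop)
        img_feat /= img_feat.norm(dim=-1, keepdim=True)
    return int((img_feat @ text_feat.T).argmax(-1))
```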

huzhangcs commented 11 months ago

I understand what you mean, but please check how they use their CLIP text embeddings at https://github.com/yuhangzang/OV-DETR/blob/main/ovdetr/models/model.py#L294: self.zeroshot_w includes the ground-truth text embeddings for unseen classes. If ids are randomly selected from self.all_ids, the ground-truth text embeddings of unseen classes are used directly in training. Do you think this is still an open-vocabulary problem?

yechenzhi commented 11 months ago

> I understand what you mean, but please check how they use their CLIP text embeddings at https://github.com/yuhangzang/OV-DETR/blob/main/ovdetr/models/model.py#L294 […] Do you think this is still an open-vocabulary problem?

You are right: the training process of OV-DETR includes unseen classes, which differs from previous open-vocabulary settings and may be an unfair setting.

huzhangcs commented 11 months ago

According to the paper, this should not be the case. But based on the released code, they just use the ground-truth text labels for unseen classes directly.

yechenzhi commented 11 months ago

> According to the paper, this should not be the case. But based on the released code, they just use the ground-truth text labels for unseen classes directly.

You can check Table 1 in https://arxiv.org/pdf/2303.13076v1.pdf, the 'Require Novel Class' column; I guess there are different settings within the open-vocabulary problem.

huzhangcs commented 11 months ago

No, still not correct. If you look carefully at similar work, for example ViLD, which also requires novel class names: the setting there is that you may impose some constraints on potential objects, which may happen to include novel objects. In the released code of OV-DETR, however, they constrain the novel objects directly. These two cases are completely different.