jianzongwu / Awesome-Open-Vocabulary

(TPAMI 2024) A Survey on Open Vocabulary Learning
https://arxiv.org/abs/2306.15880

question about the open vocabulary performance of Grounding DINO on the COCO dataset #27

Closed · ymzis69 closed this issue 3 weeks ago

ymzis69 commented 3 weeks ago

Thank you for your amazing work. I currently have a question regarding the open vocabulary performance of Grounding DINO on the COCO dataset and hope to get your response.

At present, I am researching open-vocabulary performance. In your paper, the best of the other models reaches an AP over all classes of only 61.0 on the COCO dataset at an IoU threshold of 0.5. However, MM Grounding DINO (a reproduction of Grounding DINO under the MMDetection framework, pre-trained on the O365, GoldG, GRIT, and V3Det datasets; see https://github.com/open-mmlab/mmdetection/blob/main/configs/mm_grounding_dino/README.md) reaches 73.6 on this metric for COCO. Could you please confirm whether these are the same test metrics? If they are, does this mean that the open-vocabulary performance of MM Grounding DINO significantly surpasses the other methods? I look forward to your reply.
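
For concreteness, here is a minimal sketch of the evaluation I am asking about: restricting pycocotools' COCOeval to the novel categories and reading off AP at IoU 0.5. The novel-class list is my understanding of the common 48-base/17-novel OV-COCO split (Bansal et al.), and the file paths are placeholders, so please treat both as assumptions rather than the survey's exact protocol.

```python
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

# Assumed 17 novel class names from the common OV-COCO split; verify
# against whichever codebase produced the numbers being compared.
NOVEL_NAMES = [
    "airplane", "bus", "cat", "dog", "cow", "elephant", "umbrella",
    "tie", "snowboard", "skateboard", "cup", "knife", "cake", "couch",
    "keyboard", "sink", "scissors",
]

coco_gt = COCO("annotations/instances_val2017.json")  # placeholder path
coco_dt = coco_gt.loadRes("detections.json")          # placeholder path

evaluator = COCOeval(coco_gt, coco_dt, iouType="bbox")
# Evaluate on the novel categories only.
evaluator.params.catIds = coco_gt.getCatIds(catNms=NOVEL_NAMES)
evaluator.evaluate()
evaluator.accumulate()
evaluator.summarize()  # stats[1] is AP at IoU=0.50 for the selected classes
```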

lxtGH commented 3 weeks ago

@ymzis69 Hello, as a co-author of MM-Grounding DINO: the gap arises because MM-Grounding DINO uses more pre-training datasets than the original Grounding DINO. Please cite the original numbers when making comparisons.

ymzis69 commented 3 weeks ago

Thank you for your response, but Grounding DINO was not trained on the base classes of the COCO dataset and then tested on both base and novel classes, which is why I referenced the results from MM-Grounding DINO instead. To be precise, I am not investigating the model's zero-shot performance on the COCO dataset, but rather its performance on the novel classes after it has been trained on the base classes.
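
To make the setting explicit, here is a rough sketch of what I mean by "trained on the base classes": novel-class annotations (and the novel categories themselves) are removed from the training set, so the detector never sees a novel label. The split and paths are again my assumptions, not either repository's actual preprocessing.

```python
import json

# Assumed 17 novel class names from the common OV-COCO split.
NOVEL_NAMES = {
    "airplane", "bus", "cat", "dog", "cow", "elephant", "umbrella",
    "tie", "snowboard", "skateboard", "cup", "knife", "cake", "couch",
    "keyboard", "sink", "scissors",
}

with open("annotations/instances_train2017.json") as f:  # placeholder path
    data = json.load(f)

novel_ids = {c["id"] for c in data["categories"] if c["name"] in NOVEL_NAMES}
# Drop every box labeled with a novel class, then drop those classes.
data["annotations"] = [a for a in data["annotations"]
                       if a["category_id"] not in novel_ids]
data["categories"] = [c for c in data["categories"] if c["id"] not in novel_ids]

with open("annotations/instances_train2017_base.json", "w") as f:
    json.dump(data, f)
```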

lxtGH commented 3 weeks ago

From my point of view, it is hard to directly compare Grounding DINO's results with Table 9 in our work, since most of the objects are already seen during its pre-training.
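
As a rough illustration of this point, one can count how many COCO class names already appear verbatim in a pre-training vocabulary such as O365. The class-name file below is hypothetical, and exact string matching understates the true overlap (synonyms and plurals are missed).

```python
from pycocotools.coco import COCO

coco = COCO("annotations/instances_val2017.json")  # placeholder path
coco_names = {c["name"].lower() for c in coco.loadCats(coco.getCatIds())}

# Hypothetical file: one Objects365 category name per line.
with open("objects365_class_names.txt") as f:
    o365_names = {line.strip().lower() for line in f if line.strip()}

seen = coco_names & o365_names
print(f"{len(seen)}/{len(coco_names)} COCO classes appear verbatim "
      f"in the pre-training vocabulary.")
```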

ymzis69 commented 3 weeks ago

Yes, results obtained under unequal experimental conditions cannot be compared fairly. Thank you for your response.