shikras / d-cube

A detection/segmentation dataset with labels characterized by intricate and flexible expressions. "Described Object Detection: Liberating Object Detection with Flexible Expressions" (NeurIPS 2023).
https://arxiv.org/abs/2307.12813

about pre-trained #3

Closed JunL-Geek closed 6 months ago

JunL-Geek commented 10 months ago

I modified Grounding-DINO myself to evaluate it on $D^3$, but the outputs are almost all zero. Did you pre-train Grounding-DINO on $D^3$?

Charles-Xie commented 9 months ago

Hi, thanks for your interest in our work! We appreciate it very much!

We use the pretrained Grounding-DINO checkpoint released in the official repo (https://github.com/IDEA-Research/GroundingDINO) directly, with no further pretraining or fine-tuning on our side, and obtain more than 20.0 intra-scenario mAP (as reported in our paper). From the limited information in the question, my best guess is that there is a bug in your evaluation script. We will release the evaluation script for Grounding-DINO later, but if you need this evaluation sooner, please share more details about your implementation and we can check for bugs or misunderstandings.

Charles-Xie commented 9 months ago

Note that we currently provide an evaluation script on $D^3$ with OWL-ViT: https://github.com/shikras/d-cube/tree/main/eval_sota. Maybe you can take that implementation as a reference.
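For readers following this thread: evaluation scripts of this kind typically consume detections serialized in the COCO results format (a list of dicts with `image_id`, `category_id`, `bbox`, `score`). Whether the d-cube script expects exactly this schema is an assumption; the sketch below only illustrates the common convention.

```python
import json

# Hypothetical toy detections: (image_id, category_id, [x, y, w, h], score).
# The COCO convention uses top-left xywh boxes and a float confidence score.
detections = [
    (1, 3, [10.0, 20.0, 50.0, 40.0], 0.87),
    (1, 7, [60.0, 15.0, 30.0, 30.0], 0.42),
]

# Serialize into COCO-style result entries, the format commonly consumed by
# pycocotools-based mAP evaluation scripts.
results = [
    {"image_id": img_id, "category_id": cat_id, "bbox": bbox, "score": score}
    for img_id, cat_id, bbox, score in detections
]

with open("pred.json", "w") as f:
    json.dump(results, f)
```

Getting this serialization step wrong (e.g. xyxy instead of xywh boxes, or unthresholded raw logits as scores) is a common source of near-zero mAP.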

JunL-Geek commented 9 months ago

Hi, thanks for your reply! I get different results when I follow your code and Grounding-DINO's official implementation. Grounding-DINO needs a proper box threshold; I get the best mAP with 0.2. But in the one-instance setting, your paper reports 63.7% mAP for Grounding-DINO, while I only get around 29%. All of these results use the tiny pretrained model. In fact, I can reproduce similar intra-scenario mAP numbers.


Charles-Xie commented 9 months ago

It's nice that you can reproduce the intra-scenario mAP from our paper. From what I remember, Grounding-DINO's performance is heavily affected by its box_threshold and text_threshold. If you did use our code (https://github.com/shikras/d-cube/blob/main/scripts/eval_and_analysis_json.py#L182) to evaluate the one-instance mAP, then these specific hyperparameters are likely the reason you cannot match the paper's numbers for Grounding-DINO. I will upload the evaluation code for Grounding-DINO in a few weeks, possibly together with an updated version of the paper. We are very grateful for your interest, and if you have a method with potential for DOD, we would really appreciate it if you could evaluate it on our dataset!
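To make the threshold sensitivity concrete: Grounding-DINO's official demo scores each predicted box by the maximum sigmoid over its per-token text logits and keeps only boxes whose score exceeds box_threshold. A minimal NumPy sketch with toy data (not the actual model outputs) shows why lowering the threshold changes which boxes survive:

```python
import numpy as np

def filter_boxes(token_logits, boxes, box_threshold=0.35):
    """Keep boxes whose best per-token sigmoid score exceeds box_threshold.

    token_logits: (num_boxes, num_tokens) raw logits over text tokens.
    boxes: (num_boxes, 4) box coordinates.
    """
    scores = 1.0 / (1.0 + np.exp(-token_logits))  # sigmoid
    best = scores.max(axis=1)                      # best token score per box
    keep = best > box_threshold
    return boxes[keep], best[keep]

# Toy logits: box 0 and box 2 have a confident token, box 1 does not.
logits = np.array([[2.0, -1.0],
                   [-3.0, -2.0],
                   [0.0, 1.0]])
boxes = np.array([[0, 0, 10, 10],
                  [5, 5, 8, 8],
                  [1, 1, 4, 4]], dtype=float)

kept_boxes, kept_scores = filter_boxes(logits, boxes, box_threshold=0.35)
# With box_threshold=0.35, boxes 0 and 2 are kept; lowering it (e.g. to 0.2)
# admits more low-confidence boxes, which directly shifts the resulting mAP.
```

So a mismatch in box_threshold (or text_threshold, which gates which tokens are grouped into the predicted phrase) can easily account for large mAP gaps between two otherwise identical evaluations.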

Charles-Xie commented 6 months ago

This issue has been inactive for some time, so I'm closing it for now. Feel free to reopen it or open a new one if any questions arise.