GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest

VG Region Captioning Evaluation #28

Closed: 594zyc closed this issue 10 months ago

594zyc commented 11 months ago

Hi,

Thanks for open-sourcing this great work! We are developing some region captioning models and would like to perform a fair comparison with GPT4RoI. Is it possible to release the VG validation data you used for calculating the scores in Table 4? Thanks in advance!

jshilong commented 11 months ago

Thank you for your interest in our work. We adopt the evaluation function of this codebase, and it is very easy to use: https://github.com/davidnvq/grit/blob/4664d69fc0b01a5459cfae73f1fd68045ce1a6ea/datasets/caption/metrics/__init__.py#L7C5-L7C19

https://github.com/davidnvq/grit/blob/4664d69fc0b01a5459cfae73f1fd68045ce1a6ea/engine/caption_engine.py#L207
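
For reference, here is a minimal sketch of the kind of caption-metric computation the linked GRiT code wraps, written directly against the standard pycocoevalcap package rather than GRiT's own helpers. The dictionary layout and the example captions are assumptions for illustration, not GPT4RoI's exact evaluation pipeline.

```python
# Sketch: score predicted region captions against references with
# pycocoevalcap (the same metric family the GRiT engine calls).
# Note: PTBTokenizer and METEOR require a Java runtime.
from pycocoevalcap.tokenizer.ptbtokenizer import PTBTokenizer
from pycocoevalcap.cider.cider import Cider
from pycocoevalcap.meteor.meteor import Meteor


def score_captions(references, predictions):
    """references / predictions: {region_id: [caption_str, ...]} (assumed layout)."""
    tokenizer = PTBTokenizer()
    # PTBTokenizer expects {id: [{'caption': str}, ...]} and returns {id: [str, ...]}
    gts = tokenizer.tokenize({k: [{'caption': c} for c in v] for k, v in references.items()})
    res = tokenizer.tokenize({k: [{'caption': c} for c in v] for k, v in predictions.items()})

    cider, _ = Cider().compute_score(gts, res)
    meteor, _ = Meteor().compute_score(gts, res)
    return {'CIDEr': cider, 'METEOR': meteor}


if __name__ == '__main__':
    # Toy example with a single region.
    refs = {0: ['a red bus parked on the street']}
    preds = {0: ['a red bus on the road']}
    print(score_captions(refs, preds))
```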

594zyc commented 11 months ago

Thanks for your prompt response! Sorry for not making it clear: besides the evaluation metrics, we also want to know which validation split of VG you used. Because there exist many different variants of the VG dataset (e.g., some filter out repeated phrases while others do not), it would be great if you could give a pointer to the annotation file and/or sample IDs. Thanks!

jshilong commented 11 months ago

We use the test.json from GRiT: A Generative Region-to-text Transformer for Object Understanding.

594zyc commented 11 months ago

Got it. Thank you so much!

594zyc commented 10 months ago

Another question about the V7W evaluation: I noticed that you mentioned "To prevent information leakage, we remove overlapping images with the test set from Visual Genome (Krishna et al., 2017)." But as mentioned in the V7W paper, the images they used come from COCO, which I believe are the intersection of COCO and VG. Did you also remove those images from the COCO object detection dataset used in your pretraining?

jshilong commented 10 months ago

In our last experiment, we actually removed the COCO detection dataset in order to speed up training. This issue seems not to have received much attention before, so recent studies are not very strict about it (and you will find that early methods on V7W use a COCO-pretrained Faster R-CNN). For example, Shikra and other reports do not seem to remove images overlapping with VG. In our experience, the VG data has the greatest impact on V7W, while the other datasets contribute little.
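
For anyone reproducing this kind of leakage filtering, below is a minimal sketch of removing training images that also appear in a test split. The file names (`v7w_test.json`, `train_annotations.json`) and the COCO-style annotation layout are hypothetical placeholders, not the exact files used in the paper; adjust the id handling for your own formats.

```python
import json

# Hypothetical paths; substitute your own annotation files.
V7W_TEST_PATH = 'v7w_test.json'
TRAIN_ANN_PATH = 'train_annotations.json'
OUTPUT_PATH = 'train_annotations_no_v7w_overlap.json'


def load_image_ids(path, key='image_id'):
    """Collect the set of image ids referenced by an annotation file.
    Assumes a COCO-style layout with an 'annotations' list; adjust for
    your own format. Note that VG and COCO use different image ids; the
    VG image metadata (image_data.json) carries a coco_id field that can
    be used to map between them."""
    with open(path) as f:
        data = json.load(f)
    return {ann[key] for ann in data['annotations']}


test_ids = load_image_ids(V7W_TEST_PATH)

with open(TRAIN_ANN_PATH) as f:
    train = json.load(f)

# Drop every training annotation (and image entry) whose image also
# appears in the test set.
train['annotations'] = [a for a in train['annotations'] if a['image_id'] not in test_ids]
train['images'] = [im for im in train.get('images', []) if im['id'] not in test_ids]

with open(OUTPUT_PATH, 'w') as f:
    json.dump(train, f)
```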

594zyc commented 10 months ago

Got it! Thank you so much for the clarification!