Closed · 594zyc closed this issue 10 months ago
Thank you for your interest in our work! We adopt the evaluation function from this codebase, and it is very easy to use: https://github.com/davidnvq/grit/blob/4664d69fc0b01a5459cfae73f1fd68045ce1a6ea/datasets/caption/metrics/__init__.py#L7C5-L7C19
Thanks for your prompt response! Sorry for not making it clear: besides the evaluation metrics, we would also like to know which validation split of VG you used. Because there exist many different variants of the VG dataset (e.g., some filter out repeated phrases while others do not), it would be great if you could give a pointer to the annotation file and/or the sample ids. Thanks!
test.json from GRiT: A Generative Region-to-text Transformer for Object Understanding
Got it. Thank you so much!
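Since the split is defined by test.json, the unambiguous way to compare across groups is to exchange the exact image ids it contains. A minimal sketch, assuming the file follows a COCO-style caption annotation layout (an "images" list with "id" fields; the real GRiT file's schema may differ, so check the repo):

```python
import json

# Synthetic stand-in for a GRiT-style test.json (COCO caption format assumed).
test_json = json.loads("""
{
  "images": [{"id": 107899, "file_name": "107899.jpg"},
             {"id": 107900, "file_name": "107900.jpg"}],
  "annotations": [{"image_id": 107899, "id": 1, "caption": "a red car"},
                  {"image_id": 107900, "id": 2, "caption": "a tall tree"}]
}
""")

# The sorted set of image ids pins down the validation split exactly,
# regardless of which VG variant each group started from.
split_ids = sorted({img["id"] for img in test_json["images"]})
print(split_ids)  # [107899, 107900]
```

Publishing this id list (or a hash of it) alongside reported scores makes the comparison reproducible even if annotation files drift.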
Another question about V7W evaluation: I noticed that you mentioned "To prevent information leakage, we remove overlapping images with the test set from Visual Genome (Krishna et al., 2017)." But as mentioned in the V7W paper, the images they used are from COCO, i.e., I believe they lie in the intersection of COCO and VG. Did you also remove those images from the COCO object detection dataset used in your pretraining?
In our last experiment, we actually removed the COCO detection dataset in order to speed up training. This issue seems to have received little attention before, so recent studies are not very strict about it (and you will find that early methods on V7W use a COCO-pretrained Faster R-CNN). For example, Shikra and other reports do not seem to remove overlapping images from VG. Our experience is that the VG data has a much greater impact on V7W, while other datasets contribute little.
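The overlap removal discussed above can be sketched with set operations. Visual Genome's image metadata (image_data.json) carries a coco_id field for images shared with COCO, which lets one map V7W test images back to their COCO ids and drop them from the pretraining pool. The data below is synthetic; the field names follow VG's published metadata, but the id values are made up for illustration:

```python
# Hypothetical records mimicking VG image_data.json entries.
vg_image_data = [
    {"image_id": 1, "coco_id": 57870},   # VG image that also exists in COCO
    {"image_id": 2, "coco_id": None},    # VG-only image
    {"image_id": 3, "coco_id": 3008},
]
v7w_test_vg_ids = {1, 3}                 # VG ids appearing in the V7W test set
coco_pretrain_ids = {57870, 3008, 9999}  # COCO ids in the pretraining pool

# Map V7W test images to their COCO ids, then subtract them from pretraining.
leaked_coco_ids = {
    rec["coco_id"] for rec in vg_image_data
    if rec["image_id"] in v7w_test_vg_ids and rec["coco_id"] is not None
}
clean_pretrain_ids = coco_pretrain_ids - leaked_coco_ids
print(sorted(clean_pretrain_ids))  # [9999]
```

Only the image with no V7W-test counterpart survives; applying the same subtraction to every pretraining source (not just VG) is what closes the leakage path raised in the question.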
Got it! Thank you so much for the clarification!
Hi,
Thanks for open-sourcing this great work! We are developing some region captioning models and would like to perform a fair comparison with GPT4ROI. Is it possible to release the VG validation data you used for calculating the scores in Table 4? Thanks in advance!