bxwldljh opened this issue 1 month ago
Unfortunately, I no longer have access to my previous lab servers after graduation. Nevertheless, reproducing the requested files should be very straightforward. Specifically:
(1) For grounding_annotation: it essentially stores the width/height/image ID of the image associated with a question, together with the IDs of the objects mentioned in the explanation (under the key "roi"). The latter can be obtained by querying "processed_explanation_train.json", which has not yet converted the object IDs into ROI numbers (e.g., for the explanation "(obj:4013264) in front of (obj:4013301) is fence", the roi should be ['4013264', '4013301']); see the sketch below.
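In case it is useful, here is a minimal sketch of that parsing step in Python. The "(obj:<id>)" pattern is assumed from the example above; the exact layout of "processed_explanation_train.json" may differ, so treat the helper name as hypothetical:

```python
import re

# Object mentions in the raw explanations look like "(obj:4013264)"
# (pattern assumed from the example quoted above).
OBJ_PATTERN = re.compile(r"\(obj:(\d+)\)")

def extract_roi_ids(explanation: str) -> list:
    """Return the object IDs mentioned in an explanation, in order."""
    return OBJ_PATTERN.findall(explanation)

# Example from this thread:
print(extract_roi_ids("(obj:4013264) in front of (obj:4013301) is fence"))
# -> ['4013264', '4013301']
```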
(2) As for grounding_upper_bound: it stores the maximal IoU between the bottom-up feature bboxes and the ground-truth bboxes in the scene graph (for the objects mentioned in the explanation). It can be computed by first finding, for each ground-truth object bbox, the bottom-up bbox with the largest IoU; then generating binary maps for these "optimal" bottom-up bboxes and for the ground-truth object bboxes; and finally measuring the IoU between the two maps.
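If it helps, here is a minimal sketch of that procedure (the function names are mine; boxes are assumed to be in (x1, y1, x2, y2) pixel coordinates, with the image height/width taken from grounding_annotation):

```python
import numpy as np

def box_iou(a, b):
    """IoU of two boxes in (x1, y1, x2, y2) format."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-8)

def rasterize(boxes, height, width):
    """Binary map that is True inside any of the given boxes."""
    m = np.zeros((height, width), dtype=bool)
    for x1, y1, x2, y2 in boxes:
        m[int(y1):int(np.ceil(y2)), int(x1):int(np.ceil(x2))] = True
    return m

def grounding_upper_bound(gt_boxes, bottom_up_boxes, height, width):
    """Upper-bound grounding IoU for one image, as described above."""
    # 1) For each ground-truth box, pick the bottom-up box with the largest IoU.
    best = [max(bottom_up_boxes, key=lambda b: box_iou(g, b)) for g in gt_boxes]
    # 2) Binary maps for the "optimal" bottom-up boxes and the ground-truth boxes.
    gt_map = rasterize(gt_boxes, height, width)
    bu_map = rasterize(best, height, width)
    # 3) IoU between the two binary maps.
    inter = np.logical_and(gt_map, bu_map).sum()
    union = np.logical_or(gt_map, bu_map).sum()
    return float(inter) / max(float(union), 1.0)
```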
You are very kind! Many thanks to you!
Hi, can you provide the grounding_upper_bound.json and grounding_annotation.json for the training set?