seanzhuh / SeqTR

SeqTR: A Simple yet Universal Network for Visual Grounding
https://arxiv.org/abs/2203.16265

avoid ground truth bbox change #31

Closed chenwei746 closed 11 months ago

chenwei746 commented 1 year ago

Hi, I have found that the ground-truth bboxes change over the course of training, which will greatly affect the performance of the model. results['ann']['bbox'] is a mutable sequence, so assigning it to gt_bbox does not copy anything: both names refer to the same object. When gt_bbox[2] and gt_bbox[3] are overwritten, results['ann']['bbox'][2] and results['ann']['bbox'][3] change as well. As a result, every time the annotation is read, results['ann']['bbox'][0] and results['ann']['bbox'][1] are added once more to results['ann']['bbox'][2] and results['ann']['bbox'][3], so those two values keep getting bigger and bigger.

gt_bbox = results['ann']['bbox']
gt_bbox[2] = gt_bbox[0] + gt_bbox[2]
gt_bbox[3] = gt_bbox[1] + gt_bbox[3]
gt_bbox = numpy.array(gt_bbox, dtype=numpy.float64)  # x1, y1, x2, y2
h, w = results['ori_shape'][:2]
gt_bbox[0::2] = numpy.clip(gt_bbox[0::2], 0, w-1)
gt_bbox[1::2] = numpy.clip(gt_bbox[1::2], 0, h-1)

Without the clip, the repeated addition would push the bottom-right coordinates beyond the image. With numpy.clip, after enough reads the bottom-right corner of the ground-truth box is pinned to the bottom-right corner of the image. In that case only the top-left corner is correct and the bottom-right corner is wrong. To resolve this, just change this code

gt_bbox = results['ann']['bbox']

to

gt_bbox = copy.deepcopy(results['ann']['bbox'])
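As an alternative sketch (not the repository's code), the conversion can also be written so that it never writes into the annotation at all: numpy.array copies its input by default, so doing the arithmetic on the array leaves the original list alone. The helper name xywh_to_xyxy and its parameters img_h and img_w are hypothetical, chosen only for illustration.

```python
import numpy


def xywh_to_xyxy(bbox, img_h, img_w):
    """Return a clipped [x1, y1, x2, y2] array without modifying `bbox`.

    Hypothetical helper for illustration; not part of SeqTR.
    """
    # numpy.array copies its input by default, so `out` never aliases `bbox`.
    out = numpy.array(bbox, dtype=numpy.float64)  # x, y, w, h
    out[2:] += out[:2]  # x2 = x + w, y2 = y + h
    out[0::2] = numpy.clip(out[0::2], 0, img_w - 1)
    out[1::2] = numpy.clip(out[1::2], 0, img_h - 1)
    return out


ann = {'bbox': [150, 150, 200, 400]}
first = xywh_to_xyxy(ann['bbox'], img_h=640, img_w=480)
second = xywh_to_xyxy(ann['bbox'], img_h=640, img_w=480)
print(ann['bbox'], first, second)
```

Calling the helper repeatedly on the same annotation should give the same box each time, because the stored list is only read, never assigned to.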

If the explanation above is too long, you can run these two tests and compare the output; the difference makes the problem clear:

# test1: gt_bbox is the same list object as results['ann']['bbox']
import numpy

# 'ori_shape' is (height, width); 'bbox' is [x, y, w, h]
results = {'ann': {'bbox': [150, 150, 200, 400]}, 'ori_shape': [640, 480]}

print("annotation:", results['ann']['bbox'])

for _ in range(3):
    gt_bbox = results['ann']['bbox']  # no copy: writes below modify the annotation
    gt_bbox[2] = gt_bbox[0] + gt_bbox[2]
    gt_bbox[3] = gt_bbox[1] + gt_bbox[3]
    gt_bbox = numpy.array(gt_bbox, dtype=numpy.float64)  # x1, y1, x2, y2
    h, w = results['ori_shape'][:2]
    gt_bbox[0::2] = numpy.clip(gt_bbox[0::2], 0, w - 1)
    gt_bbox[1::2] = numpy.clip(gt_bbox[1::2], 0, h - 1)
    print("annotation:", results['ann']['bbox'], "gt_bbox:", gt_bbox)
"""
output is as follows:
annotation: [150, 150, 200, 400]
annotation: [150, 150, 350, 550] gt_bbox: [150. 150. 350. 550.]
annotation: [150, 150, 500, 700] gt_bbox: [150. 150. 479. 639.]
annotation: [150, 150, 650, 850] gt_bbox: [150. 150. 479. 639.]
"""
# test2: gt_bbox is an independent copy of results['ann']['bbox']
import numpy
import copy

# 'ori_shape' is (height, width); 'bbox' is [x, y, w, h]
results = {'ann': {'bbox': [150, 150, 200, 400]}, 'ori_shape': [640, 480]}

print("annotation:", results['ann']['bbox'])

for _ in range(3):
    gt_bbox = copy.deepcopy(results['ann']['bbox'])  # the annotation stays untouched
    gt_bbox[2] = gt_bbox[0] + gt_bbox[2]
    gt_bbox[3] = gt_bbox[1] + gt_bbox[3]
    gt_bbox = numpy.array(gt_bbox, dtype=numpy.float64)  # x1, y1, x2, y2
    h, w = results['ori_shape'][:2]
    gt_bbox[0::2] = numpy.clip(gt_bbox[0::2], 0, w - 1)
    gt_bbox[1::2] = numpy.clip(gt_bbox[1::2], 0, h - 1)
    print("annotation:", results['ann']['bbox'], "gt_bbox:", gt_bbox)
"""
output is as follows:
annotation: [150, 150, 200, 400]
annotation: [150, 150, 200, 400] gt_bbox: [150. 150. 350. 550.]
annotation: [150, 150, 200, 400] gt_bbox: [150. 150. 350. 550.]
annotation: [150, 150, 200, 400] gt_bbox: [150. 150. 350. 550.]
"""
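To catch this kind of bug in other pipeline steps, one option is a small guard that snapshots the annotation before a step runs and compares it afterwards. The following is only a sketch: check_not_mutated, buggy, and fixed are hypothetical names, and the comparison assumes the annotation holds plain Python lists and numbers (a dict containing NumPy arrays would need a different equality check).

```python
import copy


def check_not_mutated(transform, results):
    """Run `transform(results)` and raise if it modified `results['ann']`.

    Hypothetical debugging helper; assumes `results['ann']` contains only
    plain Python containers, so `!=` gives a single True/False.
    """
    snapshot = copy.deepcopy(results['ann'])
    output = transform(results)
    if results['ann'] != snapshot:
        raise AssertionError(
            f"annotation mutated: {snapshot} -> {results['ann']}")
    return output


def buggy(results):
    gt_bbox = results['ann']['bbox']  # same list object, not a copy
    gt_bbox[2] = gt_bbox[0] + gt_bbox[2]
    gt_bbox[3] = gt_bbox[1] + gt_bbox[3]
    return gt_bbox


def fixed(results):
    gt_bbox = copy.deepcopy(results['ann']['bbox'])  # independent copy
    gt_bbox[2] = gt_bbox[0] + gt_bbox[2]
    gt_bbox[3] = gt_bbox[1] + gt_bbox[3]
    return gt_bbox
```

Wrapping fixed in the guard returns the converted box, while wrapping buggy raises an AssertionError on the first call, which points at the offending step before the drift accumulates over epochs.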