Hi, I have found that the ground-truth bboxes change over the course of training, which greatly hurts the performance of the model. Because results['ann']['bbox'] is an array, assigning it to gt_bbox does not copy anything: both names refer to the same array. So when gt_bbox[2] and gt_bbox[3] are modified, results['ann']['bbox'] is modified as well. As a result, every time the annotation is read, the values in results['ann']['bbox'][0] and results['ann']['bbox'][1] are added once more to results['ann']['bbox'][2] and results['ann']['bbox'][3], so those two values keep getting bigger and bigger.
On its own, this would push the bottom-right coordinates beyond the image, but because numpy.clip is applied afterwards, the bottom-right corner of every ground-truth box ends up at the bottom-right corner of the image. In that state only the upper-left corner is correct and the lower-right corner is wrong. To resolve this, just change this code
gt_bbox = results['ann']['bbox']
to
gt_bbox = copy.deepcopy(results['ann']['bbox'])
If the explanation above is too long, you can run these two tests and compare the results to see the problem:
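A minimal sketch of such a comparison, assuming the box is stored as [x, y, w, h] in a NumPy array and is converted to corner form and clipped on every read. The function names, the sample box, and the 100x100 image size are made up for illustration and are not the actual pipeline code.

```python
import copy

import numpy as np

IMG_W, IMG_H = 100, 100  # hypothetical image size


def make_results():
    # Hypothetical annotation: one box stored as [x, y, w, h].
    return {'ann': {'bbox': np.array([10.0, 20.0, 30.0, 40.0])}}


def load_bbox_aliased(results, img_w, img_h):
    # Test 1: plain assignment, gt_bbox is the same array as the annotation.
    gt_bbox = results['ann']['bbox']
    gt_bbox[2] = np.clip(gt_bbox[0] + gt_bbox[2], 0, img_w)
    gt_bbox[3] = np.clip(gt_bbox[1] + gt_bbox[3], 0, img_h)
    return gt_bbox


def load_bbox_copied(results, img_w, img_h):
    # Test 2: work on a copy, the stored annotation is left untouched.
    gt_bbox = copy.deepcopy(results['ann']['bbox'])
    gt_bbox[2] = np.clip(gt_bbox[0] + gt_bbox[2], 0, img_w)
    gt_bbox[3] = np.clip(gt_bbox[1] + gt_bbox[3], 0, img_h)
    return gt_bbox


aliased_results = make_results()
copied_results = make_results()

# Read the same annotation once per "epoch" and print what each loader returns.
for epoch in range(10):
    aliased = load_bbox_aliased(aliased_results, IMG_W, IMG_H)
    copied = load_bbox_copied(copied_results, IMG_W, IMG_H)
    print(epoch, aliased.tolist(), copied.tolist())

print('stored (aliased):', aliased_results['ann']['bbox'].tolist())
print('stored (copied): ', copied_results['ann']['bbox'].tolist())
```

In the first test the stored box grows on every read until its lower-right corner is pinned to the image border; in the second the stored annotation never changes and every read returns the same box. For a flat NumPy array like this, results['ann']['bbox'].copy() should have the same effect as copy.deepcopy.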