Closed YashRunwal closed 3 years ago
In my project, an input image is resized to a square image with equal width and height, so the "img_size" parameter of the gt_creator function is an int, not a list. I notice that you have already fixed this, though.
As for the error you hit, I think you may have forgotten to normalize your bboxes by img_size. It is common practice to normalize [x1, y1, x2, y2] or [cx, cy, w, h] to the range [0, 1]. If you skip this step, grid_x and grid_y end up far larger than the grid size of [128, 384].
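To make the arithmetic concrete, here is a minimal sketch of how a box center could be mapped to a grid cell. It is not the repository's generate_txtytwth: grid_cell is a hypothetical helper, and it assumes the real function scales normalized coordinates by the image size and then divides by the stride. The box is the one printed in the original post (686, 171, 1190, 369).

```python
def grid_cell(box, img_h, img_w, stride):
    """Hypothetical helper: map a box to its grid cell.

    Assumes box = (xmin, ymin, xmax, ymax) is already normalized to [0, 1].
    """
    xmin, ymin, xmax, ymax = box
    # Scale the (assumed normalized) center back to pixels, then to grid units.
    cx = (xmin + xmax) / 2 * img_w / stride
    cy = (ymin + ymax) / 2 * img_h / stride
    return int(cx), int(cy)


img_h, img_w, stride = 512, 1536, 4
raw_box = (686, 171, 1190, 369)  # pixel coordinates, not normalized
norm_box = (686 / img_w, 171 / img_h, 1190 / img_w, 369 / img_h)

raw_cell = grid_cell(raw_box, img_h, img_w, stride)
norm_cell = grid_cell(norm_box, img_h, img_w, stride)

print(raw_cell)   # (360192, 34560): far outside the 384 x 128 grid
print(norm_cell)  # (234, 67): inside the grid
```

Under that assumption, the unnormalized box gives exactly the first two values reported in the original post, which is consistent with the pixel coordinates being scaled by the image size a second time.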
So I just have to divide the x values by the image width and the y values by the image height to normalize them, right? Also, if I normalize the annotations, I don't need to use any augmentations, right?
Yes. In my project, you must normalize bboxes.
You should still use augmentations during the training stage; otherwise you will get poor performance.
The problem with my training images is that they are not normal RGB or Grayscale images. They are raw images and hence I still haven't figured out the augmentation techniques for those.
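One option that may carry over to raw images is a purely geometric augmentation, since it only rearranges array elements and does not depend on how pixel values are encoded. Below is a minimal sketch of a horizontal flip, assuming a single-channel (H, W) array and boxes already normalized to [0, 1] in [xmin, ymin, xmax, ymax, class_id] format. The name hflip is hypothetical and not part of this repository. If the raw data carries a sensor mosaic pattern, a flip can shift the pattern alignment, so this would need checking against the actual data.

```python
import numpy as np


def hflip(image, targets):
    """Mirror the image left-right and update normalized boxes to match."""
    flipped = image[:, ::-1].copy()
    # np.array copies, so the caller's targets are left untouched.
    out = np.array(targets, dtype=np.float32).reshape(-1, 5)
    xmin = out[:, 0].copy()
    # After mirroring, the old right edge becomes the new left edge.
    out[:, 0] = 1.0 - out[:, 2]
    out[:, 2] = 1.0 - xmin
    return flipped, out


# Illustrative values only: a tiny 2 x 3 "image" and one made-up box.
image = np.arange(6).reshape(2, 3)
targets = [[0.1, 0.2, 0.4, 0.6, 1]]
flipped, flipped_targets = hflip(image, targets)
```

Because the boxes are normalized, the flip needs no knowledge of the image width: each x coordinate simply becomes 1 minus the opposite edge.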
@yjh0410 Can you perhaps explain what the gt_creator function does? I went through the code, but its functionality is a little difficult to understand as there aren't any comments. I would happily add comments to the code.
Also, for anyone who had doubts about normalizing bounding boxes: You can use the following function. You need to change a few things if it is a stand-alone function, i.e. not written as a class method.
import numpy as np

def _normalize_bbox(self, image, tgts):
    """
    Normalize bounding boxes to the range [0, 1]
    :param image: np_image: shape: (512, 1536)
    :param tgts: annotations include [[xmin, ymin, xmax, ymax, class_id]]
    :return: normalized bounding boxes
    """
    height, width = image.shape[0], image.shape[1]  # (512, 1536)
    # tgts[:, :4]
    targets = []
    for annot in tgts:
        # x values are divided by the width, y values by the height
        xmin = annot[0] / width
        ymin = annot[1] / height
        xmax = annot[2] / width
        ymax = annot[3] / height
        class_id = annot[4]
        targets.append([xmin, ymin, xmax, ymax, class_id])
    # Note: np.squeeze drops the leading axis when there is only one box,
    # so a single annotation comes back with shape (5,) instead of (1, 5).
    return np.squeeze(np.array(targets))
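For the stand-alone case mentioned above, a vectorized variant might look like the sketch below. The name normalize_bboxes is hypothetical, and unlike the method above it deliberately keeps a 2-D (N, 5) result even for a single box rather than squeezing it; whether gt_creator expects that shape is an assumption to verify against the repository.

```python
import numpy as np


def normalize_bboxes(image, tgts):
    """Scale [xmin, ymin, xmax, ymax, class_id] rows to [0, 1] box coordinates."""
    height, width = image.shape[:2]
    # reshape(-1, 5) keeps the result 2-D for zero, one, or many annotations;
    # copy() avoids modifying the caller's array in place.
    targets = np.asarray(tgts, dtype=np.float32).reshape(-1, 5).copy()
    targets[:, [0, 2]] /= width   # xmin, xmax
    targets[:, [1, 3]] /= height  # ymin, ymax
    return targets


# Box from the original post; the class id 0 is a placeholder.
image = np.zeros((512, 1536))
normalized = normalize_bboxes(image, [[686, 171, 1190, 369, 0]])
```

The class id column is left untouched, and every coordinate in the result lies in [0, 1] as long as the input boxes lie inside the image.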
Hello,
My input image (grayscale) has size (512, 1536). I created my own dataset class. The targets it returns are:
As in the train.py script, I selected a stride of 4 to use in the gt_creator function:
gt_creator((512, 1536), stride=4, num_classes=8, dataset[100])
But it gives me the following error:
Therefore I tried to debug this by printing the results of the generate_txtytwth function, and they are:
(360192, 34560, 0.0, 0.0, 12.173218820659095, 10.140297294614152, -99790.0, 12902, 12902, 686, 171, 1190, 369)
However, the gt_tensor variable inside the gt_creator function has shape:
(128, 384, 17)
Any idea as to why this is happening? What needs to be changed? I would prefer not to use any augmentation techniques.