weixi-feng / LayoutGPT

Official repo for LayoutGPT
MIT License

[DOUBT] Issue in Bounding Boxes of NSR-1k dataset #5

Closed alphacoder01 closed 1 year ago

alphacoder01 commented 1 year ago

Hi, I was visualising the bboxes of the NSR-1K dataset, and the boxes seem incorrect compared to the original COCO boxes.

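import json

import matplotlib.pyplot as plt
from PIL import Image, ImageDraw

def read_json(path):
    # Helper not shown in the original post; assumed to simply load a JSON file
    with open(path) as f:
        return json.load(f)

counting = read_json("LayoutGPT/dataset/NSR-1K/counting/counting.train.json")
caps = read_json('COCO/annotations/captions_train2017.json')
image_info = caps['images']

def visualize_data(idx):
    sample = counting[idx]
    img_id = sample['image_id']
    # look up the file name and size of the matching COCO image
    for k in image_info:
        if k['id'] == img_id:
            img_name = k['file_name']
            H = k['height']
            W = k['width']
            break
    print(img_id)
    img = Image.open(f'COCO/train2017/{img_name}').convert('RGB')
    draw = ImageDraw.Draw(img)
    for lst in sample['object_list']:
        text = lst[0]
        x, y, w, h = lst[1]
        # assumes the boxes are normalized by the image's own width and height
        x = x * W
        y = y * H
        w = w * W
        h = h * H
        print(x, y, w, h)
        draw.rectangle([(x, y), (x + w, y + h)], outline=(255, 0, 0))
        print(text)
        draw.text((x, y - 10), text, (0, 0, 0))
    plt.imshow(img)
    plt.show()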

For image_id = 45247, the boxes from counting.train.json, in (assumed) [x, y, w, h] format, are [0.75, 93.37, 273.27, 183.09] and [280.38, 112.05, 202.96, 147.68], whereas the original COCO annotations are [0.75, 56.71, 273.27, 274.91] and [280.38, 84.75, 202.96, 221.75].

The x-coordinates and widths match exactly, but the y-coordinates and heights are wrong in almost all the images. Here's an example. I might be doing this wrong, so any help is appreciated!

Red boxes are taken from the NSR-1K dataset; green boxes are from the MSCOCO ground truth.

[image: zebra2]

VegB commented 1 year ago

Hi Ashish,

Thanks for asking!

Below is the preprocessing script we used. Basically, we pad each image to a square and normalize by the side length of that square, which preserves the aspect ratio of each bbox:

# the original image shape
width = annotations['width']
height = annotations['height']

# add padding to the image to ensure it becomes a square
side_len = max(width, height)
x_offset = (side_len - width) / 2  # padding margin
y_offset = (side_len - height) / 2

# rescale (x, y, w, h) to the [0, 1] range, relative to the padded square
for obj_info in annotations['object_list']:
    x, y, w, h = obj_info['bbox']
    x = float(x + x_offset) / side_len
    y = float(y + y_offset) / side_len
    w = float(w) / side_len
    h = float(h) / side_len

Let me know if this answers your question :)
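
For anyone who wants to draw NSR-1K boxes on the original COCO images, the preprocessing snippet above can be inverted: multiply by the padded side length, then subtract the padding offset. The sketch below is only an illustration under that assumption. The function names `to_padded_normalized` and `to_pixel_bbox`, and the 640x480 example image and box, are hypothetical and not part of the LayoutGPT repo. It also suggests why scaling by `W` and `H` separately could match on x but not on y: for a landscape image, `side_len` equals the width and `x_offset` is 0, so only the y-axis carries a padding offset and a different scale.

```python
def to_padded_normalized(bbox, width, height):
    """Forward transform, mirroring the preprocessing snippet above."""
    side_len = max(width, height)
    x_offset = (side_len - width) / 2   # horizontal padding margin
    y_offset = (side_len - height) / 2  # vertical padding margin
    x, y, w, h = bbox
    return (
        (x + x_offset) / side_len,
        (y + y_offset) / side_len,
        w / side_len,
        h / side_len,
    )


def to_pixel_bbox(norm_bbox, width, height):
    """Inverse transform: padded-square normalized box -> pixel box in the original image."""
    side_len = max(width, height)
    x_offset = (side_len - width) / 2
    y_offset = (side_len - height) / 2
    x, y, w, h = norm_bbox
    return (
        x * side_len - x_offset,
        y * side_len - y_offset,
        w * side_len,
        h * side_len,
    )


if __name__ == "__main__":
    # Hypothetical landscape image and box, chosen only for illustration.
    width, height = 640, 480
    original = (100.0, 50.0, 200.0, 120.0)
    normalized = to_padded_normalized(original, width, height)
    recovered = to_pixel_bbox(normalized, width, height)
    print(normalized)
    print(recovered)
```

In a visualisation loop like the one in the question, this would mean replacing the separate `x*W`, `y*H`, `w*W`, `h*H` scaling with a single call to something like `to_pixel_bbox(lst[1], W, H)`.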