salesforce/ALBEF

Code for ALBEF: a new vision-language pre-training method
BSD 3-Clause "New" or "Revised" License

About some JSON files #64

Closed TungWg closed 2 years ago

TungWg commented 2 years ago

Hi, thanks for the excellent work. I would like to know how the JSON files refcoco+_train.json, refcoco+_val.json, and refcoco+_test.json in data.tar.gz were generated. How can I get the corresponding JSON files for the RefCOCO and RefCOCOg datasets?

LiJunnan1992 commented 2 years ago

Hi, they are generated from the original refcoco+ annotations by converting the bounding boxes into patch masks.
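For intuition, here is a minimal sketch of that conversion for a single box. bbox_to_patch_scores is a hypothetical helper name, and the grid size is illustrative; the author's actual script (shown further down, using segmentation masks from the refer API) is the authoritative version.

import numpy as np

def bbox_to_patch_scores(bbox, width, height, n_patch=12):
    # Rasterize a COCO-style [x, y, w, h] box into a binary mask.
    x, y, w, h = bbox
    mask = np.zeros((height, width), dtype=np.float32)
    mask[int(y):int(y + h), int(x):int(x + w)] = 1.0
    # Score each cell of an n_patch x n_patch grid by how much of it the box covers.
    h_step, w_step = height / n_patch, width / n_patch
    patch_area = height * width / (n_patch * n_patch)  # nominal area per patch
    scores = []
    for i in range(n_patch):
        for j in range(n_patch):
            sub = mask[round(i * h_step):round((i + 1) * h_step),
                       round(j * w_step):round((j + 1) * w_step)]
            scores.append(sub.sum() / patch_area)
    return scores  # one coverage score per patch, n_patch * n_patch in total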

TungWg commented 2 years ago

Thank you for your reply. Does the original refcoco+ annotation mean instances.json and refs(unc).p? Are these JSON files or any conversion scripts available?

LiJunnan1992 commented 2 years ago

Here is the code snippet for the conversion. I used the official refer API to load the original annotations.

import os
import torch
from refer import REFER  # official refer API: https://github.com/lichengunc/refer

data_root = './data'  # contains refclef, refcoco, refcoco+, refcocog and images
dataset = 'refcoco+'
splitBy = 'unc'       # split provider for the refcoco+ annotations
refer = REFER(data_root, dataset, splitBy)

split = 'test'  # one of 'train', 'val', 'test'; rerun for each split to produce every file
ref_ids = refer.getRefIds(split=split)

annotations = []

# The 384x384 model input is divided into a 12x12 grid (patch size 32),
# so each patch mask has 144 entries.
dim_w, dim_h = 384, 384
patch_size = 32
n_patch_w, n_patch_h = dim_w//patch_size, dim_h//patch_size

for ref_id in ref_ids:

    ref = refer.Refs[ref_id]
    image = refer.Imgs[ref['image_id']]

    # Map the patch grid onto the original image resolution.
    width, height = image['width'], image['height']
    w_step = width/n_patch_w
    h_step = height/n_patch_h
    patch_area = height*width/(n_patch_w*n_patch_h)  # nominal area per patch

    # Binary segmentation mask of the referred object, from the refer API.
    mask = refer.getMask(ref)['mask']

    # For each patch, record the fraction covered by the object mask.
    patch = []
    for i in range(n_patch_h):
        for j in range(n_patch_w):
            y0 = max(0, round(i*h_step))
            y1 = min(height, round((i+1)*h_step))
            x0 = max(0, round(j*w_step))
            x1 = min(width, round((j+1)*w_step))
            submask = mask[y0:y1, x0:x1]
            patch.append(submask.sum()/patch_area)

    # All referring expressions for this object share the same patch mask.
    text = [sentence['sent'] for sentence in ref['sentences']]
    imgPath = os.path.join('/export/share/datasets/vision/coco/images/train2014', image['file_name'])
    annotation = {'image': imgPath, 'text': text, 'patch': patch, 'type': 'ref', 'ref_id': ref['ref_id']}
    annotations.append(annotation)

# Duplicate the patch mask once per caption so each text has a matching target.
for ann in annotations:
    ann['patch'] = [torch.Tensor(ann['patch']) for _ in range(len(ann['text']))]

torch.save(annotations, 'refcoco+_%s.pth' % split)
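
As a quick sanity check (a usage sketch, assuming the 'test' split was just generated), the saved file can be loaded back and inspected:

import torch

annotations = torch.load('refcoco+_test.pth')
ann = annotations[0]
print(ann['image'])   # path to the source train2014 image
print(ann['text'])    # all referring expressions for this object
print(len(ann['patch']), ann['patch'][0].shape)  # one 144-dim tensor per expression
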
TungWg commented 2 years ago

OK, thanks a lot!