salesforce / ALBEF

Code for ALBEF: a new vision-language pre-training method
BSD 3-Clause "New" or "Revised" License

VQA dataset keys don't match code #70

Open shankyemcee opened 2 years ago

shankyemcee commented 2 years ago

Hi,

I am trying to run VQA.py to fine-tune on the VQA v2 dataset. I am starting with just the balanced binary abstract scenes subset because of its smaller size (https://visualqa.org/download.html). I have set the paths in VQA.yaml as mentioned. But when I go through the code in vqa_dataset.py, the keys used there don't match the keys in the dataset json files. For example, here is a snippet of the questions json file:

{ "info": { "description": "This is Balanced Binary Abstract Scenes VQA dataset.", "url": "http://visualqa.org", "version": "1.0", "year": "2017", "contributor": "VQA Team", "date_created": "2017-03-09 14:27:27" }, "task_type": "Open-Ended", "data_type": "abstract_v002", "license": { "url": "http://creativecommons.org/licenses/by/4.0/", "name": "Creative Commons Attribution 4.0 International License" }, "data_subtype": "val2017", "questions": [ { "image_id": 28940, "question": "Is it daylight?", "question_id": 289402 }, { "image_id": 900289402, "question": "Is it daylight?", "question_id": 900289402 },

and here is a snippet of the code for processing data:


def __getitem__(self, index):
        ann = self.ann[index]
        if ann['dataset']=='vqa':
            image_path = os.path.join(self.vqa_root,ann['image'])    
        elif ann['dataset']=='vg':
            image_path = os.path.join(self.vg_root,ann['image'])  

        image = Image.open(image_path).convert('RGB')   
        image = self.transform(image)          

        if self.split == 'test':
            question = pre_question(ann['question'],self.max_ques_words)   
            question_id = ann['question_id']            
            return image, question, question_id

        elif self.split=='train':                       

            question = pre_question(ann['question'],self.max_ques_words)        

            if ann['dataset']=='vqa':

                answer_weight = {}
                for answer in ann['answer']:
                    if answer in answer_weight.keys():
                        answer_weight[answer] += 1/len(ann['answer'])
                    else:
                        answer_weight[answer] = 1/len(ann['answer'])

                answers = list(answer_weight.keys())
                weights = list(answer_weight.values())

            elif ann['dataset']=='vg':
                answers = [ann['answer']]
                weights = [0.5]  

            answers = [answer+self.eos for answer in answers]

            return image, question, answers, weights
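
From this __getitem__, each entry in the training annotation file appears to need a flat structure roughly like the following (the field names are taken from the code above; the image value is just a placeholder relative path under vqa_root):

{
    "dataset": "vqa",
    "image": "abstract_scenes/28940.png",
    "question": "Is it daylight?",
    "question_id": 289402,
    "answer": ["yes", "yes", "no"]
}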

Could you let me know which dataset to use if this is not the right one? I just want to fine-tune the model on a small dataset to get it working before training it on another dataset. Thanks.

LiJunnan1992 commented 2 years ago

Hi, you can modify VQA.yaml to use your own training annotation.
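
For example, the relevant entries would look roughly like this (the exact key names may differ slightly, please check configs/VQA.yaml in your local copy):

train_file: ['/path/to/my_vqa_train.json']
test_file: ['/path/to/my_vqa_test.json']
vqa_root: '/path/to/vqa/images/'    # 'image' paths in the json are joined onto this
vg_root: '/path/to/vg/images/'
answer_list: '/path/to/answer_list.json'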

shankyemcee commented 2 years ago

The problem is with the json files. Keys like ann['answer'] and ann['image'] can't be found. I am using OpenEnded_abstract_v002_train2017_questions.json as the training data. Was some other dataset used for VQA fine-tuning besides this one? Many things are hardcoded, and for now I just want to reproduce the results reported in the paper, so if you could kindly point me to the dataset you used, it would be appreciated.

LiJunnan1992 commented 2 years ago

Hi, you need to create your own json file from OpenEnded_abstract_v002_train2017_questions.json.
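
A minimal sketch of such a conversion, assuming the standard VQA questions/annotations file pair and the keys read by vqa_dataset.py above (the file names and the image-path pattern are placeholders you would need to adapt):

import json

# Placeholder paths: adapt to your local copies of the VQA abstract-scenes files.
questions_file = 'OpenEnded_abstract_v002_train2017_questions.json'
annotations_file = 'abstract_v002_train2017_annotations.json'   # assumed companion annotations file
output_file = 'vqa_abstract_train.json'

questions = json.load(open(questions_file))['questions']
annotations = json.load(open(annotations_file))['annotations']
ann_by_qid = {a['question_id']: a for a in annotations}

entries = []
for q in questions:
    a = ann_by_qid[q['question_id']]
    entries.append({
        'dataset': 'vqa',                                    # selects the vqa branch in __getitem__
        # path relative to vqa_root in VQA.yaml; the naming pattern below is a guess,
        # check how your downloaded image files are actually named
        'image': 'abstract_v002_train2017_%012d.png' % q['image_id'],
        'question': q['question'],
        'question_id': q['question_id'],
        'answer': [ans['answer'] for ans in a['answers']],   # list, so answer_weight can count duplicates
    })

json.dump(entries, open(output_file, 'w'))

For the test split, only 'dataset', 'image', 'question', and 'question_id' are needed, since that is all the test branch of __getitem__ reads.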

shankyemcee commented 2 years ago

Got it. Thanks