A Good Prompt Is Worth Millions of Parameters: Low-resource Prompt-based Learning for Vision-Language Models

Paper (ACL 2022)

This repository contains the implementation of FewVLM described in the paper. The code is based on VL-T5.

Installation

pip install -r requirements.txt
python -c "import language_evaluation; language_evaluation.download('coco')"

Datasets

Pre-trained checkpoints

Pre-training

# Pre-train with 8 GPUs
bash scripts/pretrain.sh 8 
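
The first argument to scripts/pretrain.sh appears to be the GPU count; a sketch of the same command for a smaller machine, assuming nothing else needs to change:

# Pre-train with 4 GPUs instead of 8
bash scripts/pretrain.sh 4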

Zero/few-shot Learning

All commands are runnable on a single GPU.

VQA

# for few-shot
bash scripts/VQA.sh 0 VQA --subsample --dataseed 42 --num_data 16 --test_only --prompt 3

# for zero-shot 
bash scripts/VQA.sh 0 VQA --test_only --prompt 3
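
To evaluate from a specific checkpoint, the --load argument documented in the table below can be appended; a sketch, assuming the script forwards extra flags unchanged and using a placeholder checkpoint path:

# zero-shot VQA from a specific checkpoint (path is a placeholder)
bash scripts/VQA.sh 0 VQA --test_only --prompt 3 --load /path/to/checkpoint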

OKVQA

# for few-shot
bash scripts/OKVQA.sh 0 OKVQA --subsample --dataseed 42 --num_data 16 --test_only --prompt 3

# for zero-shot 
bash scripts/OKVQA.sh 0 OKVQA --test_only --prompt 3

GQA

# for few-shot
bash scripts/GQA.sh 0 GQA --subsample --dataseed 42 --num_data 16 --test_only --prompt 3

# for zero-shot 
bash scripts/GQA.sh 0 GQA --test_only --prompt 3

Flickr30k

# for few-shot
bash scripts/flickr.sh 0 flickr --subsample --dataseed 42 --num_data 16 --prefix image 

# for zero-shot 
bash scripts/flickr.sh 0 flickr --prefix image --test_only 

Nocaps

# for few-shot
bash scripts/nocaps.sh 0 nocaps --subsample --dataseed 42 --num_data 16 --prefix image 

# for zero-shot 
bash scripts/nocaps.sh 0 nocaps --prefix image --test_only 
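
The captioning prefix is not limited to 'image'; per the argument table below, 'picture' and 'photo' are also accepted. A sketch varying only the prefix:

# zero-shot NoCaps captioning with the 'a photo of' prefix
bash scripts/nocaps.sh 0 nocaps --prefix photo --test_only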

Some important command line arguments are listed as follows:

| Arg | Values | Description | Notes |
|---|---|---|---|
| --load | path to a trained checkpoint | Load a checkpoint | |
| --dataseed | {0, 42, 9595, ...} | Random seed for data shuffling | default=42 |
| --seed | {0, 42, 9595, ...} | Random seed for parameter shuffling | default=9595 |
| --subsample | store_true | Subsample train and val sets for few-shot learning | |
| --num_data | {16, 40, ...} | Number of subsamples for train and val sets | default=16 |
| --test_only | store_true | Run test without training | |
| --prompt | {0, 1, 2, 3} | Prompts for VQA: 0: no prompt, 1: '[Q] ', 2: 'question: [Q] answer:', 3: 'question: [Q] answer: ' | default=0 |
| --prefix | {None, 'image', 'picture', 'photo'} | Prompts for captioning: 'image': 'an image of', 'picture': 'a picture of', 'photo': 'a photo of' | default=None |
| --backbone | {'t5-base', 't5-large'} | Backbone architecture | default='t5-base' |
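
As a worked example, several of the arguments above can be combined in a single few-shot run; a sketch, assuming the scripts forward these flags unchanged and using a placeholder checkpoint path:

# few-shot VQA: 16 subsampled train/val examples, data seed 0, prompt template 3, t5-large backbone
bash scripts/VQA.sh 0 VQA --subsample --dataseed 0 --num_data 16 --test_only --prompt 3 --backbone t5-large --load /path/to/checkpoint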