A Good Prompt Is Worth Millions of Parameters: Low-resource Prompt-based Learning for Vision-Language Models

Paper (ACL 2022)

This repository contains the implementation of FewVLM described in the paper. The code is based on VL-T5.

Installation

pip install -r requirements.txt
python -c "import language_evaluation; language_evaluation.download('coco')"

Datasets

Pre-trained checkpoints

Pre-training

# Pre-train with 8 GPUs
bash scripts/pretrain.sh 8 
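
The first argument to scripts/pretrain.sh appears to be the GPU count; a sketch of the same command for a smaller machine, assuming nothing else needs to change:

# Pre-train with 4 GPUs instead of 8
bash scripts/pretrain.sh 4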

Zero/few-shot Learning

All commands are runnable on a single GPU.

VQA

# for few-shot
bash scripts/VQA.sh 0 VQA --subsample --dataseed 42 --num_data 16 --test_only --prompt 3

# for zero-shot 
bash scripts/VQA.sh 0 VQA --test_only --prompt 3
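
To evaluate from a specific checkpoint, the --load argument documented in the table below can be appended; a sketch, assuming the script forwards extra flags unchanged and using a placeholder checkpoint path:

# zero-shot VQA from a specific checkpoint (path is a placeholder)
bash scripts/VQA.sh 0 VQA --test_only --prompt 3 --load /path/to/checkpoint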

OKVQA

# for few-shot
bash scripts/OKVQA.sh 0 OKVQA --subsample --dataseed 42 --num_data 16 --test_only --prompt 3

# for zero-shot 
bash scripts/OKVQA.sh 0 OKVQA --test_only --prompt 3

GQA

# for few-shot
bash scripts/GQA.sh 0 GQA --subsample --dataseed 42 --num_data 16 --test_only --prompt 3

# for zero-shot 
bash scripts/GQA.sh 0 GQA --test_only --prompt 3

Flickr30k

# for few-shot
bash scripts/flickr.sh 0 flickr --subsample --dataseed 42 --num_data 16 --prefix image 

# for zero-shot 
bash scripts/flickr.sh 0 flickr --prefix image --test_only 

Nocaps

# for few-shot
bash scripts/nocaps.sh 0 nocaps --subsample --dataseed 42 --num_data 16 --prefix image 

# for zero-shot 
bash scripts/nocaps.sh 0 nocaps --prefix image --test_only 
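
The captioning prefix is not limited to 'image'; per the argument table below, 'picture' and 'photo' are also accepted. A sketch varying only the prefix:

# zero-shot NoCaps captioning with the 'a photo of' prefix
bash scripts/nocaps.sh 0 nocaps --prefix photo --test_only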

Some important command line arguments are listed as follows:

| Arg | Values | Description | Notes |
|---|---|---|---|
| --load | path to a trained checkpoint | Load a checkpoint | |
| --dataseed | {0, 42, 9595, ...} | Random seed for data shuffling | default=42 |
| --seed | {0, 42, 9595, ...} | Random seed for parameter shuffling | default=9595 |
| --subsample | store_true | Subsample train and val sets for few-shot learning | |
| --num_data | {16, 40, ...} | Number of subsamples for train and val sets | default=16 |
| --test_only | store_true | Run test without training | |
| --prompt | {0, 1, 2, 3} | Prompts for VQA: 0: no prompt, 1: '[Q] ', 2: 'question: [Q] answer:', 3: 'question: [Q] answer: ' | default=0 |
| --prefix | {None, 'image', 'picture', 'photo'} | Prompts for captioning: 'image': 'an image of', 'picture': 'a picture of', 'photo': 'a photo of' | default=None |
| --backbone | {'t5-base', 't5-large'} | Backbone architecture | default='t5-base' |
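
As a worked example, several of the arguments above can be combined in a single few-shot run; a sketch, assuming the scripts forward these flags unchanged and using a placeholder checkpoint path:

# few-shot VQA: 16 subsampled train/val examples, data seed 0, prompt template 3, t5-large backbone
bash scripts/VQA.sh 0 VQA --subsample --dataseed 0 --num_data 16 --test_only --prompt 3 --backbone t5-large --load /path/to/checkpoint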