hsiehjackson / Mr.Right

Mr. Right: Multimodal Retrieval on Representation of ImaGe witH Text
Creative Commons Attribution Share Alike 4.0 International

Mr. Right is a novel retrieval dataset containing multimodal documents (images and texts) and multi-related queries. It also provides a multimodal framework for evaluation and compares it against previous text-to-text and image-text retrieval models. The dataset and model checkpoints are released.

For more details, please check out our Mr. Right paper.

Mr. Right Dataset

Dataset

Preprocess

  1. Download Mr. Right dataset.
  2. Extract Mr_Right.tar.gz to the data/ directory.
  3. Download the images and create a path for each image (make sure you have more than 1.5 TB of storage).
  4. Add the image path to each record in your JSON files: {id:0, ......,"doc_image": "xxx.jpg"}. This applies to multimodal_documents.json, multimodal_train_pairs.json, and multimodal_finetune_pairs.json.
    bash download_dataset.sh
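
Step 4 can be scripted. The following is a minimal sketch, not code from this repository. It assumes that each JSON file holds a list of records, that each record has an `id` field, and that images were saved as `<id>.jpg` under one directory. The helper names `add_image_paths` and `rewrite_json_file`, the `data/images` directory, and the `doc_text` field in the sample are hypothetical; adjust them to your own layout.

```python
import json
from pathlib import Path


def add_image_paths(records, image_dir, id_key="id", ext=".jpg"):
    """Return copies of the records with a "doc_image" field added.

    Assumption (not from the repository): the image for a record is
    stored as <image_dir>/<id><ext>.
    """
    image_dir = Path(image_dir)
    updated = []
    for record in records:
        new_record = dict(record)  # copy so the caller's data is not mutated
        new_record["doc_image"] = str(image_dir / f"{record[id_key]}{ext}")
        updated.append(new_record)
    return updated


def rewrite_json_file(json_path, image_dir):
    """Load a JSON list of records, add image paths, and write it back."""
    json_path = Path(json_path)
    records = json.loads(json_path.read_text(encoding="utf-8"))
    json_path.write_text(
        json.dumps(add_image_paths(records, image_dir), ensure_ascii=False),
        encoding="utf-8",
    )


if __name__ == "__main__":
    # Hypothetical record; the real files contain more fields.
    sample = [{"id": 0, "doc_text": "example"}]
    print(add_image_paths(sample, "data/images"))
```

If your files use a different key for the document identifier, pass it through `id_key`; if they are not plain JSON lists, the loading step in `rewrite_json_file` needs to change accordingly.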

Model Checkpoint

We train our model based on ALBEF, METER, and ViLT.

bash ./checkpoints/download_checkpoints.sh

Edit Configs

Fine-tune Multimodal model

CUDA_VISIBLE_DEVICES=0 python main.py \
--num_gpus [number of gpus] \
--num_workers [number of workers] \
--wandb_task_name [Name of task] \
--batch_size 16 \
--pretrain [ALBEF | ViLT | METER] \
--embeds_feats [avg | cls] \
--pl_checkpoint [path for resumed model] \
--save_checkpoint [path for saving checkpoints] \
--neg_matching \
--ctx_prediction \
--re_ranking

Evaluate

We evaluate our models on a single V100 32GB GPU. Computing the TR, IR, and MR scores simultaneously does not fit in memory, so we first save the embeddings to pickle files and then calculate each score separately.

# Run model
CUDA_VISIBLE_DEVICES=0 python main.py \
--num_gpus 1 \
--mode test \
--wandb_task_name [Name of task] \
--pickle_output [Directory of testing pickle files] \
--test_output [Json results of model] \
--batch_size 128 \
--pretrain [ALBEF | ViLT | METER] \
--pl_checkpoint checkpoints/[albef.ckpt | vilt.ckpt | meter.ckpt]

# Calculate the score
python compute_pickle.py \
--pickle_input [Embeddings of different retrieval tasks]
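
The two-stage evaluation above (save the embeddings first, then score them in a separate step) can be illustrated with a small example. This is a sketch under stated assumptions, not the contents of compute_pickle.py: the pickle layout (a dict with `queries`, `documents`, and `gold` keys), the function names, and the use of recall@k with dot-product similarity are all chosen here for illustration.

```python
import os
import pickle
import tempfile


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def recall_at_k(query_embs, doc_embs, gold_doc_ids, k):
    """Fraction of queries whose gold document ranks in the top k by dot product."""
    hits = 0
    for query, gold in zip(query_embs, gold_doc_ids):
        scores = [dot(query, doc) for doc in doc_embs]
        ranked = sorted(range(len(doc_embs)), key=lambda i: scores[i], reverse=True)
        if gold in ranked[:k]:
            hits += 1
    return hits / len(query_embs)


def save_embeddings(path, query_embs, doc_embs, gold_doc_ids):
    """Stage 1: store embeddings so scoring can run later in a separate process."""
    # Hypothetical layout; the repository's pickle files may differ.
    with open(path, "wb") as f:
        pickle.dump(
            {"queries": query_embs, "documents": doc_embs, "gold": gold_doc_ids}, f
        )


def score_from_pickle(path, k=1):
    """Stage 2: load the stored embeddings and compute recall@k."""
    with open(path, "rb") as f:
        data = pickle.load(f)
    return recall_at_k(data["queries"], data["documents"], data["gold"], k)


if __name__ == "__main__":
    # Toy embeddings: two queries, three documents.
    queries = [[1.0, 0.0], [0.0, 1.0]]
    documents = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    gold = [0, 2]
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "embeddings.pkl")
        save_embeddings(path, queries, documents, gold)
        print("recall@1:", score_from_pickle(path, k=1))
        print("recall@2:", score_from_pickle(path, k=2))
```

Splitting the work this way means the model only has to be resident while the embeddings are produced; the scoring step works from the saved file and can be run once per retrieval task.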

Benchmark

Mr. Right Benchmark

License

This data is available under the Creative Commons Attribution Share Alike 4.0 license.

Contact

For any questions, please contact r09944010@ntu.edu.tw or c2hsieh@ucsd.edu.