layumi / Image-Text-Embedding

TOMM2020 Dual-Path Convolutional Image-Text Embedding :feet: https://arxiv.org/abs/1711.05535
MIT License

Dual-Path Convolutional Image-Text Embedding

[Paper] [Slide] :arrow_left: I recommend checking the slides first. :arrow_left:

This repository contains the code for our paper Dual-Path Convolutional Image-Text Embedding. Thank you for your kind attention.

Some News

5 Sep 2021 I love the saying 'Define yourself by telling how you differ from others' (exemplar SVM), which is also the spirit of the instance loss.

11 June 2020 People live in the 3D world. We released new person re-ID code, Person Re-identification in the 3D Space, which conducts representation learning in 3D space. You are welcome to check it out.

30 April 2020 We won the AICity Challenge 2020 at CVPR 2020 with the 1st-place submission to the retrieval track :red_car:. Check it out here.

01 March 2020 We released a new image retrieval dataset, University-1652, for drone-view target localization and drone navigation :helicopter:. Its setting is similar to person re-ID. You are welcome to check it out.

What's New: We updated the paper to the second version, adding more explanation of the mechanism behind the proposed instance loss.

Install MatConvNet

I have included my MatConvNet in this repo, so you do not need to download it again. You just need to uncomment and modify some lines in gpu_compile.m and run it in MATLAB. Try it~ (The code does not support cudnn 6.0. You may just turn off the Enablecudnn or try cudnn5.1.)

If compilation fails, refer to http://www.vlfeat.org/matconvnet/install/
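
As a rough sketch of the compile step, the lines to uncomment in gpu_compile.m are likely a `vl_compilenn` call along the following lines. The actual contents of gpu_compile.m may differ. The `cudaRoot` path is an assumption and must point at your own CUDA install, and cuDNN is switched off here in line with the note about cudnn 6.0.

```matlab
% Hypothetical compile settings; adjust cudaRoot to your own CUDA install.
cudaRoot = '/usr/local/cuda';          % assumption: default Linux location
compileOpts = { ...
    'enableGpu',   true, ...
    'cudaRoot',    cudaRoot, ...
    'cudaMethod',  'nvcc', ...
    'enableCudnn', false};             % cuDNN off, see the cudnn 6.0 note

% vl_compilenn ships with MatConvNet (in its matlab/ folder) and must be on the path.
if exist('vl_compilenn', 'file')
    vl_compilenn(compileOpts{:});
else
    fprintf('vl_compilenn not found; add the MatConvNet matlab folder to the path first.\n');
end
```

If you do want cuDNN 5.1, `vl_compilenn` also accepts `'enableCudnn', true` together with a `'cudnnRoot'` path.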

Preprocess Datasets

  1. Extract word2vec weights. Follow the instructions in ./word2vector_matlab.

  2. Preprocess the dataset. Follow the instructions in ./dataset. You only need to choose one dataset to run. The three datasets need different preprocessing, so I wrote separate instructions for Flickr30k, MSCOCO and CUHK-PEDES.

  3. Download the model pre-trained on ImageNet and put it into './data'.

    (bash) wget http://www.vlfeat.org/matconvnet/models/imagenet-resnet-50-dag.mat

    Alternatively, you may try VGG16 or VGG19.
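
For step 3, the download and a quick load check can also be done from inside MATLAB. This is a minimal sketch, assuming MATLAB R2014b or later (for `websave`) and that the MatConvNet `matlab` folder is on the path. `dagnn.DagNN.loadobj` is the usual MatConvNet loader for DagNN models, and the variable names are illustrative.

```matlab
% Download the ImageNet-pretrained ResNet-50 into ./data unless it is already there.
modelUrl  = 'http://www.vlfeat.org/matconvnet/models/imagenet-resnet-50-dag.mat';
dataDir   = './data';
modelPath = fullfile(dataDir, 'imagenet-resnet-50-dag.mat');

if ~exist(dataDir, 'dir')
    mkdir(dataDir);
end
if ~exist(modelPath, 'file')
    websave(modelPath, modelUrl);      % large file, this can take a while
end

% Optional sanity check: load the model as a DagNN object (needs MatConvNet on the path).
if exist('vl_setupnn', 'file')
    vl_setupnn;
    net = dagnn.DagNN.loadobj(load(modelPath));
    net.mode = 'test';
    fprintf('Loaded %d layers from %s\n', numel(net.layers), modelPath);
end
```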

Your split may differ from mine. (Sorry, this is my fault. I used a random split.) As a backup, this is the dictionary archive used in the paper.

Trained Model

You may download the three trained models from GoogleDrive or the new GoogleDrive.

Train

Run train_flickr_word_Rankloss_shift_hard for Stage II training on Flickr30k.

Run train_coco_Rankloss_shift_hard.m for Stage II training on MSCOCO.

Run train_cuhk_word_Rankloss_shift for Stage II training on CUHK-PEDES.
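
The script names point to a ranking loss with hard negatives in Stage II. The following is a generic sketch of a bidirectional hinge ranking loss that picks the hardest in-batch negative for each image and each text. It is not the repository's implementation: cosine similarity, the margin of 1, the toy features and all variable names are assumptions for illustration.

```matlab
% Toy batch: row i of imgFeat and row i of txtFeat form a matching pair.
imgFeat = [2 0; 0 3];
txtFeat = [4 0; 5 0];
margin  = 1;                                   % assumed margin
N = size(imgFeat, 1);

% L2-normalise each row so that dot products become cosine similarities.
imgFeat = bsxfun(@rdivide, imgFeat, sqrt(sum(imgFeat .^ 2, 2)));
txtFeat = bsxfun(@rdivide, txtFeat, sqrt(sum(txtFeat .^ 2, 2)));

S   = imgFeat * txtFeat';                      % S(i,j) = cos(image i, text j)
pos = diag(S);                                 % similarity of each matching pair

% Mask the matching pairs, then take the most similar (hardest) negative.
Sneg = S;
Sneg(logical(eye(N))) = -Inf;
hardTxt = max(Sneg, [], 2);                    % hardest text for each image
hardImg = max(Sneg, [], 1)';                   % hardest image for each text

% Hinge terms in both retrieval directions, averaged over the batch.
img2txt = max(0, margin - pos + hardTxt);
txt2img = max(0, margin - pos + hardImg);
loss = mean(img2txt + txt2img);
fprintf('ranking loss = %.4f\n', loss);
```

The loss reaches zero only when every matching pair scores higher than its hardest negative by at least the margin, in both the image-to-text and text-to-image directions.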

Test

Select one model and have fun!

CheckList

Citation

@article{zheng2017dual,
  title={Dual-Path Convolutional Image-Text Embeddings with Instance Loss},
  author={Zheng, Zhedong and Zheng, Liang and Garrett, Michael and Yang, Yi and Xu, Mingliang and Shen, Yi-Dong},
  journal={ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM)},
  doi={10.1145/3383184},
  note={\mbox{doi}:\url{10.1145/3383184}},
  volume={16},
  number={2},
  pages={1--23},
  year={2020},
  publisher={ACM New York, NY, USA}
}