zmykevin / UC2

CVPR 2021 Official PyTorch Code for UC2: Universal Cross-lingual Cross-modal Vision-and-Language Pre-training
MIT License

About the multi30k dataset #7

Closed: ghchen18 closed this issue 2 years ago

ghchen18 commented 2 years ago

Hi,

I have some questions about the multi30k dataset used in the UC2 paper.

(1) There are Task 1 and Task 2 in the official multi30k repo. Do you use the train/test data from Task 1?

(2) There are test2016/test2017/test2018 splits in Task 1. Which test set do you use in Table 1?

(3) For the English-only fine-tuning results (flickr30k) in Table 1, is the UC2 model fine-tuned on the flickr30k training set alone or on the concatenation of the flickr30k and COCO training sets? flickr30k has only 29k image-text pairs, which seems small for fine-tuning UC2.

Thank you.

zmykevin commented 2 years ago

Hello.

  1. We use Task 1, which is the task for multimodal machine translation. Task 2 is for multilingual image captioning and does not have data for cs and fr.
  2. We use test2016, as this is the only test split that covers all the languages (en, fr, de, and cs).
  3. We only fine-tune on flickr30k, but for English there are five captions associated with each image and we use all of them, so the training set is actually 29K * 5 image-text pairs (see the sketch below). The same holds for German.
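For concreteness, here is a minimal sketch of the 29K * 5 expansion, assuming a JSON annotation file that maps each image to its five captions. The file name, format, and `build_pairs` helper are hypothetical illustrations, not the repo's actual data loaders.

```python
import json

def build_pairs(annotation_path):
    """Expand per-image caption lists into (image, caption) training pairs."""
    with open(annotation_path) as f:
        # Assumed format: {"1000092795.jpg": ["caption 1", ..., "caption 5"], ...}
        captions_by_image = json.load(f)
    return [(img, cap)
            for img, caps in captions_by_image.items()
            for cap in caps]

pairs = build_pairs("flickr30k_train_en.json")  # hypothetical annotation file
# ~29K images with 5 captions each yields ~145K pairs, i.e. the 29K * 5 above.
print(len(pairs))
```
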
ghchen18 commented 2 years ago

Got it. Many thanks.