I have some questions about the Multi30k dataset used in the UC2 paper.
(1) There are Task 1 and Task 2 in the official Multi30k repo. Do you use the train/test data from Task 1?
(2) There are test2016/test2017/test2018 in Task 1. Which test set do you use in Table 1?
(3) For the English-only fine-tuning results (Flickr30k) in Table 1, is the UC2 model fine-tuned on the Flickr30k training set alone, or on the concatenation of the Flickr30k + COCO training sets? Flickr30k has only 29K image-text pairs, which seems small for fine-tuning UC2.
We use Task 1, which is the multi-modal machine translation task. Task 2 is multilingual image captioning, which does not have data for cs and fr.
We use test2016, as it is the only test split that covers all the languages (en, fr, de, and cs).
We only fine-tune on Flickr30k, but for English there are five captions associated with each image, and we use all of them. So the number of training image-text pairs is actually 29K * 5. The same holds for German.
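To make the counting above concrete, here is a minimal sketch of how pairing every caption with its image multiplies the training set size. The function and variable names are illustrative only, not taken from the UC2 codebase:

```python
# Hypothetical sketch: each Flickr30k image has five captions, so pairing
# every caption with its image yields num_images * 5 training pairs.

def expand_pairs(captions_per_image):
    """Flatten {image_id: [captions]} into (image_id, caption) pairs."""
    return [
        (img, cap)
        for img, caps in captions_per_image.items()
        for cap in caps
    ]

# Toy example: 3 images with 5 captions each -> 15 pairs.
toy = {
    f"img_{i}": [f"caption {j} for image {i}" for j in range(5)]
    for i in range(3)
}
pairs = expand_pairs(toy)
print(len(pairs))  # 15; for Flickr30k: ~29K images * 5 captions = ~145K pairs
```

This is why the effective fine-tuning set is roughly 145K pairs rather than 29K, even though Flickr30k has only ~29K images.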