During the fine-tuning on the image captioning task, Did you use any pre-training task (for e.g., AKPM, TIM and AMLM) along with the fashion captioning task i.e., given an image ( i.e., sequence of image patches generated by "Kaleido Patch Generator") predict the corresponding caption?
6
During the fine-tuning on the image captioning task, Did you use any pre-training task (for e.g., AKPM, TIM and AMLM) along with the fashion captioning task i.e., given an image ( i.e., sequence of image patches generated by "Kaleido Patch Generator") predict the corresponding caption?