mczhuge / Kaleido-BERT

💐Kaleido-BERT: Vision-Language Pre-training on Fashion Domain

Finetuning of Kaleido-BERT for Fashion Captioning #6

Closed. gourango01 closed this issue 3 years ago.

gourango01 commented 3 years ago

Thanks for sharing this interesting work. Could you please share how Kaleido-BERT was fine-tuned on the captioning task? Did you use a separate decoder for generation, or the Kaleido-BERT encoder only?

mczhuge commented 3 years ago

Encoder only, and we use a very simple method: [MASK] all the caption tokens, then predict them. You could easily beat the existing result by using a proper decoder.
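For readers wondering what "[MASK] all, then predict" looks like in practice, here is a minimal, hypothetical sketch. It is not the authors' code: a text-only Hugging Face BERT stands in for the multimodal encoder, whereas the real Kaleido-BERT would also condition on the image-patch embeddings from the Kaleido Patch Generator, and the fixed `caption_len` is an assumption for illustration.

```python
# Sketch of encoder-only captioning by masked prediction (NOT the
# authors' implementation): fill the caption slots with [MASK] and
# predict all of them in a single encoder forward pass.
import torch
from transformers import BertForMaskedLM, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

caption_len = 8  # assumed fixed caption length for this sketch

# Build the input: [CLS] [MASK] ... [MASK] [SEP].
# A multimodal model would concatenate image-patch embeddings here.
input_ids = torch.tensor([[tokenizer.cls_token_id]
                          + [tokenizer.mask_token_id] * caption_len
                          + [tokenizer.sep_token_id]])

with torch.no_grad():
    logits = model(input_ids).logits  # (1, seq_len, vocab_size)

# Greedy decode: argmax at every masked position at once,
# rather than generating tokens one at a time with a decoder.
pred_ids = logits[0, 1:caption_len + 1].argmax(dim=-1)
print(tokenizer.decode(pred_ids))
```

Because every position is predicted in one pass, this is non-autoregressive; an autoregressive decoder, which conditions each token on the previously generated ones, is why the author expects a "proper decoder" to beat this baseline.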

gourango01 commented 3 years ago

During fine-tuning on the image captioning task, did you use any of the pre-training tasks (e.g., AKPM, TIM, and AMLM) alongside the fashion captioning objective, i.e., given an image (a sequence of image patches generated by the Kaleido Patch Generator), predict the corresponding caption?