mczhuge / Kaleido-BERT

💐Kaleido-BERT: Vision-Language Pre-training on Fashion Domain
MIT License

Fashion Captioning using Kaleido-BERT and Fashion-BERT #12

Closed Surabhi-Kumari closed 2 years ago

Surabhi-Kumari commented 2 years ago

Hi, I have gone through your code; it is very interesting work. Could you please explain the input format used to compute the MLM logits for caption generation? I have tried the following formats:

1. image_feature, [SEP], [MASK], [PAD] ... [PAD]
2. image_feature, [CLS], [MASK], [PAD] ... [PAD]
3. [CLS], [MASK], [PAD] ... [PAD], [SEP], image_feature

This will run in a loop. Which one is the correct format? Thanks!

mczhuge commented 2 years ago

Thanks for your interest.

I don't think any of the three options is incorrect; for reference, you can look at: (https://github.com/mczhuge/Kaleido-BERT/blob/main/scripts/finetune_main.py#78)
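
For readers comparing the three layouts from the question, here is a minimal sketch of how each could be assembled and used in a mask-filling loop. It is not taken from the Kaleido-BERT repository: `build_layout`, `generate`, the `[IMG]` placeholder, and the `predict_masked` callback are hypothetical names, and the sketch assumes that "in a loop" means predicting one `[MASK]` at a time and that a predicted `[SEP]` ends the caption.

```python
from typing import Callable, List

IMG = "[IMG]"  # hypothetical placeholder standing in for one image-feature slot


def build_layout(option: int, n_img: int, generated: List[str], max_len: int) -> List[str]:
    """Return the token layout for one of the three options in the question."""
    # Text part: tokens generated so far, one [MASK] to predict, then padding.
    text = generated + ["[MASK]"]
    text += ["[PAD]"] * (max_len - len(text))
    image = [IMG] * n_img
    if option == 1:
        return image + ["[SEP]"] + text
    if option == 2:
        return image + ["[CLS]"] + text
    if option == 3:
        return ["[CLS]"] + text + ["[SEP]"] + image
    raise ValueError("option must be 1, 2, or 3")


def generate(predict_masked: Callable[[List[str]], str],
             option: int, n_img: int, max_len: int) -> List[str]:
    """Greedy loop: fill the single [MASK], append the prediction, repeat."""
    generated: List[str] = []
    while len(generated) < max_len:
        seq = build_layout(option, n_img, generated, max_len)
        # predict_masked is a hypothetical stand-in for the model's MLM head.
        token = predict_masked(seq)
        if token == "[SEP]":  # assumed end-of-caption signal
            break
        generated.append(token)
    return generated
```

Whichever layout is used, the position of `[MASK]` in the sequence has to match the position whose logits are read from the MLM head, and the layout should be the same one the model saw during fine-tuning.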