mshukor / ViCHA

[BMVC22] Official Implementation of ViCHA: "Efficient Vision-Language Pretraining with Visual Concepts and Hierarchical Alignment"
MIT License

Retrieval results #5

Open kimihailv opened 1 year ago

kimihailv commented 1 year ago

Hello, in Table 3 you compare ViCHA with ALBEF (1.1M). The table's caption reads: "Zero-shot Comparison with SOTA on Flickr30K (after fine-tuning ViCHA on COCO) and COCO (pretrained model only)". Was ALBEF also fine-tuned on COCO before being evaluated on Flickr30K?

mshukor commented 1 year ago

Hello, yes, both models are fine-tuned on COCO, so they are comparable on all benchmarks.