Open kimihailv opened 1 year ago
Hi @kimihailv, we haven't run this evaluation ourselves, but yes, zero-shot performance is largely correlated with the size of the training dataset.
Zero-shot performance on other datasets (such as dtd, food101, caltech101, sun397, etc.) is also much lower than that of CLIP, MetaCLIP and open_clip models.
Hello, I evaluated ALBEF 14M on the ImageNetV2 classification task and it showed relatively low accuracy: top-1: 32.9, top-5: 60.7. What do you think are the reasons for these results? Is it the much smaller training dataset compared to CLIP?
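
For anyone comparing numbers, here is a minimal sketch of how top-1 and top-5 figures like these are usually computed, assuming you already have per-image class scores (for zero-shot classification, typically image-text similarities against one prompt per class). The helper name `topk_accuracy` and the toy scores are hypothetical and not part of ALBEF or any other library, and the result is returned on a 0 to 100 scale on the assumption that the figures above are percentages.

```python
from typing import Sequence


def topk_accuracy(
    scores: Sequence[Sequence[float]], labels: Sequence[int], k: int
) -> float:
    """Percentage of samples whose true label is among the k highest-scoring classes.

    Hypothetical helper for illustration; not taken from the ALBEF codebase.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("need at least one sample")

    hits = 0
    for row, label in zip(scores, labels):
        # Indices of the k classes with the highest scores for this sample.
        topk = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        hits += label in topk
    return 100.0 * hits / len(labels)


# Toy data: 4 images, 3 classes (made-up scores, for illustration only).
scores = [
    [0.10, 0.70, 0.20],
    [0.50, 0.30, 0.20],
    [0.20, 0.30, 0.50],
    [0.40, 0.35, 0.25],
]
labels = [1, 2, 2, 1]

top1 = topk_accuracy(scores, labels, k=1)
top2 = topk_accuracy(scores, labels, k=2)
print(f"top-1: {top1:.1f}, top-2: {top2:.1f}")
```

With real model outputs you would pass `k=1` and `k=5`. Running the same function on scores from different models evaluated on the same split keeps the comparison consistent.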