kdexd / virtex

[CVPR 2021] VirTex: Learning Visual Representations from Textual Annotations
http://kdexd.xyz/virtex
MIT License
556 stars 61 forks source link

VirTex for CC-like classification problems #6

Open Muennighoff opened 4 years ago

Muennighoff commented 4 years ago

Great project in the right direction, i.e. getting about the same results with less compute.

In your paper, you mention that you are discarding the text-head and only using the visual backbone and future research could leverage the text-head – do you think it could develop comparable performance to BERT-like models trained on text corpuses?

Also, I’m very interested in using VirTex for classification problems such as Conceptual Captions – Did you try / Would you estimate performance improvements using VirTex’s Visual Backbone + BERT or VirTex Visual & Textual (not sure if the latter would work) over VilBERT or VisualBERT?