VirTex for CC-like classification problems

Great project, and a step in the right direction: getting about the same results with less compute.

In your paper, you mention that you discard the textual head and use only the visual backbone, and that future research could leverage the textual head. Do you think it could reach performance comparable to BERT-like models trained on text corpora?

Also, I’m very interested in using VirTex for classification problems such as Conceptual Captions. Have you tried this, or would you expect performance improvements over ViLBERT or VisualBERT from using either VirTex’s visual backbone + BERT, or VirTex’s visual backbone and textual head together (not sure whether the latter would work)?

kdexd / virtex

VirTex for CC-like classification problems #6