Learning Visual Representations with Caption Annotations

Bibliographic information

arXiv, 2020

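This paper proposes image-conditioned masked language modeling (ICMLM). The model is given an image and a caption with some words masked out, and the task is to predict the masked words from the image. Compared with earlier tasks such as VQA, this makes it clearer which part of the image the model should attend to.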
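The input side of this task can be sketched as follows. This is only a minimal illustration of how a masked caption and its target word might be prepared, not the paper's implementation: the function name `mask_caption`, the `[MASK]` placeholder string, and the example caption are all assumptions made here, and the image encoder and the predictor that would consume the masked caption are left out.

```python
from typing import List, Tuple

MASK_TOKEN = "[MASK]"  # hypothetical placeholder string, not taken from the paper


def mask_caption(
    tokens: List[str], index: int, mask_token: str = MASK_TOKEN
) -> Tuple[List[str], str]:
    """Hide the token at `index`; return (masked tokens, hidden word)."""
    if not 0 <= index < len(tokens):
        raise IndexError("index out of range")
    masked = list(tokens)       # copy so the original caption is left unchanged
    target = masked[index]      # the word the model would have to predict
    masked[index] = mask_token
    return masked, target


# Made-up example caption; in the task, the image would supply the missing word.
caption = "a dog runs on the beach".split()
masked, target = mask_caption(caption, 1)
print(" ".join(masked), "->", target)
```

A model for this task would take the image together with `masked` as input and be trained to output `target` at the masked position.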

Papers of interest

G.: A simple framework for contrastive learning of visual representations.

On the variance of the adaptive learning rate and beyond

Parkhi, O.M., Vedaldi, A., Zisserman, A., Jawahar, C.: Cats and dogs. In: Proc. CVPR (2012)