yizhen-zhang / VG-Bert

Code and scripts for "Explainable Semantic Space by Grounding Language to Vision with Cross-Modal Contrastive Learning"
21 stars, 4 forks