zhixiongz / CLIP4CMR

A Comprehensive Empirical Study of Vision-Language Pre-trained Model for Supervised Cross-Modal Retrieval
42 stars 3 forks source link