wangf3014 / SCLIP

Official implementation of SCLIP: Rethinking Self-Attention for Dense Vision-Language Inference
110 stars, 9 forks