xmed-lab / CLIP_Surgery

CLIP Surgery for Better Explainability with Enhancement in Open-Vocabulary Tasks

Results of multi-label recognition #20

Closed linyq2117 closed 1 year ago

linyq2117 commented 1 year ago

Thanks for your excellent work.

I could not reproduce the multi-label recognition results in Table 7. For example, with CLIP ViT-B/16 and the softmax function, I get only 35% mAP on NUS-Wide (the paper reports 42.85%). I use the cls token of the original CLIP, without feature surgery. Could you share the details and the evaluation code for multi-label recognition?

Eli-YiLi commented 1 year ago

CLIP requires a logit scale (100) before the softmax. Have you tried adding a logit scale, like this: prob = (prob * 100).softmax(0)?

The eval code will be released after acceptance. The major revision has already been submitted.
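
For readers following along, here is a minimal sketch of why the logit scale matters. It is not the authors' evaluation code. It uses plain Python instead of PyTorch, and the helper name `scaled_softmax` and the similarity values are made up for illustration. Cosine similarities between an image and label prompts tend to sit close together, so a softmax applied without scaling is nearly uniform:

```python
import math

def scaled_softmax(similarities, logit_scale=100.0):
    """Softmax over similarities after multiplying by a logit scale."""
    logits = [logit_scale * s for s in similarities]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical cosine similarities between one image and three label prompts.
sims = [0.28, 0.25, 0.20]

unscaled = scaled_softmax(sims, logit_scale=1.0)   # nearly uniform
scaled = scaled_softmax(sims, logit_scale=100.0)   # dominated by the best match

print([round(p, 3) for p in unscaled])
print([round(p, 3) for p in scaled])
```

With a scale of 1 the three probabilities differ by only a few percent, while a scale of 100 puts most of the mass on the highest similarity. This is the same operation as the one-liner above, written out step by step.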

linyq2117 commented 1 year ago

Thanks for your reply! The gap was due to my own oversight: I had not normalized the features.
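
As a rough illustration of the fix described here (a sketch in plain Python, not the code used in the paper; the helpers `l2_normalize` and `dot` and the toy 2-D vectors are hypothetical), L2-normalizing the image and text features turns their dot product into a cosine similarity bounded in [-1, 1], which is the range a logit scale of 100 assumes:

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Toy 2-D stand-ins for an image feature and two text (label) features.
image_feature = [3.0, 4.0]
text_features = [[6.0, 8.0], [0.0, 10.0]]

# Without normalization the scores depend on feature magnitudes.
raw = [dot(image_feature, t) for t in text_features]

# With normalization the scores are cosine similarities.
image_unit = l2_normalize(image_feature)
cos = [dot(image_unit, l2_normalize(t)) for t in text_features]

print(raw)
print([round(c, 3) for c in cos])
```

Unnormalized scores grow with the feature norms, so multiplying them by 100 before the softmax distorts the ranking of confidences across images, which can depress mAP.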