detailed dependency for reproducing CLIPScore TIFA160

long8v commented 2 months ago

Hey there, thank you for the great work! It really inspires my own work. I tried to reproduce Table 12, specifically the CLIPScore results.

I found that in the CLIPScore repo, different package versions (such as Pillow 8.4 vs 9.4, torch 1.7 vs 2.0, or numpy 1.20.0 or higher) return different scores, which in turn yield different correlation values. Also, clipscore uses the prefix "A photo depicts ". However, I found that the TIFAv1 CLIPScore matches the setting without any prefix. When I reproduce on TIFA160, I get values slightly different from the DSG report (0.276 / 0.191):

Pillow==9.4.0
- prefix "A photo depicts ": 0.299 / 0.226
- prefix "": 0.279 / 0.209
Pillow==8.4.0
- prefix "A photo depicts ": 0.285 / 0.215
- prefix "": 0.266 / 0.199
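For readers comparing the two prefix settings above, here is a minimal sketch of where the prefix enters the score. It assumes the published CLIPScore definition, w * max(cos(image, text), 0) with w = 2.5. The helper names `build_caption` and `clipscore` are hypothetical, and the embeddings are plain Python lists standing in for real CLIP features, so this is not the code used for the numbers in this thread.

```python
import math

CLIPSCORE_WEIGHT = 2.5  # rescaling weight w from the CLIPScore definition


def build_caption(text, prefix="A photo depicts "):
    """Prepend the prompt prefix; pass prefix="" to score the raw text."""
    return prefix + text


def clipscore(image_emb, text_emb, w=CLIPSCORE_WEIGHT):
    """Return w * max(cos(image_emb, text_emb), 0) for two embedding vectors."""
    dot = sum(a * b for a, b in zip(image_emb, text_emb))
    norm_i = math.sqrt(sum(a * a for a in image_emb))
    norm_t = math.sqrt(sum(b * b for b in text_emb))
    return w * max(dot / (norm_i * norm_t), 0.0)


# Hypothetical usage: the same text yields two different strings for the
# text encoder, which is why the prefix setting changes every score.
with_prefix = build_caption("a dog on a beach")
without_prefix = build_caption("a dog on a beach", prefix="")
```

Because the prefix changes the string that is encoded, it shifts every per-sample score and therefore the rank correlations computed from them.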

It would be really helpful if you could share the package dependencies you used for the paper and whether you used the prefix when calculating CLIPScore. Thanks!
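To make reports like this easier to compare, a small sketch for recording the installed versions of the packages mentioned above. The function name `installed_versions` is hypothetical; it relies only on the standard-library `importlib.metadata` (Python 3.8+).

```python
from importlib import metadata


def installed_versions(packages=("Pillow", "torch", "numpy")):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Print in requirements-file style so the output can be pasted into an issue.
    for name, version in installed_versions().items():
        print(f"{name}=={version}" if version else f"{name}: not installed")
```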

j-min commented 2 months ago

@wangsu-google-language - could you please check the package versions?

long8v commented 2 months ago

I found a mistake in my code. After fixing it, I can successfully reproduce Table 12, but without the template "A photo depicts ".

(DSG report)

0.276 / 0.191

(my result in PIL 8.4.0)

template "A photo depicts ": (spearman) 0.2990781700406022 (kendall) 0.2070432781113664
template "": (spearman) 0.2779111660592737 (kendall) 0.19136838513786647
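For reference, a dependency-free sketch of the two rank correlations reported above. It assumes Spearman is computed as the Pearson correlation of average ranks and Kendall as tau-b (the default of `scipy.stats.kendalltau`); which Kendall variant the paper used is not stated in this thread. The function names are hypothetical, and in practice `scipy.stats.spearmanr` and `scipy.stats.kendalltau` are the usual choice.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau_b(x, y):
    """Kendall tau-b via an O(n^2) pass over all pairs."""
    concordant = discordant = ties_x = ties_y = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied in both variables: excluded from all counts
            elif dx == 0:
                ties_x += 1
            elif dy == 0:
                ties_y += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    pairs = concordant + discordant
    denom = math.sqrt((pairs + ties_x) * (pairs + ties_y))
    return (concordant - discordant) / denom
```

Here `x` would hold the per-sample CLIPScore values and `y` the matching human ratings. The pairwise loop is quadratic, which is acceptable for a benchmark of this size but slow for large datasets.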

Thanks!

j-min / DSG

detailed dependency for reproducing CLIPScore TIFA160 #6