j-min / DSG

Davidsonian Scene Graph (DSG) for Text-to-Image Evaluation (ICLR 2024)
https://google.github.io/dsg

Detailed dependencies for reproducing CLIPScore on TIFA160 #6

Closed by long8v 2 months ago

long8v commented 2 months ago

Hey there, thank you for the great work! It really inspires my own work. I tried to reproduce Table 12, specifically the CLIPScore results.

I found that in the CLIPScore repo, different package versions (such as Pillow 8.4 vs 9.4, torch 1.7 vs 2.0, or numpy 1.20.0 or higher) return different scores, which in turn change the correlation values. Also, clipscore uses the prefix "A photo depicts". However, I found that the TIFAv1 CLIPScore corresponds to the score computed without any prefix. When I reproduce on TIFA160, it returns slightly different values: (DSG report) 0.276 / 0.191

It would be really helpful if you could share the package dependencies you used for the paper and whether you used the prefix when calculating CLIPScore. Thanks!
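
For anyone comparing environments, here is a minimal sketch of how the relevant versions could be recorded alongside a run. It is not taken from the DSG or CLIPScore code. The package list simply mirrors the packages named above, and `installed_versions` is a hypothetical helper name.

```python
from importlib import metadata

# Packages named in this thread (an assumption; adjust to your own environment).
PACKAGES = ("Pillow", "torch", "numpy")


def installed_versions(packages=PACKAGES):
    """Return {package name: installed version string, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Print in a requirements-style format so two environments can be diffed.
    for name, version in installed_versions().items():
        print(f"{name}=={version}" if version else f"{name}: not installed")
```

Saving this output next to each set of scores makes it easier to tell whether a difference comes from the code or from the environment.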

j-min commented 2 months ago

@wangsu-google-language - could you please check the package versions?

long8v commented 2 months ago

I found a mistake in my code. After fixing it, I can successfully reproduce Table 12, but without the template "A photo depicts ".

(DSG report)

(my result in PIL 8.4.0)

Thanks!
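
The prefix choice discussed in this thread can be sketched as below. This is a simplified illustration and not the reference clipscore implementation: it assumes the image and text embeddings have already been produced by a CLIP model (not shown), and `build_caption` and `clipscore` are hypothetical helper names. The scoring formula, w * max(cos, 0) with w = 2.5, follows the CLIPScore paper.

```python
import math

# Template quoted in this thread; whether to apply it is the point in question.
PREFIX = "A photo depicts "


def build_caption(text, use_prefix=True):
    """Prepend the template when use_prefix is True; otherwise return text unchanged."""
    return PREFIX + text if use_prefix else text


def clipscore(image_emb, text_emb, w=2.5):
    """Compute w * max(cosine(image_emb, text_emb), 0) for two embedding vectors."""
    dot = sum(a * b for a, b in zip(image_emb, text_emb))
    norm = math.sqrt(sum(a * a for a in image_emb)) * math.sqrt(
        sum(b * b for b in text_emb)
    )
    if norm == 0.0:
        raise ValueError("embeddings must be non-zero")
    return w * max(dot / norm, 0.0)
```

The caption passed to the text encoder would be `build_caption(text, use_prefix=False)` to match the no-template setting reported above, and `build_caption(text)` to match the default template.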