I was also confused: in Table 2 you report a CLIP VG_Relation score of 59, while in Table 6 it becomes 63. Am I missing something?
Hello!!
Which model is `negclip/checkpoints/epoch_0.p`? How was it trained, and with which parameters?
0.63 is for CLIP-FT (the one fine-tuned on MSCOCO); the CLIP score is still 0.59.
Thanks for the quick response. I just renamed the checkpoint from your gdown script. However, I tried this:

```bash
for dataset in VG_Relation VG_Attribution
do
    python3 main_aro.py --dataset=$dataset --model-name=NegCLIP --device=cuda --batch-size=$bs
done
```

and the results are still the same: 68.11 and 42.17.

By the way, I calculate the accuracy as `acc = (df['Accuracy'] * df['Count']).mean()`. I don't know if it should be computed this way, but I get 59 on VG-R for CLIP, which matches the reported score, so I guess this is correct.
> Hello!!
> Which model is `negclip/checkpoints/epoch_0.p`? How was it trained, and with which parameters?
>
> 0.63 is for CLIP-FT (the one fine-tuned on MSCOCO); the CLIP score is still 0.59.

But when you take a look at this one, it doesn't match.
Oh, you are right, that is a typo from the arXiv version; we should update that. This is the camera-ready version of the paper: https://openreview.net/forum?id=KRLUvxh8uaX
(Preparing an answer about the reproducibility soon.)
I see, you compute macro accuracy instead of plain accuracy. Could you share the code for computing the macro accuracy?
Thank you. Still, when I test `openai-clip:ViT-B/32`, the macro accuracy on VG-Relation is 63, which matches the preprint instead of the camera-ready version. This suggests that 59 is calculated as `acc = (df['Accuracy'] * df['Count']).mean()` while 63 is `acc = df['Accuracy'].mean()`. I guess you should report 63 instead of 59.
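To make the comparison explicit, here is a minimal sketch of the two aggregations, assuming a per-relation dataframe with `Relation`, `Accuracy`, and `Count` columns as in the expressions above (the toy numbers are made up):

```python
import pandas as pd

# Toy per-relation results; the real dataframe comes from the ARO colab.
df = pd.DataFrame({
    "Relation": ["on", "under", "behind"],
    "Accuracy": [0.70, 0.40, 0.55],   # per-relation accuracy
    "Count":    [500, 50, 200],       # number of test cases per relation
})

# Macro accuracy: unweighted mean over relations.
macro_acc = df["Accuracy"].mean()

# Count-weighted (micro-style) accuracy, normalized by the total count.
weighted_acc = (df["Accuracy"] * df["Count"]).sum() / df["Count"].sum()

print(f"macro: {macro_acc:.4f}  count-weighted: {weighted_acc:.4f}")
```

Note that `(df['Accuracy'] * df['Count']).mean()` as written divides by the number of relations rather than by the total count; a properly count-weighted mean would normalize by `df['Count'].sum()` as in the sketch.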
I just ran the colab with OpenAI's CLIP and got 0.59. Could you try to see what's missing, starting from that?
Why use

```python
df = pd.DataFrame(vgr_records)
df = df[~df.Relation.isin(symmetric)]
print(f"VG-Relation Macro Accuracy: {df.Accuracy.mean()}")
```

instead of directly taking the mean?
Note that we don't use symmetric relations.
The problem is that if a relation is symmetric, you have that r(X, Y) = r(Y, X).
For example, given an image of a cat close to a dog, both "close(Cat, Dog)" and "close(Dog, Cat)" are true. Models would just pick one of the captions at random, and thus it's not a very informative relation to study (unless maybe for some bias analysis). Hence, we drop symmetric relations.
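To illustrate the point, a minimal sketch of the symmetric-relation filter; the relation names and the contents of `symmetric` below are purely illustrative, the actual list comes from the repo/colab:

```python
import pandas as pd

# Illustrative per-relation results; the real records come from the VG-Relation evaluation.
df = pd.DataFrame({
    "Relation": ["on", "under", "near"],
    "Accuracy": [0.70, 0.40, 0.50],   # "near" sits at chance: swapping the arguments
    "Count":    [500, 50, 200],       # of a symmetric relation keeps the caption true
})

# Hypothetical list of symmetric relations, i.e. relations where r(X, Y) holds iff r(Y, X) holds.
symmetric = ["near"]

# Drop them before aggregating, since they say nothing about order sensitivity.
df = df[~df["Relation"].isin(symmetric)]

print(f"VG-Relation Macro Accuracy: {df['Accuracy'].mean():.4f}")
```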
Closing this for now, let me know if you have other questions!
Hi @vinid, I was trying out the colab you shared above, but I changed the model to NegCLIP. In particular, I changed one line to

```python
model, preprocess = get_model(model_name="NegCLIP", device="cuda", root_dir=root_dir)
```

and I am getting

```
VG-Relation Macro Accuracy: 0.8021811864440539
VG-Attribution Macro Accuracy: 0.7055937135374111
```

Just wanted to confirm whether this is correct, especially the Relation accuracy.
Hello!
Before computing the scores, did you also apply `df = df[df["Count"] > 9]`? It's a commented-out instruction.
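Putting the snippets from this thread together, the aggregation is roughly the following sketch; `vgr_macro_accuracy` is a hypothetical wrapper (not a function from the repo), and `vgr_records` and `symmetric` are assumed to come from the colab:

```python
import pandas as pd

def vgr_macro_accuracy(vgr_records, symmetric, min_count=None):
    """Hypothetical helper mirroring the colab snippets discussed in this thread."""
    df = pd.DataFrame(vgr_records)
    # Drop symmetric relations (see the discussion above).
    df = df[~df.Relation.isin(symmetric)]
    # The commented-out filter in question: keep relations with more than `min_count` cases.
    if min_count is not None:
        df = df[df["Count"] > min_count]
    # Macro accuracy: unweighted mean over the remaining relations.
    return df["Accuracy"].mean()

# e.g. vgr_macro_accuracy(vgr_records, symmetric)               # without the count filter
#      vgr_macro_accuracy(vgr_records, symmetric, min_count=9)  # with df["Count"] > 9
```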
I didn't try that earlier. Uncommenting it gives:
VG-Relation Macro Accuracy: 0.8038109510723876
The difference is small, but let me look into this.
Yup, not a big issue, but I just wanted to confirm that this is the correct number.
Thanks for looking into it!
Yeah, makes total sense, thanks for pointing this out!
Hey, I ran the same and, with `df = df[df["Count"] > 9]`, got
VG-Relation Macro Accuracy: 0.803816692506885
Commenting it out gives
VG-Relation Macro Accuracy: 0.8021892603363159
Also, on OpenAI CLIP, with `df = df[df["Count"] > 9]` I get
-> VG-Relation Macro Accuracy: 0.5923217479726929
Commenting it out gives
-> VG-Relation Macro Accuracy: 0.5946534905762514
Hi! Thanks! The CLIP one matches the one in the paper.
Also, funnily, if you use torch tensors (and not numpy) plus cuda to compute the scores and then the accuracy, you get
VG-Relation Macro Accuracy for CLIP: 0.599128631916311
with `df = df[df["Count"] > 9]`.
Thank you all! I think, somehow, most of the comments above are correct. 😄
First of all, in the paper, if you look at Table 2 we wrote 0.80 for VG-R, whereas in Table 3 we wrote 0.81. This is an honest mistake; we are sorry about this.
As for the reason, I think @DianeBouchacourt is spot on here. This appears to be due to the non-determinism in CUDA (e.g., see here or the official PyTorch docs). If you do the computations in CUDA you get that ~0.002-0.005 difference in performance, and our legacy code (before cleaning and releasing) is doing that.
I think overall non-determinism can get pretty tricky in the context of the VG-R dataset. The embeddings can be pretty close, and minor differences in embeddings due to non-determinism can lead to accuracy differences of around 0.002-0.005.
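For anyone who wants to chase this down, one option is to force PyTorch into deterministic mode before computing the scores. This is just the standard PyTorch reproducibility recipe, not something from the repo, and even then CPU/numpy and CUDA results can differ slightly because the floating-point summation order differs:

```python
import os
import random

import numpy as np
import torch

def make_deterministic(seed: int = 0) -> None:
    """Standard PyTorch reproducibility setup; call before building the model and scoring."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)

    # Required by some CUDA ops when deterministic algorithms are enforced;
    # set it before the first cuBLAS call.
    os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"
    torch.use_deterministic_algorithms(True)
    torch.backends.cudnn.benchmark = False
```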
Hi, when I try to reproduce the results on ARO, I can't get the reported scores. The code is:

```bash
for dataset in VG_Relation VG_Attribution
do
    for resume in scratch/open_clip/src/Outputs/negclip/checkpoints/epoch_0.pt
    do
        python3 main_aro.py --dataset=$dataset --model-name=$model --resume=$resume --batch-size=$bs --device=cuda --download
    done
done
```

and I just got VG_Relation 68.11 and VG_Attribution 42.16, instead of the 81 and 71 reported in Table 6.