mertyg / vision-language-models-are-bows

Experiments and data for the paper "When and why vision-language models behave like bags-of-words, and what to do about it?" Oral @ ICLR 2023
MIT License

Why does `visual_genome_relation.json` still contain symmetric relations? #37

Closed lcxrocks closed 5 months ago

lcxrocks commented 5 months ago

First of all, thank you for your outstanding work!

I noticed that A.1, Step 6 of your paper describes removing symmetric relations from the VGR benchmark. However, when I reviewed the VGR JSON file, I found many instances of symmetric relations such as 'near', so the VGR benchmark has 23937 test cases with symmetric relations included rather than excluded, which differs slightly from the paper. Am I misunderstanding something?

I understand that you perform post-processing in `Replicate ARO! VG-Relation, VG-Attribution.ipynb` to compute a macro accuracy for VGR, and I assume this is the post-processing mentioned in your paper. My question, then, is why the symmetric relations were not removed directly from the `visual_genome_relation.json` file.

Thank you for taking the time to address my query!

vinid commented 5 months ago

Hi! Thanks so much!

No real reason, we just kept them in the original dataset since they were part of the original extraction process. As you pointed out, we drop them when computing the results.
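For readers following along, here is a minimal sketch of what dropping symmetric relations at evaluation time could look like. It is not the notebook's actual code: the `macro_accuracy` helper, the `(relation, is_correct)` record layout, and the contents of `SYMMETRIC_RELATIONS` are all assumptions made for illustration (only 'near' is named in this thread).

```python
from collections import defaultdict

# Hypothetical list of symmetric relations. Only "near" is mentioned in this
# thread; the real list used for the paper may differ.
SYMMETRIC_RELATIONS = {"near", "next to", "beside"}


def macro_accuracy(records, excluded=SYMMETRIC_RELATIONS):
    """Average the per-relation accuracies, skipping excluded relations.

    `records` is an iterable of (relation_name, is_correct) pairs; this
    layout is an assumption, not the notebook's data format.
    """
    per_relation = defaultdict(list)
    for relation, is_correct in records:
        if relation in excluded:
            continue  # symmetric relations never enter the metric
        per_relation[relation].append(bool(is_correct))

    if not per_relation:
        raise ValueError("no records left after excluding relations")

    # Accuracy within each relation first, then an unweighted mean across
    # relations, so frequent relations do not dominate the score.
    accuracies = [sum(hits) / len(hits) for hits in per_relation.values()]
    return sum(accuracies) / len(accuracies)


if __name__ == "__main__":
    toy_records = [
        ("near", True),
        ("near", True),
        ("sitting on", True),
        ("sitting on", False),
        ("behind", True),
    ]
    print(macro_accuracy(toy_records))
```

With this approach the JSON file can stay untouched, because the exclusion happens only when the metric is computed.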

lcxrocks commented 5 months ago

Thank you for your quick response! So, if I've got this right, none of the metrics in the paper take the symmetric relations in the visual_genome_relation.json file into consideration, is that correct?

vinid commented 5 months ago

Yes! You got it right :) The metrics do not take those relationships into consideration.

lcxrocks commented 5 months ago

Thank you so much! (I still think we should remove those symmetric relations from the .json file directly, lol.)
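A rough sketch of filtering the file directly might look like the following. It assumes the JSON is a top-level list of objects that each carry the relation under a `relation_name` key; that key, the `drop_symmetric` and `filter_file` helpers, and the relation set are illustrative assumptions, not something confirmed in this thread.

```python
import json

# Hypothetical set of symmetric relations; only "near" is named in this thread.
SYMMETRIC_RELATIONS = {"near"}


def drop_symmetric(entries, key="relation_name", symmetric=SYMMETRIC_RELATIONS):
    """Return only the entries whose relation is not in `symmetric`.

    The `relation_name` key is an assumption about the file layout.
    """
    return [entry for entry in entries if entry.get(key) not in symmetric]


def filter_file(src_path, dst_path, **kwargs):
    """Write a filtered copy of a JSON list and return (total, kept) counts."""
    with open(src_path, encoding="utf-8") as f:
        entries = json.load(f)
    kept = drop_symmetric(entries, **kwargs)
    with open(dst_path, "w", encoding="utf-8") as f:
        json.dump(kept, f)
    return len(entries), len(kept)
```

Writing to a separate output path keeps the original extraction intact, so results computed from the unfiltered file remain reproducible.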