mertyg / vision-language-models-are-bows

Experiments and data for the paper "When and why vision-language models behave like bags-of-words, and what to do about it?" Oral @ ICLR 2023
MIT License
261 stars, 15 forks

eval coco order and flickr order #11

Closed lezhang7 closed 1 year ago

lezhang7 commented 1 year ago

There is a small bug in the dataset loading code: the loop should iterate over enumerate(tqdm(self.annotation)). I also have one question: why does loading the COCO order dataset take so much time?
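The thread does not show the original loop, so the following is only a minimal sketch of the suggested form. The `annotation` list and the `index_captions` helper are hypothetical stand-ins for `self.annotation` and the dataset's loading loop. Wrapping the list itself in `tqdm` lets the progress bar read its length, while `enumerate` on the outside still yields `(index, item)` pairs.

```python
try:
    from tqdm import tqdm  # third-party progress bar
except ImportError:
    # Fallback so the sketch still runs without tqdm installed.
    def tqdm(iterable, **kwargs):
        return iterable

# Hypothetical stand-in for self.annotation in the dataset class.
annotation = [{"caption": "a dog on grass"}, {"caption": "a red bus"}]


def index_captions(annotation):
    """Pair each caption with its position, showing progress while looping."""
    indexed = []
    # tqdm wraps the list (which has a length); enumerate wraps tqdm.
    for index, ann in enumerate(tqdm(annotation)):
        indexed.append((index, ann["caption"]))
    return indexed
```

With the order reversed, `tqdm(enumerate(...))`, the loop still works, but the progress bar cannot show a total because `enumerate` objects have no length.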

vinid commented 1 year ago

Hi @Magiccircuit,

it takes time because we are using spaCy to generate all the possible perturbations for the captions! If you need to speed it up, you should be able to easily wrap this in a multiprocessing.Pool!
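A rough sketch of what wrapping the per-caption work in a `multiprocessing.Pool` could look like. This is not the repository's code: `shuffle_caption` is a hypothetical stand-in for the spaCy-based perturbation step, and `perturb_captions`, `processes`, and `chunksize` are names chosen for illustration.

```python
import random
from multiprocessing import Pool


def shuffle_caption(caption, seed=0):
    """Hypothetical stand-in for the spaCy perturbation: shuffle the words."""
    words = caption.split()
    # A seeded generator keeps the result reproducible across runs.
    random.Random(seed).shuffle(words)
    return " ".join(words)


def perturb_captions(captions, processes=4, chunksize=64):
    """Apply shuffle_caption to every caption, in parallel when processes > 1."""
    if processes <= 1:
        # Serial path, handy for debugging and for small inputs.
        return [shuffle_caption(c) for c in captions]
    # Pool.map preserves input order, so results line up with captions.
    with Pool(processes=processes) as pool:
        return pool.map(shuffle_caption, captions, chunksize=chunksize)
```

The worker must be a module-level function so it can be pickled and sent to the worker processes. On platforms that start workers with spawn (Windows, macOS), call `perturb_captions` from under an `if __name__ == "__main__":` guard. If the real worker loads a spaCy pipeline, loading it once per worker process rather than once per caption is usually where most of the speedup comes from.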

vinid commented 1 year ago

let me know if you need help on this!

vinid commented 1 year ago

Hello, I just pushed an update that should make the generation of the shuffled options faster. Let me know if this works!