levymsn / LaSCo

Official repository of the LaSCo dataset
https://vision.huji.ac.il/lasco
MIT License

Problem about corpus.json #2

Closed: HaoliangZhou closed this issue 5 months ago

HaoliangZhou commented 5 months ago

First, I would like to thank you for the excellent work on this project. While exploring the provided data files, I ran into a few questions that I hope you can help clarify.

I noticed that the dataset includes a file named "lasco_val.json", which contains 30,037 records, and another file, "lasco_val_corpus.json", which contains 39,826 distinct corpus entries.

My questions are as follows: (1) What is the specific purpose of the "lasco_val_corpus.json" file, and what do its 39,826 corpus entries represent in the context of this dataset? (2) What is the relationship between "lasco_val_corpus.json" and "lasco_val.json"? How are these two files meant to be used together in the intended workflow?

Thank you in advance for taking the time to address my queries. I am looking forward to a better understanding of these data structures to facilitate more effective use of the resources.

levymsn commented 5 months ago

Hi Zhou, the retrieval task is to find, within a large corpus of images, the target image that matches a given query (image + text). In LaSCo, the queries and their corresponding target images are annotated in 'lasco_val.json'. When evaluating a method on 'lasco_val.json', the search space (or corpus) is the set of images defined in 'lasco_val_corpus.json'.

Specifically, for a given query extracted from 'lasco_val.json', the method should accurately locate the target image by considering all potential candidates listed in 'lasco_val_corpus.json'.
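As a rough illustration of this protocol (a minimal sketch, not the repository's evaluation code), the snippet below scores each query against every corpus candidate and reports Recall@K. The function name `recall_at_k`, the toy image ids, and the 2-D embeddings are all hypothetical; the actual field layout of 'lasco_val.json' and 'lasco_val_corpus.json' is not assumed here, so queries, targets, and corpus are passed in as plain Python structures.

```python
from typing import Dict, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot-product similarity between two embedding vectors."""
    return sum(x * y for x, y in zip(a, b))


def recall_at_k(
    query_embs: Sequence[Sequence[float]],
    target_ids: Sequence[str],
    corpus_embs: Dict[str, Sequence[float]],
    k: int,
) -> float:
    """Fraction of queries whose target is among the top-k corpus candidates.

    query_embs:  one embedding per (image + text) query (hypothetical input)
    target_ids:  id of the annotated target image for each query
    corpus_embs: id -> embedding for every candidate in the search space
    """
    hits = 0
    for query, target in zip(query_embs, target_ids):
        # Rank ALL corpus candidates, not only images that appear as targets.
        ranked = sorted(
            corpus_embs,
            key=lambda cid: dot(query, corpus_embs[cid]),
            reverse=True,
        )
        if target in ranked[:k]:
            hits += 1
    return hits / len(query_embs)


# Toy data: three corpus candidates and three queries.
corpus = {"img_a": [1.0, 0.0], "img_b": [0.0, 1.0], "img_c": [0.7, 0.7]}
queries = [[0.9, 0.1], [0.1, 0.9], [0.6, 0.8]]
targets = ["img_a", "img_b", "img_b"]

r1 = recall_at_k(queries, targets, corpus, k=1)
r2 = recall_at_k(queries, targets, corpus, k=2)
print(f"Recall@1 = {r1:.3f}, Recall@2 = {r2:.3f}")
```

In the toy data the third query ranks a distractor ("img_c") above its target, so it counts as a miss at K=1 and a hit at K=2. A corpus that contains candidates beyond the annotated targets makes retrieval harder in the same way, which is consistent with the corpus file holding more entries than the query file.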

HaoliangZhou commented 5 months ago

got it! Thank you so much! ^_^