princetonvisualai / geode_dataset


Train, dev, test split for the data #3

Open BillyZhang24kobe opened 11 months ago

BillyZhang24kobe commented 11 months ago

Hello,

In the paper for this dataset, you mention that the data is split into train, dev, and test sets. Do you have any documentation on exactly how the splits are defined? I downloaded the dataset from the official website (https://geodiverse-data-collection.cs.princeton.edu/), but the only metadata file seems to be 'index.csv', which does not specify the train/val/test split. Any pointers are welcome! Thanks!

hassony2 commented 9 months ago

Hi @BillyZhang24kobe,

I am also looking into this :) It looks like the splits are defined in load_data. If my understanding is correct, to report numbers comparable to Table 6, we need to use prep_geode_38 to generate the per-region files, passing 'index.csv' in place of the metadata file and using its 'object' and 'file_path' fields instead of 'script_name' and 'file_name'.
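In case it helps, here is a minimal sketch of the column remapping I have in mind. It only illustrates renaming the fields and grouping rows by region; it does not call prep_geode_38 or reproduce the paper's splits. The function name group_rows_by_region, the 'region' column name, and the sample rows are my own assumptions for illustration, not something confirmed by the repository.

```python
import csv
import io
from collections import defaultdict

# Mapping from index.csv fields to the fields the preparation code appears
# to expect (per my reading; unconfirmed by the authors).
COLUMN_MAP = {"object": "script_name", "file_path": "file_name"}


def group_rows_by_region(csv_file, region_column="region"):
    """Rename columns per COLUMN_MAP and bucket rows by region.

    `region_column` is an assumption about index.csv, not a confirmed name.
    """
    per_region = defaultdict(list)
    for row in csv.DictReader(csv_file):
        renamed = {COLUMN_MAP.get(key, key): value for key, value in row.items()}
        per_region[renamed[region_column]].append(renamed)
    return dict(per_region)


# Tiny made-up sample standing in for index.csv (not real dataset rows).
sample = io.StringIO(
    "file_path,object,region\n"
    "a/1.jpg,chair,Africa\n"
    "b/2.jpg,stove,Europe\n"
    "c/3.jpg,chair,Africa\n"
)
regions = group_rows_by_region(sample)
```

With the real file, the same function could be called on open('index.csv', newline=''), assuming the column names match.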

@vramaswamy94, thank you for contributing such a nice dataset :) Could you confirm whether my understanding is correct? It would be great if you could share the generated region-specific pickle files, so that we avoid any risk of using a train/val/test partition that differs from the one in your paper.

Have a great day!