Open BillyZhang24kobe opened 1 year ago
Hi @BillyZhang24kobe,
I am also looking into this :) It looks like the splits are defined in load_data. If my understanding is correct, to report numbers comparable to Table 6, we need to run prep_geode_38 to generate the per-region files, passing 'index.csv' in place of the metadata file and using the 'object' and 'file_path' fields instead of 'script_name' and 'file_name'.
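To make the field mapping concrete, here is a rough sketch of grouping the rows of 'index.csv' by region and writing one pickle per region. This is not the repository's prep_geode_38. The function names, the 'region' column name, and the output layout (a list of (file_path, object) pairs per region) are all assumptions of mine, and the sketch does not reproduce the train/val/test partition from the paper.

```python
import csv
import pickle
from collections import defaultdict
from pathlib import Path


def group_by_region(rows, path_col="file_path", label_col="object", region_col="region"):
    """Group (file_path, object) pairs by region, preserving row order.

    The column names are assumptions about index.csv; adjust them if they differ.
    """
    per_region = defaultdict(list)
    for row in rows:
        per_region[row[region_col]].append((row[path_col], row[label_col]))
    return dict(per_region)


def write_region_pickles(index_csv, out_dir):
    """Read index.csv and write one pickle per region (hypothetical output layout)."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    with open(index_csv, newline="") as f:
        per_region = group_by_region(csv.DictReader(f))
    for region, items in per_region.items():
        with open(out_dir / f"{region}.pkl", "wb") as f:
            pickle.dump(items, f)
    return per_region
```

Calling write_region_pickles("index.csv", "regions/") would then produce one file per distinct region value. Whether this matches the format load_data expects still needs to be checked against the repository.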
@vramaswamy94, thank you for contributing such a nice dataset :) Could you confirm whether my understanding is correct? It would also be great if you could share the generated region-specific pickle files, so that we avoid any risk of using a train/val/test partition that differs from the one in your paper.
Have a great day!
Hello,
In the paper for this dataset, you mention that the data is divided into train, dev and test splits. Do you have any documentation on exactly how the splits are defined? I have downloaded the dataset from the official website (https://geodiverse-data-collection.cs.princeton.edu/), but the only metadata file seems to be 'index.csv', which does not specify how the data is split into train, val and test. Any pointers are welcome! Thanks!