harsha-simhadri / big-ann-benchmarks

Framework for evaluating ANNS algorithms on billion scale datasets.
https://big-ann-benchmarks.com
MIT License
356 stars, 118 forks

Move filename generation outside if-else block in datasets.py #314

Closed magdalendobson closed 1 month ago

magdalendobson commented 1 month ago

In datasets.py, calling prepare on a slice of a dataset checked for an existing file using the base dataset filename instead of the cropped filename. As a result, the download of a smaller slice was skipped whenever the base dataset file already existed in the user's data directory. This PR moves filename generation outside the if-else block so that the cropped filename is set before the existence check, which ensures the cropped file is downloaded where applicable.
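The change described above can be illustrated with a minimal sketch. This is not the code from datasets.py: the function names `target_filename` and `prepare`, the `.crop_nb_` suffix, and the `download` callback are all assumptions made up for illustration. It only shows the pattern of picking the filename once, before the existence check, so that a slice is never mistaken for the base file.

```python
import os
import tempfile


def target_filename(base_filename, nb, full_nb):
    """Return the file name for the requested number of points.

    Hypothetical naming scheme: slices get a '.crop_nb_<nb>' suffix.
    """
    if nb < full_nb:
        return f"{base_filename}.crop_nb_{nb}"
    return base_filename


def prepare(basedir, base_filename, nb, full_nb, download):
    """Download the dataset (or a slice of it) unless it is already present.

    The filename is computed before the existence check, outside any
    if/else branch, so the check looks at the cropped file for slices.
    `download` is a hypothetical callback taking (path, nb).
    Returns (path, downloaded) where `downloaded` says whether a
    download was triggered.
    """
    path = os.path.join(basedir, target_filename(base_filename, nb, full_nb))
    if os.path.exists(path):
        return path, False
    download(path, nb)
    return path, True


def fake_download(path, nb):
    """Stand-in for a real download: writes a small placeholder file."""
    with open(path, "w") as f:
        f.write(f"{nb} points\n")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        # Simulate a data directory that already holds the base file.
        fake_download(os.path.join(d, "base.bin"), 1000)
        # Requesting a slice still triggers a download of the cropped file.
        print(prepare(d, "base.bin", 100, 1000, fake_download))
        # A second request finds the cropped file and skips the download.
        print(prepare(d, "base.bin", 100, 1000, fake_download))
```

With the base file already on disk, the first call for a slice still fetches the cropped file, and only a repeated call for the same slice is skipped.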