harsha-simhadri / big-ann-benchmarks

Framework for evaluating ANNS algorithms on billion scale datasets.
https://big-ann-benchmarks.com
MIT License
356 stars, 118 forks

Fix issue with dataset filenames #292

Closed magdalendobson closed 6 months ago

magdalendobson commented 6 months ago

The generic get_dataset_fn() method of the DatasetCompetitionFormat class in benchmark/datasets.py contained code that was specific to the billion-size datasets, which caused some runs to fail because they searched for the wrong dataset filename. I have changed that class to use a more general implementation and added a subclass, BillionScaleDatasetCompetitionFormat, that handles the filenames for the billion-size datasets. get_dataset_fn() should now work properly for every dataset.
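For reviewers, here is a minimal sketch of the split described above. It is not the actual code in benchmark/datasets.py. The constructor arguments (basedir, ds_fn, nb) and the ".crop_nb_" suffix are assumptions made for illustration only. The point is that the base class builds a plain path, while the billion-scale subclass layers its own naming rule on top.

```python
import os


class DatasetCompetitionFormat:
    """Sketch of a generic dataset: the file name has no size-specific suffix."""

    def __init__(self, basedir, ds_fn, nb):
        self.basedir = basedir  # hypothetical: directory holding the data files
        self.ds_fn = ds_fn      # hypothetical: base file name of the dataset
        self.nb = nb            # hypothetical: number of base vectors

    def get_dataset_fn(self):
        # Generic behaviour: the path is just the directory plus the file name.
        return os.path.join(self.basedir, self.ds_fn)


class BillionScaleDatasetCompetitionFormat(DatasetCompetitionFormat):
    """Sketch of a billion-scale dataset that may be stored as a cropped file."""

    def get_dataset_fn(self):
        fn = super().get_dataset_fn()
        # Assumed convention: subsets smaller than 10**9 points carry a crop suffix.
        if self.nb != 10**9:
            fn += ".crop_nb_%d" % self.nb
        return fn
```

With this layout, a dataset that is not billion-scale never picks up the crop suffix, so it cannot look for a file under the wrong name.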

Requesting a review from @harsha-simhadri or @maumueller.

magdalendobson commented 6 months ago

Also requesting @mdouze and @ingberam as possible reviewers.

maumueller commented 6 months ago

Thanks @magdalendobson, this looks like a good solution to me.