harsha-simhadri / big-ann-benchmarks

Framework for evaluating ANNS algorithms on billion scale datasets.
https://big-ann-benchmarks.com
MIT License

Bug in get_dataset_fn() #262

Closed magdalendobson closed 5 months ago

magdalendobson commented 6 months ago

When I run benchmarks on a smaller version of one of the billion-size datasets, the benchmarking tool runs on the full-size dataset instead whenever the full-size file is also present in the data folder. I believe the issue is in the following function, found in benchmark/datasets.py on line 176, which returns the full-size path as soon as that file exists, before it ever checks self.nb:

    def get_dataset_fn(self):
        fn = os.path.join(self.basedir, self.ds_fn)
        if os.path.exists(fn):
            return fn
        if self.nb != 10**9:
            fn += '.crop_nb_%d' % self.nb
            return fn
        else:
            raise RuntimeError("file not found")

Modifying it as follows fixed the issue for me:

    def get_dataset_fn(self):
        fn = os.path.join(self.basedir, self.ds_fn)
        if os.path.exists(fn):
            if self.nb != 10**9:
                fn += '.crop_nb_%d' % self.nb
            return fn
        else:
            raise RuntimeError("file not found")
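
To make the difference between the two versions concrete, here is a minimal standalone sketch. It is not code from the repository: the two functions are free-function copies of the snippets above, and the file name `base.bin`, the helper `demo()`, and the constant `BILLION` are hypothetical names introduced for illustration. It assumes both the full-size file and a cropped file exist side by side in the data folder.

```python
import os
import tempfile

BILLION = 10**9  # hypothetical constant standing in for the literal 10**9


def get_dataset_fn_original(basedir, ds_fn, nb):
    """Free-function copy of the original method quoted above."""
    fn = os.path.join(basedir, ds_fn)
    if os.path.exists(fn):
        # Returns the full-size file without looking at nb.
        return fn
    if nb != BILLION:
        fn += '.crop_nb_%d' % nb
        return fn
    else:
        raise RuntimeError("file not found")


def get_dataset_fn_modified(basedir, ds_fn, nb):
    """Free-function copy of the modified method quoted above."""
    fn = os.path.join(basedir, ds_fn)
    if os.path.exists(fn):
        # Appends the crop suffix whenever a smaller size is requested.
        if nb != BILLION:
            fn += '.crop_nb_%d' % nb
        return fn
    else:
        raise RuntimeError("file not found")


def demo():
    """Create a full-size and a cropped file, then ask both versions for nb=10**7."""
    nb = 10**7
    with tempfile.TemporaryDirectory() as basedir:
        full = os.path.join(basedir, "base.bin")  # hypothetical file name
        crop = full + ".crop_nb_%d" % nb
        for path in (full, crop):
            open(path, "wb").close()  # empty placeholder files
        original = get_dataset_fn_original(basedir, "base.bin", nb)
        modified = get_dataset_fn_modified(basedir, "base.bin", nb)
        return os.path.basename(original), os.path.basename(modified)


if __name__ == "__main__":
    original, modified = demo()
    print("original:", original)  # base.bin
    print("modified:", modified)  # base.bin.crop_nb_10000000
```

In this sketch the original version selects the full-size file even though a 10M-point subset was requested, while the modified version selects the cropped file. Note that the modified version, as written, still raises RuntimeError when only the cropped file is present and the full-size file is absent.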

maumueller commented 5 months ago

Sorry for the late reply, @magdalendobson; I missed the notification.

I fixed it in 89814e6.