Bug in get_dataset_fn()

When running benchmarks on the smaller (cropped) versions of the billion-size datasets, the benchmarking tool runs them on the full-size dataset instead whenever the full-size file is also present in the data folder. I believe the issue is in the following function, in benchmark/datasets.py at line 176:

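    def get_dataset_fn(self):
        fn = os.path.join(self.basedir, self.ds_fn)
        if os.path.exists(fn):
            # Returns the full-size file whenever it exists, even when a
            # cropped subset (self.nb != 10**9) was requested.
            return fn
        if self.nb != 10**9:
            fn += '.crop_nb_%d' % self.nb
            return fn
        else:
            raise RuntimeError("file not found")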
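A minimal, self-contained reproduction of the behavior described above. `DatasetStub` is a hypothetical stand-in that models only the three attributes the method reads (`basedir`, `ds_fn`, `nb`); the file name `base.1B.u8bin` and the 10M crop are illustrative assumptions. The method body is copied from the current code.

```python
import os
import tempfile


class DatasetStub:
    """Hypothetical stand-in for a dataset object: only the attributes
    read by get_dataset_fn() (basedir, ds_fn, nb) are modeled."""

    def __init__(self, basedir, ds_fn, nb):
        self.basedir = basedir
        self.ds_fn = ds_fn
        self.nb = nb

    def get_dataset_fn(self):
        # Same logic as the current implementation quoted above.
        fn = os.path.join(self.basedir, self.ds_fn)
        if os.path.exists(fn):
            return fn
        if self.nb != 10**9:
            fn += '.crop_nb_%d' % self.nb
            return fn
        else:
            raise RuntimeError("file not found")


def demo():
    with tempfile.TemporaryDirectory() as basedir:
        ds_fn = "base.1B.u8bin"  # hypothetical file name
        # Create empty placeholders for the full-size file and a 10M crop.
        for name in (ds_fn, ds_fn + ".crop_nb_10000000"):
            open(os.path.join(basedir, name), "wb").close()
        ds = DatasetStub(basedir, ds_fn, nb=10**7)
        return os.path.basename(ds.get_dataset_fn())


print(demo())  # base.1B.u8bin: the 10M crop is ignored
```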

Modifying it as follows fixed the issue for me:

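    def get_dataset_fn(self):
        fn = os.path.join(self.basedir, self.ds_fn)
        # Note: the full-size file must be present even if only a crop is used.
        if os.path.exists(fn):
            if self.nb != 10**9:
                # A subset was requested: point at the cropped file instead.
                fn += '.crop_nb_%d' % self.nb
            return fn
        else:
            raise RuntimeError("file not found")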
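The change above appends the crop suffix only inside the `os.path.exists(fn)` branch, so it still requires the full-size file to be present even when only a crop will be read. A possible variant checks `nb` first. It is sketched here as a free function with hypothetical parameters `basedir`, `ds_fn` and `nb` rather than as the actual method. Like the current code, it returns the crop path without checking that the file exists; that is an assumption about what callers expect, not something confirmed in the repository.

```python
import os
import tempfile

FULL_NB = 10**9  # number of vectors in the full billion-scale dataset


def get_dataset_fn(basedir, ds_fn, nb):
    """Free-function sketch of the method: pick the path that matches nb."""
    fn = os.path.join(basedir, ds_fn)
    if nb != FULL_NB:
        # Cropped subset requested: always use the .crop_nb_<nb> file,
        # whether or not the full-size file is also present.
        return fn + '.crop_nb_%d' % nb
    if os.path.exists(fn):
        return fn
    raise RuntimeError("file not found")


with tempfile.TemporaryDirectory() as basedir:
    ds_fn = "base.1B.u8bin"  # hypothetical file name
    for name in (ds_fn, ds_fn + ".crop_nb_10000000"):
        open(os.path.join(basedir, name), "wb").close()
    print(os.path.basename(get_dataset_fn(basedir, ds_fn, 10**7)))
    # base.1B.u8bin.crop_nb_10000000
```

With this ordering, a cropped run never falls back to the full-size file, and a machine that has only the crop downloaded does not hit the "file not found" error.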

Issue #262 in harsha-simhadri/big-ann-benchmarks