When running benchmarks on smaller versions of the billion-size datasets, the benchmarking tool silently runs them against the full-size dataset if that file is also present in the `data` folder. I believe the issue is in the following function, found in `benchmark/datasets.py` on line 176:
```python
def get_dataset_fn(self):
    fn = os.path.join(self.basedir, self.ds_fn)
    if os.path.exists(fn):
        return fn
    if self.nb != 10**9:
        fn += '.crop_nb_%d' % self.nb
        return fn
    else:
        raise RuntimeError("file not found")
```
Modifying it as follows fixed the issue for me:
```python
def get_dataset_fn(self):
    fn = os.path.join(self.basedir, self.ds_fn)
    if os.path.exists(fn):
        if self.nb != 10**9:
            fn += '.crop_nb_%d' % self.nb
        return fn
    else:
        raise RuntimeError("file not found")
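To make the difference concrete, here is a small self-contained repro sketch. The `FakeDataset` class and the file name `base.u8bin` are made up for the demo; only the two method bodies come from the code above:

```python
import os
import tempfile

# Minimal stand-in for the dataset class, keeping only the attributes that
# get_dataset_fn reads (basedir, ds_fn, nb); the real class in
# benchmark/datasets.py has much more.
class FakeDataset:
    def __init__(self, basedir, ds_fn, nb):
        self.basedir = basedir
        self.ds_fn = ds_fn
        self.nb = nb

    def get_dataset_fn_buggy(self):
        # original version: returns the full-size file as soon as it
        # exists, ignoring self.nb entirely
        fn = os.path.join(self.basedir, self.ds_fn)
        if os.path.exists(fn):
            return fn
        if self.nb != 10**9:
            fn += '.crop_nb_%d' % self.nb
            return fn
        else:
            raise RuntimeError("file not found")

    def get_dataset_fn_fixed(self):
        # fixed version: appends the crop suffix even when the full-size
        # file is present on disk
        fn = os.path.join(self.basedir, self.ds_fn)
        if os.path.exists(fn):
            if self.nb != 10**9:
                fn += '.crop_nb_%d' % self.nb
            return fn
        else:
            raise RuntimeError("file not found")

with tempfile.TemporaryDirectory() as d:
    # the full-size file is present, but we ask for a 1M-point crop
    open(os.path.join(d, 'base.u8bin'), 'w').close()
    ds = FakeDataset(d, 'base.u8bin', 10**6)
    buggy_name = os.path.basename(ds.get_dataset_fn_buggy())
    fixed_name = os.path.basename(ds.get_dataset_fn_fixed())

print(buggy_name)  # base.u8bin  (full-size file, wrong for nb=1M)
print(fixed_name)  # base.u8bin.crop_nb_1000000
```

With the original code the benchmark picks up `base.u8bin` and runs against the full dataset; with the fix it resolves to the cropped file name for the requested `nb`.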