The generic get_dataset_fn() method of the DatasetCompetitionFormat class in benchmark/datasets.py did not work properly: it contained code specific to the billion-size datasets, which caused some runs to fail because they searched for the wrong dataset name. I have amended that class to use a more general function and added a subclass, BillionScaleDatasetCompetitionFormat, that takes care of the filenames for the billion-size datasets. The get_dataset_fn() method should now work properly for every dataset.
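For reviewers, here is a rough sketch of the shape of the change. It is illustrative only and not the actual diff: the constructor, the attribute names (`basedir`, `ds_fn`, `nb`) and the `.crop_nb_<n>` suffix convention are assumptions made for the example.

```python
import os


class DatasetCompetitionFormat:
    """Illustrative base class: resolves the dataset file with no size-specific logic."""

    def __init__(self, basedir, ds_fn, nb):
        self.basedir = basedir  # directory holding the dataset files (assumed attribute)
        self.ds_fn = ds_fn      # base file name of the dataset (assumed attribute)
        self.nb = nb            # number of base vectors (assumed attribute)

    def get_dataset_fn(self):
        # General behaviour: use the file name exactly as configured.
        return os.path.join(self.basedir, self.ds_fn)


class BillionScaleDatasetCompetitionFormat(DatasetCompetitionFormat):
    """Illustrative subclass: keeps the billion-size file-name handling in one place."""

    def get_dataset_fn(self):
        fn = super().get_dataset_fn()
        if self.nb == 10**9:
            # The full billion-size file is used under its plain name.
            return fn
        # Assumed convention: smaller subsets live under a suffixed file name.
        return fn + ".crop_nb_%d" % self.nb
```

With this split, datasets that are not billion-size never reach the suffix logic, which is what previously made some runs look for the wrong file name.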
Requesting review from @harsha-simhadri or @maumueller.