waterson closed 2 months ago
(Updated test.)
Actually... I'm not sure that this is valid. In particular, I think that I had some confusion between a `Dataset` and a `DatasetDict`. In this case I'm saving one and trying to load the other. Closing for now.
Describe the bug
`get_eval_refs` returns a string instead of a list when loading a dataset that has been saved with the HF `save_to_disk` API. This means that if you try to run an eval using a dataset that you've saved to disk (e.g., because HF has been having a bad week), the eval runs but reports a zero solve rate because the FAIL_TO_PASS and PASS_TO_PASS entries are wrong.
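For anyone hitting the same symptom, here is a minimal sketch of why a string in place of a list can quietly produce a zero solve rate. It assumes the FAIL_TO_PASS / PASS_TO_PASS fields arrive as JSON-encoded strings, which I have not confirmed for every loading path. `normalize_test_list` is a hypothetical helper for illustration, not part of SWE-bench, and the test names are made up.

```python
import json


def normalize_test_list(value):
    """Return a list of test names from either a list or a JSON-encoded string.

    Hypothetical helper: assumes string values are JSON arrays of test names.
    """
    if isinstance(value, str):
        value = json.loads(value)
    if not isinstance(value, list):
        raise TypeError(f"expected a list of test names, got {type(value).__name__}")
    return value


# Made-up example data: the field as it might look when left undecoded.
raw = '["tests/test_a.py::test_one", "tests/test_b.py::test_two"]'
passed = {"tests/test_a.py::test_one", "tests/test_b.py::test_two"}

# Iterating over the undecoded string yields single characters,
# so no "test name" ever matches and nothing counts as solved.
naive_hits = [t for t in raw if t in passed]

# Decoding first gives the real test names.
decoded_hits = [t for t in normalize_test_list(raw) if t in passed]

print(len(naive_hits), len(decoded_hits))
```

Checking `isinstance(value, str)` before using these fields would at least turn the silent zero into something visible.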
Steps/Code to Reproduce
Expected Results
Actual Results
System Information
Linux, Python 3.9, SWE-bench 1.1.0.