princeton-nlp / SWE-bench

[ICLR 2024] SWE-Bench: Can Language Models Resolve Real-world Github Issues?
https://www.swebench.com
MIT License

Cannot load dataset from JSON file #150

Closed: klieret closed this issue 3 months ago

klieret commented 3 months ago

Describe the bug

When the dataset is loaded by name, PASS_TO_PASS is a JSON-encoded string. When it is loaded from a JSON file on disk, it is already a Python object (a list). The code does not distinguish between these two cases and always calls json.loads on both PASS_TO_PASS and FAIL_TO_PASS:

def make_test_spec(instance: SWEbenchInstance) -> TestSpec:
    ...
    pass_to_pass = json.loads(instance["PASS_TO_PASS"])
    fail_to_pass = json.loads(instance["FAIL_TO_PASS"])
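One way to make this robust is to decode the field only when it is still a string. The sketch below assumes each field is either a JSON-encoded string or an already-decoded list. The helper name `parse_test_list` is hypothetical and not part of SWE-bench.

```python
import json
from typing import Any


def parse_test_list(value: Any) -> list:
    """Return the list of test names stored in PASS_TO_PASS / FAIL_TO_PASS.

    Hypothetical helper: accepts either a JSON-encoded string (dataset loaded
    by name) or an already-decoded list (dataset loaded from a local JSON file).
    """
    if isinstance(value, (str, bytes, bytearray)):
        value = json.loads(value)
    if not isinstance(value, list):
        raise TypeError(f"expected a list of test names, got {type(value).__name__}")
    return value


# Both representations yield the same list.
instance_by_name = {"PASS_TO_PASS": '["test_a", "test_b"]'}
instance_from_file = {"PASS_TO_PASS": ["test_a", "test_b"]}
print(parse_test_list(instance_by_name["PASS_TO_PASS"]))    # ['test_a', 'test_b']
print(parse_test_list(instance_from_file["PASS_TO_PASS"]))  # ['test_a', 'test_b']
```

With such a helper, the two json.loads calls in make_test_spec would become `parse_test_list(instance["PASS_TO_PASS"])` and `parse_test_list(instance["FAIL_TO_PASS"])`.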

Steps/Code to Reproduce

python -m swebench.harness.run_evaluation \
        --predictions_path ../trajectories/klieret/azure-gpt4__dev23__default__t-0.00__p-0.95__c-3.00__install-1/all_preds.jsonl \
        --max_workers 1 \
        --run_id test \
        --dataset_name ../data/dev23.json
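The failure can also be reproduced without Docker or the evaluation harness. This minimal sketch writes a one-instance dataset to a temporary JSON file, reads it back with plain `json` (assumed here to mirror how a local `.json` dataset ends up as ordinary Python objects), and then applies the same `json.loads` call as make_test_spec. The instance fields are made up for illustration.

```python
import json
import tempfile
from pathlib import Path

# A made-up one-instance dataset whose PASS_TO_PASS is stored as a real JSON list.
dataset = [
    {"instance_id": "example__repo-1", "PASS_TO_PASS": ["test_a"], "FAIL_TO_PASS": ["test_b"]}
]

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "dev.json"
    path.write_text(json.dumps(dataset))
    loaded = json.loads(path.read_text())  # fields come back as Python lists

instance = loaded[0]
print(type(instance["PASS_TO_PASS"]).__name__)  # list

error_message = None
try:
    json.loads(instance["PASS_TO_PASS"])  # same call as in make_test_spec
except TypeError as exc:
    # Same TypeError as in the traceback under Actual Results.
    error_message = str(exc)
print(error_message)
```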

Expected Results

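Evaluation runs on the instances loaded from the local JSON file, just as it does when the dataset is loaded by name.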

Actual Results

2024-06-28 20:50:41,239 - numexpr.utils - INFO - NumExpr defaulting to 8 threads.
2024-06-28 20:50:41,417 - datasets - INFO - PyTorch version 2.2.1 available.
2024-06-28 20:50:41,418 - datasets - INFO - JAX version 0.4.25 available.
<frozen runpy>:128: RuntimeWarning: 'swebench.harness.run_evaluation' found in sys.modules after import of package 'swebench.harness', but prior to execution of 'swebench.harness.run_evaluation'; this may result in unpredictable behaviour
Running 23 unevaluated instances...
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/home/klieret/SWE-bench/swebench/harness/run_evaluation.py", line 529, in <module>
    main(**vars(args))
  File "/home/klieret/SWE-bench/swebench/harness/run_evaluation.py", line 492, in main
    build_env_images(client, dataset, force_rebuild, max_workers)
  File "/home/klieret/SWE-bench/swebench/harness/docker_build.py", line 280, in build_env_images
    build_base_images(client, dataset, force_rebuild)
  File "/home/klieret/SWE-bench/swebench/harness/docker_build.py", line 171, in build_base_images
    test_specs = get_test_specs_from_dataset(dataset)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/klieret/SWE-bench/swebench/harness/test_spec.py", line 113, in get_test_specs_from_dataset
    return list(map(make_test_spec, dataset))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/klieret/SWE-bench/swebench/harness/test_spec.py", line 266, in make_test_spec
    pass_to_pass = json.loads(instance["PASS_TO_PASS"])
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/bitbop/lib/python3.12/json/__init__.py", line 339, in loads
    raise TypeError(f'the JSON object must be str, bytes or bytearray, '
TypeError: the JSON object must be str, bytes or bytearray, not list
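Until the harness handles both representations, a possible workaround is to rewrite the local file so that the two fields are JSON-encoded strings, which is what the current code expects. This is a minimal sketch: `encode_list_fields` and `convert_file` are hypothetical helpers, and only PASS_TO_PASS and FAIL_TO_PASS are touched.

```python
import json
from pathlib import Path

# Fields that make_test_spec passes to json.loads.
LIST_FIELDS = ("PASS_TO_PASS", "FAIL_TO_PASS")


def encode_list_fields(instances: list[dict]) -> list[dict]:
    """Return copies of the instances with list-valued test fields JSON-encoded."""
    converted = []
    for instance in instances:
        instance = dict(instance)  # shallow copy so the caller's data is not mutated
        for field in LIST_FIELDS:
            if isinstance(instance.get(field), list):
                instance[field] = json.dumps(instance[field])
        converted.append(instance)
    return converted


def convert_file(src: Path, dst: Path) -> None:
    """Read a local dataset file and write a copy with encoded test fields."""
    data = json.loads(src.read_text())
    dst.write_text(json.dumps(encode_list_fields(data), indent=2))


# Example (hypothetical output name):
# convert_file(Path("../data/dev23.json"), Path("../data/dev23_encoded.json"))
```

The converted file would then be passed to --dataset_name instead of the original. Fields that are already strings are left unchanged, so running the conversion twice is harmless.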

System Information

No response