`model_name_or_path` is None when running models without adapters, causing an error in `run_evaluation.py`

rucnyz commented 1 month ago

Describe the bug

In the file inference/run_llama.py, lines 299-304 contain the following code:

res = {
    "instance_id": instance["instance_id"],
    "full_output": output,
    "model_patch": diff,
    "model_name_or_path": peft_path,
}

The model_name_or_path field is set to peft_path, which is None when no adapter is supplied. If we run the code with a model that doesn't use an adapter (for example, princeton-nlp/SWE-Llama-13b), model_name_or_path in the stored results will therefore be None. This causes an error in run_evaluation.py at line 134:

if len(testbed_model_name) > 50:

Output:

    if len(testbed_model_name) > 50:
       ^^^^^^^^^^^^^^^^^^^^^^^
TypeError: object of type 'NoneType' has no len()
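
The failure can be reproduced without running inference at all. Below is a minimal sketch, assuming only that the harness calls len() on the stored model_name_or_path value as in the line quoted above. The `prediction` dict and its values are illustrative placeholders, not real outputs.

```python
# Illustrative prediction record, shaped like the dict written by run_llama.py.
# The values are placeholders; peft_path is None because no adapter was passed.
peft_path = None
prediction = {
    "instance_id": "example__instance-1",  # hypothetical ID
    "full_output": "",
    "model_patch": "",
    "model_name_or_path": peft_path,
}

testbed_model_name = prediction["model_name_or_path"]

try:
    # Same kind of check as the one quoted from run_evaluation.py;
    # calling len() on None raises TypeError.
    if len(testbed_model_name) > 50:
        pass
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)  # object of type 'NoneType' has no len()
```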

Steps/Code to Reproduce

Run inference/run_llama.py using a model without an adapter, such as princeton-nlp/SWE-Llama-13b.

cd inference
python run_llama.py --dataset_path princeton-nlp/SWE-bench_oracle --model_name_or_path princeton-nlp/SWE-Llama-13b --output_dir ./outputs --temperature 0

Then run run_evaluation.py and observe the error at line 134.

cd ../swebench/harness
python run_evaluation.py --swe_bench_tasks princeton-nlp/SWE-bench_oracle --log_dir ./logs --testbed ./testbed --predictions_path ../../inference/outputs/princeton-nlp__SWE-bench_oracle__test__princeton-nlp__SWE-Llama-13b__temp-0.0__top-p-1.0.jsonl

Expected Results

run_evaluation.py runs without raising an error.

Actual Results

    if len(testbed_model_name) > 50:
       ^^^^^^^^^^^^^^^^^^^^^^^
TypeError: object of type 'NoneType' has no len()

System Information

swebench = 1.1.5

john-b-yang commented 2 weeks ago

Tagging @carlosejimenez to help with this

john-b-yang commented 2 weeks ago

Commit dee44dcd41e1a69d222d2661a069be1e7061c112 should fix this. I've added model_name_or_path as an argument, so the field should never be None.

Thank you so much for the extremely detailed report + discussion, it made the fix very easy to write. Really appreciate it! 😄
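
For readers on an older version, the idea behind the fix can be sketched roughly as follows. This is not the code from the commit above: `build_result` and its parameters are hypothetical names, and falling back to the base model name when no adapter is given is an assumption about the intended behavior.

```python
from typing import Optional


def build_result(
    instance_id: str,
    output: str,
    diff: str,
    model_name_or_path: str,
    peft_path: Optional[str] = None,
) -> dict:
    """Build one prediction record (hypothetical helper, not from the repo)."""
    return {
        "instance_id": instance_id,
        "full_output": output,
        "model_patch": diff,
        # Assumption: use the adapter path when one is given, otherwise the
        # base model name, so the field is always a string and never None.
        "model_name_or_path": (
            peft_path if peft_path is not None else model_name_or_path
        ),
    }


res = build_result("example__instance-1", "", "", "princeton-nlp/SWE-Llama-13b")
print(res["model_name_or_path"])  # princeton-nlp/SWE-Llama-13b
```

With a string always stored in the field, a length check like the one in run_evaluation.py no longer sees None.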
