princeton-nlp / SWE-agent

SWE-agent takes a GitHub issue and tries to automatically fix it, using GPT-4, or your LM of choice. It solves 12.47% of bugs in the SWE-bench evaluation set and takes just 1 minute to run.
https://princeton-nlp.github.io/SWE-agent/
MIT License

Predictions for the following instance_ids were not found in the tasks file and will not be considered: SWE-agent__test-repo-i1 #471

Hk669 commented 1 month ago

Describe the bug

When I tried to reproduce the logs as described in the Benchmarking docs, SWE-agent created a patch, but evaluating it produced the error below.

(venv) (base) hrushi669@Hrushikesh:/mnt/c/Users/hrush/OneDrive - Student Ambassadors/Desktop/AutoSwe/SWE-agent/evaluation$ ./run_eval.sh ../trajectories/hrushi669/azure-gpt-3.5-turbo-1106__SWE-agent__test-repo__default__t-0.00__p-0.95__c-3.00__install-1/all_preds.jsonl
Found 1 total predictions, will evaluate 1 (0 are empty)
🏃 Beginning evaluation...
2024-05-31 21:33:56,635 - run_evaluation - WARNING - Predictions for the following instance_ids were not found in the tasks file and will not be considered: SWE-agent__test-repo-i1
2024-05-31 21:33:56,640 - run_evaluation - INFO - Found 1 predictions across 1 model(s) in predictions file
❌ Evaluation failed: 'SWE-agent__test-repo-i1'
Traceback (most recent call last):
  File "/mnt/c/Users/hrush/OneDrive - Student Ambassadors/Desktop/AutoSwe/SWE-agent/evaluation/evaluation.py", line 72, in main
    run_evaluation(
  File "/mnt/c/Users/hrush/OneDrive - Student Ambassadors/Desktop/AutoSwe/venv/lib/python3.12/site-packages/swebench/harness/run_evaluation.py", line 122, in main
  File "/mnt/c/Users/hrush/OneDrive - Student Ambassadors/Desktop/AutoSwe/venv/lib/python3.12/site-packages/swebench/harness/run_evaluation.py", line 122, in main   
    t = tasks_map[p[KEY_INSTANCE_ID]]
        ~~~~~~~~~^^^^^^^^^^^^^^^^^^^^
KeyError: 'SWE-agent__test-repo-i1'

==================================
Log directory for evaluation run: results/azure-gpt-3.5-turbo-1106__SWE-agent__test-repo__default__t-0.00__p-0.95__c-3.00__install-1
- Wrote per-instance scorecards to ../trajectories/hrushi669/azure-gpt-3.5-turbo-1106__SWE-agent__test-repo__default__t-0.00__p-0.95__c-3.00__install-1/scorecards.json
Reference Report:
- no_generation: 0
- generated: 1
- with_logs: 0
- install_fail: 0
- reset_failed: 0
- no_apply: 0
- applied: 0
- test_errored: 0
- test_timeout: 0
- resolved: 0
- Wrote summary of run to ../trajectories/hrushi669/azure-gpt-3.5-turbo-1106__SWE-agent__test-repo__default__t-0.00__p-0.95__c-3.00__install-1/results.json

results.json

{
  "no_generation": [],
  "generated": [
    "SWE-agent__test-repo-i1"
  ],
  "with_logs": [],
  "install_fail": [],
  "reset_failed": [],
  "no_apply": [],
  "applied": [],
  "test_errored": [],
  "test_timeout": [],
  "resolved": []
}
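
(For context on the KeyError: the harness builds a map from instance_id to task out of the tasks file and then indexes it with each prediction's id. A simplified Python sketch of that lookup, using toy stand-in data rather than the actual swebench code:)

# Toy stand-ins; the real tasks file holds SWE-bench instances
tasks = [{"instance_id": "django__django-11099"}]
predictions = [{"instance_id": "SWE-agent__test-repo-i1"}]

KEY_INSTANCE_ID = "instance_id"
tasks_map = {t[KEY_INSTANCE_ID]: t for t in tasks}

for p in predictions:
    # Raises KeyError('SWE-agent__test-repo-i1'): the prediction's id
    # does not exist in the tasks file
    t = tasks_map[p[KEY_INSTANCE_ID]]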

Steps/commands/code to Reproduce

As described in the Benchmarking docs, but with the model azure:gpt-3.5-turbo-1106.

Error message/results

(Same output as in the bug description above.)

System Information

Ubuntu


klieret commented 1 month ago

Thanks for the report and the already verbose information. Could you also paste the command you ran for evaluation?

klieret commented 1 month ago

(my suspicion is that you might be specifying the wrong input file, just because I know that this happened to me before...)

Hk669 commented 1 month ago

Thanks for the report and the already verbose information. Could you also paste the command you ran for evaluation?

the evaluation command:

./run_eval.sh ../trajectories/hrushi669/azure-gpt-3.5-turbo-1106__SWE-agent__test-repo__default__t-0.00__p-0.95__c-3.00__install-1/all_preds.jsonl

Hk669 commented 1 month ago

(my suspicion is that you might be specifying the wrong input file, just because I know that this happened to me before...)

I gave it the correct path to the input file (all_preds.jsonl), as shown in the evaluation command above. Can you please help me out with this?

Hk669 commented 1 month ago

There is also another issue, even though Miniconda is already installed on my PC:

(venv) (base) hrushi669@Hrushikesh:/mnt/c/Users/hrush/AutoSWE/AutoSwe/SWE-agent/evaluation$ ./run_eval.sh ../trajectories/hrushi66/azure-gpt-3.5-turbo-1106__SWE-bench_Lite__default__t-0.00__p-0.95__c-3.00__install-1/all_preds.jsonl       
Found 8 total predictions, will evaluate 3 (5 are empty)
🏃 Beginning evaluation...
2024-06-01 17:03:59,954 - run_evaluation - INFO - Found 3 predictions across 1 model(s) in predictions file
2024-06-01 17:03:59,963 - run_evaluation - INFO - [azure-gpt-3.5-turbo-1106__SWE-bench_Lite__default__t-0.00__p-0.95__c-3.00__install-1/django__django/4.1] # of predictions to evaluate: 1 (0 already evaluated)
2024-06-01 17:03:59,978 - run_evaluation - INFO - [azure-gpt-3.5-turbo-1106__SWE-bench_Lite__default__t-0.00__p-0.95__c-3.00__install-1/django__django/3.0] # of predictions to evaluate: 1 (0 already evaluated)
2024-06-01 17:03:59,994 - run_evaluation - INFO - [azure-gpt-3.5-turbo-1106__SWE-bench_Lite__default__t-0.00__p-0.95__c-3.00__install-1/sphinx-doc__sphinx/3.5] # of predictions to evaluate: 1 (0 already evaluated)
2024-06-01 17:04:00,075 - testbed - INFO - Created log file /mnt/c/Users/hrush/AutoSWE/AutoSwe/SWE-agent/evaluation/results/azure-gpt-3.5-turbo-1106__SWE-bench_Lite__default__t-0.00__p-0.95__c-3.00__install-1/testbed_django_3.0.log       
2024-06-01 17:04:00,075 - testbed - INFO - Created log file /mnt/c/Users/hrush/AutoSWE/AutoSwe/SWE-agent/evaluation/results/azure-gpt-3.5-turbo-1106__SWE-bench_Lite__default__t-0.00__p-0.95__c-3.00__install-1/testbed_django_4.1.log       
2024-06-01 17:04:00,076 - testbed - INFO - Created log file /mnt/c/Users/hrush/AutoSWE/AutoSwe/SWE-agent/evaluation/results/azure-gpt-3.5-turbo-1106__SWE-bench_Lite__default__t-0.00__p-0.95__c-3.00__install-1/testbed_sphinx_3.5.log       
2024-06-01 17:04:00,079 - testbed - INFO - Repo django/django: 1 versions
2024-06-01 17:04:00,079 - testbed - INFO - Repo django/django: 1 versions
2024-06-01 17:04:00,080 - testbed - INFO - Repo sphinx-doc/sphinx: 1 versions
2024-06-01 17:04:00,082 - testbed - INFO -      Version 4.1: 1 instances
2024-06-01 17:04:00,082 - testbed - INFO -      Version 3.0: 1 instances
2024-06-01 17:04:00,084 - testbed - INFO -      Version 3.5: 1 instances
2024-06-01 17:04:00,092 - testbed - INFO - Using conda path /mnt/c/Users/hrush/AutoSWE/AutoSwe/SWE-agent/evaluation/testbed/a09d900d21/django/3.0/tmphk0bcuv7
2024-06-01 17:04:00,094 - testbed - INFO - Using conda path /mnt/c/Users/hrush/AutoSWE/AutoSwe/SWE-agent/evaluation/testbed/a09d900d21/sphinx/3.5/tmp0ges26wf
2024-06-01 17:04:00,095 - testbed - INFO - Using conda path /mnt/c/Users/hrush/AutoSWE/AutoSwe/SWE-agent/evaluation/testbed/a09d900d21/django/4.1/tmpu2up60g0
2024-06-01 17:04:00,106 - testbed - INFO - Using working directory /mnt/c/Users/hrush/AutoSWE/AutoSwe/SWE-agent/evaluation/testbed/a09d900d21/django/4.1/tmpc4yi_9nb for testbed
2024-06-01 17:04:00,107 - testbed - INFO - Using working directory /mnt/c/Users/hrush/AutoSWE/AutoSwe/SWE-agent/evaluation/testbed/a09d900d21/sphinx/3.5/tmp0eb_sfio for testbed
2024-06-01 17:04:00,108 - testbed - INFO - Using working directory /mnt/c/Users/hrush/AutoSWE/AutoSwe/SWE-agent/evaluation/testbed/a09d900d21/django/3.0/tmpx0a2ox00 for testbed
2024-06-01 17:04:00,118 - testbed - INFO - No conda path provided, creating temporary install in /mnt/c/Users/hrush/AutoSWE/AutoSwe/SWE-agent/evaluation/testbed/a09d900d21/django/3.0/tmphk0bcuv7/miniconda3...
2024-06-01 17:04:00,119 - testbed - INFO - No conda path provided, creating temporary install in /mnt/c/Users/hrush/AutoSWE/AutoSwe/SWE-agent/evaluation/testbed/a09d900d21/sphinx/3.5/tmp0ges26wf/miniconda3...
2024-06-01 17:04:00,119 - testbed - INFO - No conda path provided, creating temporary install in /mnt/c/Users/hrush/AutoSWE/AutoSwe/SWE-agent/evaluation/testbed/a09d900d21/django/4.1/tmpu2up60g0/miniconda3...
2024-06-01 17:04:00,124 - testbed - INFO - django/3.0 instances in a single process
2024-06-01 17:04:00,125 - testbed - INFO - sphinx/3.5 instances in a single process
2024-06-01 17:04:00,125 - testbed - INFO - django/4.1 instances in a single process
2024-06-01 17:04:00,128 - testbed - INFO - django/3.0 using Miniconda link: https://repo.anaconda.com/miniconda/Miniconda3-py39_23.10.0-1
2024-06-01 17:04:00,128 - testbed - INFO - sphinx/3.5 using Miniconda link: https://repo.anaconda.com/miniconda/Miniconda3-py39_23.10.0-1
2024-06-01 17:04:00,129 - testbed - INFO - django/4.1 using Miniconda link: https://repo.anaconda.com/miniconda/Miniconda3-py39_23.10.0-1

2024-06-01 17:10:29,623 - testbed - ERROR - Error: Command '['bash', '/mnt/c/Users/hrush/AutoSWE/AutoSwe/SWE-agent/evaluation/testbed/a09d900d21/django/4.1/tmpu2up60g0/miniconda3/miniconda.sh', '-b', '-u', '-p', '/mnt/c/Users/hrush/AutoSWE/AutoSwe/SWE-agent/evaluation/testbed/a09d900d21/django/4.1/tmpu2up60g0/miniconda3', '&&', 'conda', 'init', '--all']' returned non-zero exit status 1.
2024-06-01 17:10:29,627 - testbed - ERROR - Error: Command '['bash', '/mnt/c/Users/hrush/AutoSWE/AutoSwe/SWE-agent/evaluation/testbed/a09d900d21/sphinx/3.5/tmp0ges26wf/miniconda3/miniconda.sh', '-b', '-u', '-p', '/mnt/c/Users/hrush/AutoSWE/AutoSwe/SWE-agent/evaluation/testbed/a09d900d21/sphinx/3.5/tmp0ges26wf/miniconda3', '&&', 'conda', 'init', '--all']' returned non-zero exit status 1.
2024-06-01 17:10:29,634 - testbed - ERROR - Error stdout: PREFIX=/mnt/c/Users/hrush/AutoSWE/AutoSwe/SWE-agent/evaluation/testbed/a09d900d21/sphinx/3.5/tmp0ges26wf/miniconda3
Unpacking payload ...

  0%|          | 0/69 [00:00<?, ?it/s]
Extracting : archspec-0.2.1-pyhd3eb1b0_0.conda:   0%|          | 0/69 [01:01<?, ?it/s]
Extracting : archspec-0.2.1-pyhd3eb1b0_0.conda:   1%|▏         | 1/69 [01:01<1:09:58, 61.74s/it]
Extracting : boltons-23.0.0-py39h06a4308_0.conda:   1%|▏         | 1/69 [02:13<1:09:58, 61.74s/it]
Extracting : boltons-23.0.0-py39h06a4308_0.conda:   3%|▎         | 2/69 [02:13<1:15:25, 67.54s/it]
Extracting : brotli-python-1.0.9-py39h6a678d5_7.conda:   3%|▎         | 2/69 [02:13<1:15:25, 67.54s/it]
Extracting : bzip2-1.0.8-h7b6447c_0.conda:   4%|▍         | 3/69 [02:13<1:14:17, 67.54s/it]
Extracting : c-ares-1.19.1-h5eee18b_0.conda:   6%|▌         | 4/69 [02:13<1:13:10, 67.54s/it]
Extracting : ca-certificates-2023.08.22-h06a4308_0.conda:   7%|▋         | 5/69 [02:13<1:12:02, 67.54s/it]
Extracting : certifi-2023.7.22-py39h06a4308_0.conda:   9%|▊         | 6/69 [02:13<1:10:55, 67.54s/it]
Extracting : cffi-1.15.1-py39h5eee18b_3.conda:  10%|█         | 7/69 [02:13<1:09:47, 67.54s/it]
Extracting : charset-normalizer-2.0.4-pyhd3eb1b0_0.conda:  12%|█▏        | 8/69 [02:13<1:08:39, 67.54s/it]

concurrent.futures.process._RemoteTraceback:
'''
Traceback (most recent call last):
  File "concurrent/futures/process.py", line 387, in wait_result_broken_or_wakeup
  File "multiprocessing/connection.py", line 256, in recv
TypeError: __init__() missing 1 required positional argument: 'msg'
'''

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "entry_point.py", line 69, in <module>
  File "concurrent/futures/process.py", line 562, in _chain_from_iterable_of_lists
  File "concurrent/futures/_base.py", line 609, in result_iterator
  File "concurrent/futures/_base.py", line 446, in result
  File "concurrent/futures/_base.py", line 391, in __get_result
concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.
[21327] Failed to execute script 'entry_point' due to unhandled exception!

2024-06-01 17:10:29,639 - testbed - ERROR - Error stdout: PREFIX=/mnt/c/Users/hrush/AutoSWE/AutoSwe/SWE-agent/evaluation/testbed/a09d900d21/django/4.1/tmpu2up60g0/miniconda3
Unpacking payload ...

  0%|          | 0/69 [00:00<?, ?it/s]
Extracting : archspec-0.2.1-pyhd3eb1b0_0.conda:   0%|          | 0/69 [01:09<?, ?it/s]
Extracting : archspec-0.2.1-pyhd3eb1b0_0.conda:   1%|▏         | 1/69 [01:09<1:18:49, 69.56s/it]
Extracting : boltons-23.0.0-py39h06a4308_0.conda:   1%|▏         | 1/69 [02:37<1:18:49, 69.56s/it]
Extracting : boltons-23.0.0-py39h06a4308_0.conda:   3%|▎         | 2/69 [02:37<1:29:34, 80.21s/it]
Extracting : brotli-python-1.0.9-py39h6a678d5_7.conda:   3%|▎         | 2/69 [02:37<1:29:34, 80.21s/it]
Extracting : bzip2-1.0.8-h7b6447c_0.conda:   4%|▍         | 3/69 [02:37<1:28:13, 80.21s/it]
Extracting : c-ares-1.19.1-h5eee18b_0.conda:   6%|▌         | 4/69 [02:37<1:26:53, 80.21s/it]
Extracting : ca-certificates-2023.08.22-h06a4308_0.conda:   7%|▋         | 5/69 [02:37<1:25:33, 80.21s/it]
Extracting : certifi-2023.7.22-py39h06a4308_0.conda:   9%|▊         | 6/69 [02:37<1:24:13, 80.21s/it]
Extracting : cffi-1.15.1-py39h5eee18b_3.conda:  10%|█         | 7/69 [02:37<1:22:53, 80.21s/it]
Extracting : charset-normalizer-2.0.4-pyhd3eb1b0_0.conda:  12%|█▏        | 8/69 [02:37<1:21:32, 80.21s/it]

concurrent.futures.process._RemoteTraceback:
'''
Traceback (most recent call last):
  File "concurrent/futures/process.py", line 387, in wait_result_broken_or_wakeup
  File "multiprocessing/connection.py", line 256, in recv
TypeError: __init__() missing 1 required positional argument: 'msg'
'''

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "entry_point.py", line 69, in <module>
  File "concurrent/futures/process.py", line 562, in _chain_from_iterable_of_lists
  File "concurrent/futures/_base.py", line 609, in result_iterator
  File "concurrent/futures/_base.py", line 446, in result
  File "concurrent/futures/_base.py", line 391, in __get_result
concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.
[21337] Failed to execute script 'entry_point' due to unhandled exception!

2024-06-01 17:10:29,659 - testbed - ERROR - Error: Command '['bash', '/mnt/c/Users/hrush/AutoSWE/AutoSwe/SWE-agent/evaluation/testbed/a09d900d21/django/3.0/tmphk0bcuv7/miniconda3/miniconda.sh', '-b', '-u', '-p', '/mnt/c/Users/hrush/AutoSWE/AutoSwe/SWE-agent/evaluation/testbed/a09d900d21/django/3.0/tmphk0bcuv7/miniconda3', '&&', 'conda', 'init', '--all']' returned non-zero exit status 1.
2024-06-01 17:10:29,669 - testbed - ERROR - Error stdout: PREFIX=/mnt/c/Users/hrush/AutoSWE/AutoSwe/SWE-agent/evaluation/testbed/a09d900d21/django/3.0/tmphk0bcuv7/miniconda3
Unpacking payload ...

  0%|          | 0/69 [00:00<?, ?it/s]
Extracting : archspec-0.2.1-pyhd3eb1b0_0.conda:   0%|          | 0/69 [01:03<?, ?it/s]
Extracting : archspec-0.2.1-pyhd3eb1b0_0.conda:   1%|▏         | 1/69 [01:03<1:11:27, 63.04s/it]
Extracting : boltons-23.0.0-py39h06a4308_0.conda:   1%|▏         | 1/69 [02:14<1:11:27, 63.04s/it]
Extracting : boltons-23.0.0-py39h06a4308_0.conda:   3%|▎         | 2/69 [02:14<1:16:00, 68.07s/it]
Extracting : brotli-python-1.0.9-py39h6a678d5_7.conda:   3%|▎         | 2/69 [02:14<1:16:00, 68.07s/it]
Extracting : bzip2-1.0.8-h7b6447c_0.conda:   4%|▍         | 3/69 [02:14<1:14:52, 68.07s/it]
Extracting : c-ares-1.19.1-h5eee18b_0.conda:   6%|▌         | 4/69 [02:14<1:13:44, 68.07s/it]
Extracting : ca-certificates-2023.08.22-h06a4308_0.conda:   7%|▋         | 5/69 [02:14<1:12:36, 68.07s/it]
Extracting : certifi-2023.7.22-py39h06a4308_0.conda:   9%|▊         | 6/69 [02:14<1:11:28, 68.07s/it]
Extracting : cffi-1.15.1-py39h5eee18b_3.conda:  10%|█         | 7/69 [02:14<1:10:20, 68.07s/it]
Extracting : charset-normalizer-2.0.4-pyhd3eb1b0_0.conda:  12%|█▏        | 8/69 [02:14<1:09:12, 68.07s/it]

concurrent.futures.process._RemoteTraceback:
'''
Traceback (most recent call last):
  File "concurrent/futures/process.py", line 387, in wait_result_broken_or_wakeup
  File "multiprocessing/connection.py", line 256, in recv
TypeError: __init__() missing 1 required positional argument: 'msg'
'''

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "entry_point.py", line 69, in <module>
  File "concurrent/futures/process.py", line 562, in _chain_from_iterable_of_lists
  File "concurrent/futures/_base.py", line 609, in result_iterator
  File "concurrent/futures/_base.py", line 446, in result
  File "concurrent/futures/_base.py", line 391, in __get_result
concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.
[21328] Failed to execute script 'entry_point' due to unhandled exception!

2024-06-01 17:10:29,725 - testbed - ERROR - Error traceback: Traceback (most recent call last):
  File "/mnt/c/Users/hrush/OneDrive - Student Ambassadors/Desktop/AutoSwe/venv/lib/python3.12/site-packages/swebench/harness/context_manager.py", line 82, in __call__
    output = subprocess.run(cmd, **combined_args)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/subprocess.py", line 571, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['bash', '/mnt/c/Users/hrush/AutoSWE/AutoSwe/SWE-agent/evaluation/testbed/a09d900d21/django/3.0/tmphk0bcuv7/miniconda3/miniconda.sh', '-b', '-u', '-p', '/mnt/c/Users/hrush/AutoSWE/AutoSwe/SWE-agent/evaluation/testbed/a09d900d21/django/3.0/tmphk0bcuv7/miniconda3', '&&', 'conda', 'init', '--all']' returned non-zero exit status 1.

2024-06-01 17:10:29,728 - testbed - ERROR - Error traceback: Traceback (most recent call last):
  File "/mnt/c/Users/hrush/OneDrive - Student Ambassadors/Desktop/AutoSwe/venv/lib/python3.12/site-packages/swebench/harness/context_manager.py", line 82, in __call__
    output = subprocess.run(cmd, **combined_args)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/subprocess.py", line 571, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['bash', '/mnt/c/Users/hrush/AutoSWE/AutoSwe/SWE-agent/evaluation/testbed/a09d900d21/django/4.1/tmpu2up60g0/miniconda3/miniconda.sh', '-b', '-u', '-p', '/mnt/c/Users/hrush/AutoSWE/AutoSwe/SWE-agent/evaluation/testbed/a09d900d21/django/4.1/tmpu2up60g0/miniconda3', '&&', 'conda', 'init', '--all']' returned non-zero exit status 1.

2024-06-01 17:10:29,734 - testbed - ERROR - Error traceback: Traceback (most recent call last):
  File "/mnt/c/Users/hrush/OneDrive - Student Ambassadors/Desktop/AutoSwe/venv/lib/python3.12/site-packages/swebench/harness/context_manager.py", line 82, in __call__
    output = subprocess.run(cmd, **combined_args)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/subprocess.py", line 571, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['bash', '/mnt/c/Users/hrush/AutoSWE/AutoSwe/SWE-agent/evaluation/testbed/a09d900d21/sphinx/3.5/tmp0ges26wf/miniconda3/miniconda.sh', '-b', '-u', '-p', '/mnt/c/Users/hrush/AutoSWE/AutoSwe/SWE-agent/evaluation/testbed/a09d900d21/sphinx/3.5/tmp0ges26wf/miniconda3', '&&', 'conda', 'init', '--all']' returned non-zero exit status 1.

❌ Evaluation failed: Command '['bash', '/mnt/c/Users/hrush/AutoSWE/AutoSwe/SWE-agent/evaluation/testbed/a09d900d21/django/3.0/tmphk0bcuv7/miniconda3/miniconda.sh', '-b', '-u', '-p', '/mnt/c/Users/hrush/AutoSWE/AutoSwe/SWE-agent/evaluation/testbed/a09d900d21/django/3.0/tmphk0bcuv7/miniconda3', '&&', 'conda', 'init', '--all']' returned non-zero exit status 1.
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/usr/lib/python3.12/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
                    ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/multiprocessing/pool.py", line 48, in mapstar
    return list(map(*args))
           ^^^^^^^^^^^^^^^^
  File "/mnt/c/Users/hrush/OneDrive - Student
Ambassadors/Desktop/AutoSwe/venv/lib/python3.12/site-packages/swebench/harness/engine_evaluation.py", line 177, in main
    setup_testbed(data_groups[0])
  File "/mnt/c/Users/hrush/OneDrive - Student
Ambassadors/Desktop/AutoSwe/venv/lib/python3.12/site-packages/swebench/harness/engine_validation.py", line 91, in      
setup_testbed
    with TestbedContextManager(
  File "/mnt/c/Users/hrush/OneDrive - Student
Ambassadors/Desktop/AutoSwe/venv/lib/python3.12/site-packages/swebench/harness/context_manager.py", line 285, in       
__enter__
    self.exec(install_cmd)
  File "/mnt/c/Users/hrush/OneDrive - Student
Ambassadors/Desktop/AutoSwe/venv/lib/python3.12/site-packages/swebench/harness/context_manager.py", line 95, in        
__call__
    raise e
  File "/mnt/c/Users/hrush/OneDrive - Student
Ambassadors/Desktop/AutoSwe/venv/lib/python3.12/site-packages/swebench/harness/context_manager.py", line 82, in        
__call__
    output = subprocess.run(cmd, **combined_args)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/subprocess.py", line 571, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['bash',
'/mnt/c/Users/hrush/AutoSWE/AutoSwe/SWE-agent/evaluation/testbed/a09d900d21/django/3.0/tmphk0bcuv7/miniconda3/miniconda
.sh', '-b', '-u', '-p',
'/mnt/c/Users/hrush/AutoSWE/AutoSwe/SWE-agent/evaluation/testbed/a09d900d21/django/3.0/tmphk0bcuv7/miniconda3', '&&',  
'conda', 'init', '--all']' returned non-zero exit status 1.
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/mnt/c/Users/hrush/AutoSWE/AutoSwe/SWE-agent/evaluation/evaluation.py", line 72, in main
    run_evaluation(
  File "/mnt/c/Users/hrush/OneDrive - Student
Ambassadors/Desktop/AutoSwe/venv/lib/python3.12/site-packages/swebench/harness/run_evaluation.py", line 203, in main   
    pool.map(eval_engine, eval_args)
  File "/usr/lib/python3.12/multiprocessing/pool.py", line 367, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/multiprocessing/pool.py", line 774, in get
    raise self._value
subprocess.CalledProcessError: Command '['bash', '/mnt/c/Users/hrush/AutoSWE/AutoSwe/SWE-agent/evaluation/testbed/a09d900d21/django/3.0/tmphk0bcuv7/miniconda3/miniconda.sh', '-b', '-u', '-p', '/mnt/c/Users/hrush/AutoSWE/AutoSwe/SWE-agent/evaluation/testbed/a09d900d21/django/3.0/tmphk0bcuv7/miniconda3', '&&', 'conda', 'init', '--all']' returned non-zero exit status 1.

==================================
Log directory for evaluation run: results/azure-gpt-3.5-turbo-1106__SWE-bench_Lite__default__t-0.00__p-0.95__c-3.00__install-1
- Wrote per-instance scorecards to ../trajectories/hrushi66/azure-gpt-3.5-turbo-1106__SWE-bench_Lite__default__t-0.00__p-0.95__c-3.00__install-1/scorecards.json
Reference Report:
- no_generation: 5
- generated: 3
- with_logs: 0
- install_fail: 0
- reset_failed: 0
- no_apply: 0
- applied: 0
- test_errored: 0
- test_timeout: 0
- resolved: 0
- Wrote summary of run to ../trajectories/hrushi66/azure-gpt-3.5-turbo-1106__SWE-bench_Lite__default__t-0.00__p-0.95__c-3.00__install-1/results.json

This error has been haunting me for 3 hours now; can anyone help? @klieret

klieret commented 1 month ago

Regarding your initial report: I believe you need to specify the dataset name or path as second argument (the default is princeton-nlp/SWE-bench, which is probably not what you need here).
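
For example, something like this (using princeton-nlp/SWE-bench_Lite purely as an illustration; pass whichever dataset or tasks file your predictions were actually generated against):

./run_eval.sh <path/to/all_preds.jsonl> princeton-nlp/SWE-bench_Lite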

Let me ping @carlosejimenez and @john-b-yang for this issue.

carlosejimenez commented 1 month ago

Hi @Hk669, just to be clear: I noticed that you used a "test-repo" in your examples. I'm not sure if that's just a placeholder, but generally the evaluation process will only work with examples already in the SWE-bench dataset, because we have the correct tests and behavior logged for those instances only.
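
A quick way to check whether an instance_id is actually covered is to look it up via the Hugging Face datasets library (a minimal sketch, assuming the public princeton-nlp/SWE-bench_Lite dataset; swap in whichever dataset you evaluate against):

from datasets import load_dataset

# Load the SWE-bench Lite test split from the Hugging Face Hub
ds = load_dataset("princeton-nlp/SWE-bench_Lite", split="test")

known_ids = {ex["instance_id"] for ex in ds}
print("SWE-agent__test-repo-i1" in known_ids)  # False: not a SWE-bench instance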

However, the other command you mentioned: ./run_eval.sh ../trajectories/hrushi66/azure-gpt-3.5-turbo-1106__SWE-bench_Lite__default__t-0.00__p-0.95__c-3.00__install-1/all_preds.jsonl does seem like an issue. Can you confirm the version of swebench that you're using? Can you make sure to use the latest version of the repository/package?
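
To check the installed version and pull the latest release:

pip show swebench
pip install --upgrade swebench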

ivan4722 commented 4 weeks ago

Hi, I am having a similar issue: I do not know where to get the SWE-bench_Lite dataset. Here is what I'm running:

#!/bin/bash

python run_evaluation.py \
    --predictions_path "../../../SWE-agent/trajectories/vscode/gpt4__SWE-bench_Lite__default__t-0.00__p-0.95__c-2.00__install-1/all_preds.jsonl" \
    --swe_bench_tasks "SWE-bench_Lite/data/dev-00000-of-00001.parquet" \
    --log_dir "logs" \
    --testbed "testbed" \
    --skip_existing \
    --timeout 900 \
    --verbose

I also tried converting the data to a .json file with a Python script, but that did not work either. Could you help direct me to the dataset? I ran SWE-agent with the regular command:

python run.py --model_name gpt4 \
    --per_instance_cost_limit 2.00 \
    --config_file ./config/default.yaml
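
(The dataset is published on the Hugging Face Hub as princeton-nlp/SWE-bench_Lite, so one way to materialize the dev split locally is via the datasets library. A sketch; I'm not certain which file format your swebench version expects for --swe_bench_tasks, so treat the JSONL output as an assumption:)

from datasets import load_dataset

# Download the SWE-bench Lite dev split and write it out as JSON lines
dev = load_dataset("princeton-nlp/SWE-bench_Lite", split="dev")
dev.to_json("swe-bench-lite-dev.jsonl")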