Open Hk669 opened 1 month ago
Thanks for the report and the already verbose information. Could you also paste the command you ran for evaluation?
(my suspicion is that you might be specifying the wrong input file, just because I know that this happened to me before...)
The evaluation command:
./run_eval.sh ../trajectories/hrushi669/azure-gpt-3.5-turbo-1106__SWE-agent__test-repo__default__t-0.00__p-0.95__c-3.00__install-1/all_preds.jsonl
I gave it the correct path to the input file (all_preds.jsonl), as shown in the evaluation command above. Can you please help me out with this?
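One quick way to double-check the input file before evaluating is to count how many predictions actually carry a patch. This is only a minimal sketch: it assumes each line of all_preds.jsonl is a JSON object whose generated diff is stored under the key model_patch (an assumption about the prediction format), and summarize_predictions is a hypothetical helper, not part of SWE-agent or SWE-bench.

```python
import json
from pathlib import Path


def summarize_predictions(path):
    """Count total, non-empty, and empty predictions in a JSONL file."""
    total = 0
    empty = 0
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # ignore blank lines between records
        record = json.loads(line)
        total += 1
        # Assumed key name: "model_patch" holds the generated diff as a string.
        patch = record.get("model_patch")
        if not patch or not patch.strip():
            empty += 1
    return {"total": total, "non_empty": total - empty, "empty": empty}
```

If the counts disagree with the "Found N total predictions" line printed by the evaluation script, the path is probably pointing at a different file than intended.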
I also ran into another issue during evaluation, even though Miniconda is already installed on my PC:
(venv) (base) hrushi669@Hrushikesh:/mnt/c/Users/hrush/AutoSWE/AutoSwe/SWE-agent/evaluation$ ./run_eval.sh ../trajectories/hrushi66/azure-gpt-3.5-turbo-1106__SWE-bench_Lite__default__t-0.00__p-0.95__c-3.00__install-1/all_preds.jsonl
Found 8 total predictions, will evaluate 3 (5 are empty)
🏃 Beginning evaluation...
2024-06-01 17:03:59,954 - run_evaluation - INFO - Found 3 predictions across 1 model(s) in predictions file
2024-06-01 17:03:59,963 - run_evaluation - INFO - [azure-gpt-3.5-turbo-1106__SWE-bench_Lite__default__t-0.00__p-0.95__c-3.00__install-1/django__django/4.1] # of predictions to evaluate: 1 (0 already evaluated)
2024-06-01 17:03:59,978 - run_evaluation - INFO - [azure-gpt-3.5-turbo-1106__SWE-bench_Lite__default__t-0.00__p-0.95__c-3.00__install-1/django__django/3.0] # of predictions to evaluate: 1 (0 already evaluated)
2024-06-01 17:03:59,994 - run_evaluation - INFO - [azure-gpt-3.5-turbo-1106__SWE-bench_Lite__default__t-0.00__p-0.95__c-3.00__install-1/sphinx-doc__sphinx/3.5] # of predictions to evaluate: 1 (0 already evaluated)
2024-06-01 17:04:00,075 - testbed - INFO - Created log file /mnt/c/Users/hrush/AutoSWE/AutoSwe/SWE-agent/evaluation/results/azure-gpt-3.5-turbo-1106__SWE-bench_Lite__default__t-0.00__p-0.95__c-3.00__install-1/testbed_django_3.0.log
2024-06-01 17:04:00,075 - testbed - INFO - Created log file /mnt/c/Users/hrush/AutoSWE/AutoSwe/SWE-agent/evaluation/results/azure-gpt-3.5-turbo-1106__SWE-bench_Lite__default__t-0.00__p-0.95__c-3.00__install-1/testbed_django_4.1.log
2024-06-01 17:04:00,076 - testbed - INFO - Created log file /mnt/c/Users/hrush/AutoSWE/AutoSwe/SWE-agent/evaluation/results/azure-gpt-3.5-turbo-1106__SWE-bench_Lite__default__t-0.00__p-0.95__c-3.00__install-1/testbed_sphinx_3.5.log
2024-06-01 17:04:00,079 - testbed - INFO - Repo django/django: 1 versions
2024-06-01 17:04:00,079 - testbed - INFO - Repo django/django: 1 versions
2024-06-01 17:04:00,080 - testbed - INFO - Repo sphinx-doc/sphinx: 1 versions
2024-06-01 17:04:00,082 - testbed - INFO - Version 4.1: 1 instances
2024-06-01 17:04:00,082 - testbed - INFO - Version 3.0: 1 instances
2024-06-01 17:04:00,084 - testbed - INFO - Version 3.5: 1 instances
2024-06-01 17:04:00,092 - testbed - INFO - Using conda path /mnt/c/Users/hrush/AutoSWE/AutoSwe/SWE-agent/evaluation/testbed/a09d900d21/django/3.0/tmphk0bcuv7
2024-06-01 17:04:00,094 - testbed - INFO - Using conda path /mnt/c/Users/hrush/AutoSWE/AutoSwe/SWE-agent/evaluation/testbed/a09d900d21/sphinx/3.5/tmp0ges26wf
2024-06-01 17:04:00,095 - testbed - INFO - Using conda path /mnt/c/Users/hrush/AutoSWE/AutoSwe/SWE-agent/evaluation/testbed/a09d900d21/django/4.1/tmpu2up60g0
2024-06-01 17:04:00,106 - testbed - INFO - Using working directory /mnt/c/Users/hrush/AutoSWE/AutoSwe/SWE-agent/evaluation/testbed/a09d900d21/django/4.1/tmpc4yi_9nb for testbed
2024-06-01 17:04:00,107 - testbed - INFO - Using working directory /mnt/c/Users/hrush/AutoSWE/AutoSwe/SWE-agent/evaluation/testbed/a09d900d21/sphinx/3.5/tmp0eb_sfio for testbed
2024-06-01 17:04:00,108 - testbed - INFO - Using working directory /mnt/c/Users/hrush/AutoSWE/AutoSwe/SWE-agent/evaluation/testbed/a09d900d21/django/3.0/tmpx0a2ox00 for testbed
2024-06-01 17:04:00,118 - testbed - INFO - No conda path provided, creating temporary install in /mnt/c/Users/hrush/AutoSWE/AutoSwe/SWE-agent/evaluation/testbed/a09d900d21/django/3.0/tmphk0bcuv7/miniconda3...
2024-06-01 17:04:00,119 - testbed - INFO - No conda path provided, creating temporary install in /mnt/c/Users/hrush/AutoSWE/AutoSwe/SWE-agent/evaluation/testbed/a09d900d21/sphinx/3.5/tmp0ges26wf/miniconda3...
2024-06-01 17:04:00,119 - testbed - INFO - No conda path provided, creating temporary install in /mnt/c/Users/hrush/AutoSWE/AutoSwe/SWE-agent/evaluation/testbed/a09d900d21/django/4.1/tmpu2up60g0/miniconda3...
2024-06-01 17:04:00,124 - testbed - INFO - django/3.0 instances in a single process
2024-06-01 17:04:00,125 - testbed - INFO - sphinx/3.5 instances in a single process
2024-06-01 17:04:00,125 - testbed - INFO - django/4.1 instances in a single process
2024-06-01 17:04:00,128 - testbed - INFO - django/3.0 using Miniconda link: https://repo.anaconda.com/miniconda/Miniconda3-py39_23.10.0-1
2024-06-01 17:04:00,128 - testbed - INFO - sphinx/3.5 using Miniconda link: https://repo.anaconda.com/miniconda/Miniconda3-py39_23.10.0-1
2024-06-01 17:04:00,129 - testbed - INFO - django/4.1 using Miniconda link: https://repo.anaconda.com/miniconda/Miniconda3-py39_23.10.0-1
2024-06-01 17:10:29,623 - testbed - ERROR - Error: Command '['bash', '/mnt/c/Users/hrush/AutoSWE/AutoSwe/SWE-agent/evaluation/testbed/a09d900d21/django/4.1/tmpu2up60g0/miniconda3/miniconda.sh', '-b', '-u', '-p', '/mnt/c/Users/hrush/AutoSWE/AutoSwe/SWE-agent/evaluation/testbed/a09d900d21/django/4.1/tmpu2up60g0/miniconda3', '&&', 'conda', 'init', '--all']' returned non-zero exit status 1.
2024-06-01 17:10:29,627 - testbed - ERROR - Error: Command '['bash', '/mnt/c/Users/hrush/AutoSWE/AutoSwe/SWE-agent/evaluation/testbed/a09d900d21/sphinx/3.5/tmp0ges26wf/miniconda3/miniconda.sh', '-b', '-u', '-p', '/mnt/c/Users/hrush/AutoSWE/AutoSwe/SWE-agent/evaluation/testbed/a09d900d21/sphinx/3.5/tmp0ges26wf/miniconda3', '&&', 'conda', 'init', '--all']' returned non-zero exit status 1.
2024-06-01 17:10:29,634 - testbed - ERROR - Error stdout: PREFIX=/mnt/c/Users/hrush/AutoSWE/AutoSwe/SWE-agent/evaluation/testbed/a09d900d21/sphinx/3.5/tmp0ges26wf/miniconda3
Unpacking payload ...
0%| | 0/69 [00:00<?, ?it/s]
Extracting : archspec-0.2.1-pyhd3eb1b0_0.conda: 0%| | 0/69 [01:01<?, ?it/s]
Extracting : archspec-0.2.1-pyhd3eb1b0_0.conda: 1%|▏ | 1/69 [01:01<1:09:58, 61.74s/it]
Extracting : boltons-23.0.0-py39h06a4308_0.conda: 1%|▏ | 1/69 [02:13<1:09:58, 61.74s/it]
Extracting : boltons-23.0.0-py39h06a4308_0.conda: 3%|▎ | 2/69 [02:13<1:15:25, 67.54s/it]
Extracting : brotli-python-1.0.9-py39h6a678d5_7.conda: 3%|▎ | 2/69 [02:13<1:15:25, 67.54s/it]
Extracting : bzip2-1.0.8-h7b6447c_0.conda: 4%|▍ | 3/69 [02:13<1:14:17, 67.54s/it]
Extracting : c-ares-1.19.1-h5eee18b_0.conda: 6%|▌ | 4/69 [02:13<1:13:10, 67.54s/it]
Extracting : ca-certificates-2023.08.22-h06a4308_0.conda: 7%|▋ | 5/69 [02:13<1:12:02, 67.54s/it]
Extracting : certifi-2023.7.22-py39h06a4308_0.conda: 9%|▊ | 6/69 [02:13<1:10:55, 67.54s/it]
Extracting : cffi-1.15.1-py39h5eee18b_3.conda: 10%|█ | 7/69 [02:13<1:09:47, 67.54s/it]
Extracting : charset-normalizer-2.0.4-pyhd3eb1b0_0.conda: 12%|█▏ | 8/69 [02:13<1:08:39, 67.54s/it]
concurrent.futures.process._RemoteTraceback:
'''
Traceback (most recent call last):
File "concurrent/futures/process.py", line 387, in wait_result_broken_or_wakeup
File "multiprocessing/connection.py", line 256, in recv
TypeError: __init__() missing 1 required positional argument: 'msg'
'''
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "entry_point.py", line 69, in <module>
File "concurrent/futures/process.py", line 562, in _chain_from_iterable_of_lists
File "concurrent/futures/_base.py", line 609, in result_iterator
File "concurrent/futures/_base.py", line 446, in result
File "concurrent/futures/_base.py", line 391, in __get_result
concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.
[21327] Failed to execute script 'entry_point' due to unhandled exception!
2024-06-01 17:10:29,639 - testbed - ERROR - Error stdout: PREFIX=/mnt/c/Users/hrush/AutoSWE/AutoSwe/SWE-agent/evaluation/testbed/a09d900d21/django/4.1/tmpu2up60g0/miniconda3
Unpacking payload ...
0%| | 0/69 [00:00<?, ?it/s]
Extracting : archspec-0.2.1-pyhd3eb1b0_0.conda: 0%| | 0/69 [01:09<?, ?it/s]
Extracting : archspec-0.2.1-pyhd3eb1b0_0.conda: 1%|▏ | 1/69 [01:09<1:18:49, 69.56s/it]
Extracting : boltons-23.0.0-py39h06a4308_0.conda: 1%|▏ | 1/69 [02:37<1:18:49, 69.56s/it]
Extracting : boltons-23.0.0-py39h06a4308_0.conda: 3%|▎ | 2/69 [02:37<1:29:34, 80.21s/it]
Extracting : brotli-python-1.0.9-py39h6a678d5_7.conda: 3%|▎ | 2/69 [02:37<1:29:34, 80.21s/it]
Extracting : bzip2-1.0.8-h7b6447c_0.conda: 4%|▍ | 3/69 [02:37<1:28:13, 80.21s/it]
Extracting : c-ares-1.19.1-h5eee18b_0.conda: 6%|▌ | 4/69 [02:37<1:26:53, 80.21s/it]
Extracting : ca-certificates-2023.08.22-h06a4308_0.conda: 7%|▋ | 5/69 [02:37<1:25:33, 80.21s/it]
Extracting : certifi-2023.7.22-py39h06a4308_0.conda: 9%|▊ | 6/69 [02:37<1:24:13, 80.21s/it]
Extracting : cffi-1.15.1-py39h5eee18b_3.conda: 10%|█ | 7/69 [02:37<1:22:53, 80.21s/it]
Extracting : charset-normalizer-2.0.4-pyhd3eb1b0_0.conda: 12%|█▏ | 8/69 [02:37<1:21:32, 80.21s/it]
concurrent.futures.process._RemoteTraceback:
'''
Traceback (most recent call last):
File "concurrent/futures/process.py", line 387, in wait_result_broken_or_wakeup
File "multiprocessing/connection.py", line 256, in recv
TypeError: __init__() missing 1 required positional argument: 'msg'
'''
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "entry_point.py", line 69, in <module>
File "concurrent/futures/process.py", line 562, in _chain_from_iterable_of_lists
File "concurrent/futures/_base.py", line 609, in result_iterator
File "concurrent/futures/_base.py", line 446, in result
File "concurrent/futures/_base.py", line 391, in __get_result
concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.
[21337] Failed to execute script 'entry_point' due to unhandled exception!
2024-06-01 17:10:29,659 - testbed - ERROR - Error: Command '['bash', '/mnt/c/Users/hrush/AutoSWE/AutoSwe/SWE-agent/evaluation/testbed/a09d900d21/django/3.0/tmphk0bcuv7/miniconda3/miniconda.sh', '-b', '-u', '-p', '/mnt/c/Users/hrush/AutoSWE/AutoSwe/SWE-agent/evaluation/testbed/a09d900d21/django/3.0/tmphk0bcuv7/miniconda3', '&&', 'conda', 'init', '--all']' returned non-zero exit status 1.
2024-06-01 17:10:29,669 - testbed - ERROR - Error stdout: PREFIX=/mnt/c/Users/hrush/AutoSWE/AutoSwe/SWE-agent/evaluation/testbed/a09d900d21/django/3.0/tmphk0bcuv7/miniconda3
Unpacking payload ...
0%| | 0/69 [00:00<?, ?it/s]
Extracting : archspec-0.2.1-pyhd3eb1b0_0.conda: 0%| | 0/69 [01:03<?, ?it/s]
Extracting : archspec-0.2.1-pyhd3eb1b0_0.conda: 1%|▏ | 1/69 [01:03<1:11:27, 63.04s/it]
Extracting : boltons-23.0.0-py39h06a4308_0.conda: 1%|▏ | 1/69 [02:14<1:11:27, 63.04s/it]
Extracting : boltons-23.0.0-py39h06a4308_0.conda: 3%|▎ | 2/69 [02:14<1:16:00, 68.07s/it]
Extracting : brotli-python-1.0.9-py39h6a678d5_7.conda: 3%|▎ | 2/69 [02:14<1:16:00, 68.07s/it]
Extracting : bzip2-1.0.8-h7b6447c_0.conda: 4%|▍ | 3/69 [02:14<1:14:52, 68.07s/it]
Extracting : c-ares-1.19.1-h5eee18b_0.conda: 6%|▌ | 4/69 [02:14<1:13:44, 68.07s/it]
Extracting : ca-certificates-2023.08.22-h06a4308_0.conda: 7%|▋ | 5/69 [02:14<1:12:36, 68.07s/it]
Extracting : certifi-2023.7.22-py39h06a4308_0.conda: 9%|▊ | 6/69 [02:14<1:11:28, 68.07s/it]
Extracting : cffi-1.15.1-py39h5eee18b_3.conda: 10%|█ | 7/69 [02:14<1:10:20, 68.07s/it]
Extracting : charset-normalizer-2.0.4-pyhd3eb1b0_0.conda: 12%|█▏ | 8/69 [02:14<1:09:12, 68.07s/it]
concurrent.futures.process._RemoteTraceback:
'''
Traceback (most recent call last):
File "concurrent/futures/process.py", line 387, in wait_result_broken_or_wakeup
File "multiprocessing/connection.py", line 256, in recv
TypeError: __init__() missing 1 required positional argument: 'msg'
'''
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "entry_point.py", line 69, in <module>
File "concurrent/futures/process.py", line 562, in _chain_from_iterable_of_lists
File "concurrent/futures/_base.py", line 609, in result_iterator
File "concurrent/futures/_base.py", line 446, in result
File "concurrent/futures/_base.py", line 391, in __get_result
concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.
[21328] Failed to execute script 'entry_point' due to unhandled exception!
2024-06-01 17:10:29,725 - testbed - ERROR - Error traceback: Traceback (most recent call last):
File "/mnt/c/Users/hrush/OneDrive - Student Ambassadors/Desktop/AutoSwe/venv/lib/python3.12/site-packages/swebench/harness/context_manager.py", line 82, in __call__
output = subprocess.run(cmd, **combined_args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/subprocess.py", line 571, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['bash', '/mnt/c/Users/hrush/AutoSWE/AutoSwe/SWE-agent/evaluation/testbed/a09d900d21/django/3.0/tmphk0bcuv7/miniconda3/miniconda.sh', '-b', '-u', '-p', '/mnt/c/Users/hrush/AutoSWE/AutoSwe/SWE-agent/evaluation/testbed/a09d900d21/django/3.0/tmphk0bcuv7/miniconda3', '&&', 'conda', 'init', '--all']' returned non-zero exit status 1.
2024-06-01 17:10:29,728 - testbed - ERROR - Error traceback: Traceback (most recent call last):
File "/mnt/c/Users/hrush/OneDrive - Student Ambassadors/Desktop/AutoSwe/venv/lib/python3.12/site-packages/swebench/harness/context_manager.py", line 82, in __call__
output = subprocess.run(cmd, **combined_args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/subprocess.py", line 571, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['bash', '/mnt/c/Users/hrush/AutoSWE/AutoSwe/SWE-agent/evaluation/testbed/a09d900d21/django/4.1/tmpu2up60g0/miniconda3/miniconda.sh', '-b', '-u', '-p', '/mnt/c/Users/hrush/AutoSWE/AutoSwe/SWE-agent/evaluation/testbed/a09d900d21/django/4.1/tmpu2up60g0/miniconda3', '&&', 'conda', 'init', '--all']' returned non-zero exit status 1.
2024-06-01 17:10:29,734 - testbed - ERROR - Error traceback: Traceback (most recent call last):
File "/mnt/c/Users/hrush/OneDrive - Student Ambassadors/Desktop/AutoSwe/venv/lib/python3.12/site-packages/swebench/harness/context_manager.py", line 82, in __call__
output = subprocess.run(cmd, **combined_args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/subprocess.py", line 571, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['bash', '/mnt/c/Users/hrush/AutoSWE/AutoSwe/SWE-agent/evaluation/testbed/a09d900d21/sphinx/3.5/tmp0ges26wf/miniconda3/miniconda.sh', '-b', '-u', '-p', '/mnt/c/Users/hrush/AutoSWE/AutoSwe/SWE-agent/evaluation/testbed/a09d900d21/sphinx/3.5/tmp0ges26wf/miniconda3', '&&', 'conda', 'init', '--all']' returned non-zero exit status 1.
❌ Evaluation failed: Command '['bash', '/mnt/c/Users/hrush/AutoSWE/AutoSwe/SWE-agent/evaluation/testbed/a09d900d21/django/3.0/tmphk0bcuv7/miniconda3/miniconda.sh', '-b', '-u', '-p', '/mnt/c/Users/hrush/AutoSWE/AutoSwe/SWE-agent/evaluation/testbed/a09d900d21/django/3.0/tmphk0bcuv7/miniconda3', '&&', 'conda', 'init', '--all']' returned non-zero exit status 1.
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/usr/lib/python3.12/multiprocessing/pool.py", line 125, in worker
result = (True, func(*args, **kwds))
^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/multiprocessing/pool.py", line 48, in mapstar
return list(map(*args))
^^^^^^^^^^^^^^^^
File "/mnt/c/Users/hrush/OneDrive - Student Ambassadors/Desktop/AutoSwe/venv/lib/python3.12/site-packages/swebench/harness/engine_evaluation.py", line 177, in main
setup_testbed(data_groups[0])
File "/mnt/c/Users/hrush/OneDrive - Student Ambassadors/Desktop/AutoSwe/venv/lib/python3.12/site-packages/swebench/harness/engine_validation.py", line 91, in setup_testbed
with TestbedContextManager(
File "/mnt/c/Users/hrush/OneDrive - Student Ambassadors/Desktop/AutoSwe/venv/lib/python3.12/site-packages/swebench/harness/context_manager.py", line 285, in __enter__
self.exec(install_cmd)
File "/mnt/c/Users/hrush/OneDrive - Student Ambassadors/Desktop/AutoSwe/venv/lib/python3.12/site-packages/swebench/harness/context_manager.py", line 95, in __call__
raise e
File "/mnt/c/Users/hrush/OneDrive - Student Ambassadors/Desktop/AutoSwe/venv/lib/python3.12/site-packages/swebench/harness/context_manager.py", line 82, in __call__
output = subprocess.run(cmd, **combined_args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/subprocess.py", line 571, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['bash', '/mnt/c/Users/hrush/AutoSWE/AutoSwe/SWE-agent/evaluation/testbed/a09d900d21/django/3.0/tmphk0bcuv7/miniconda3/miniconda.sh', '-b', '-u', '-p', '/mnt/c/Users/hrush/AutoSWE/AutoSwe/SWE-agent/evaluation/testbed/a09d900d21/django/3.0/tmphk0bcuv7/miniconda3', '&&', 'conda', 'init', '--all']' returned non-zero exit status 1.
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/mnt/c/Users/hrush/AutoSWE/AutoSwe/SWE-agent/evaluation/evaluation.py", line 72, in main
run_evaluation(
File "/mnt/c/Users/hrush/OneDrive - Student Ambassadors/Desktop/AutoSwe/venv/lib/python3.12/site-packages/swebench/harness/run_evaluation.py", line 203, in main
pool.map(eval_engine, eval_args)
File "/usr/lib/python3.12/multiprocessing/pool.py", line 367, in map
return self._map_async(func, iterable, mapstar, chunksize).get()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/multiprocessing/pool.py", line 774, in get
raise self._value
subprocess.CalledProcessError: Command '['bash', '/mnt/c/Users/hrush/AutoSWE/AutoSwe/SWE-agent/evaluation/testbed/a09d900d21/django/3.0/tmphk0bcuv7/miniconda3/miniconda.sh', '-b', '-u', '-p', '/mnt/c/Users/hrush/AutoSWE/AutoSwe/SWE-agent/evaluation/testbed/a09d900d21/django/3.0/tmphk0bcuv7/miniconda3', '&&', 'conda', 'init', '--all']' returned non-zero exit status 1.
==================================
Log directory for evaluation run:
results/azure-gpt-3.5-turbo-1106__SWE-bench_Lite__default__t-0.00__p-0.95__c-3.00__install-1
- Wrote per-instance scorecards to ../trajectories/hrushi66/azure-gpt-3.5-turbo-1106__SWE-bench_Lite__default__t-0.00__p-0.95__c-3.00__install-1/scorecards.json
Reference Report:
- no_generation: 5
- generated: 3
- with_logs: 0
- install_fail: 0
- reset_failed: 0
- no_apply: 0
- applied: 0
- test_errored: 0
- test_timeout: 0
- resolved: 0
- Wrote summary of run to ../trajectories/hrushi66/azure-gpt-3.5-turbo-1106__SWE-bench_Lite__default__t-0.00__p-0.95__c-3.00__install-1/results.json
This error has been haunting me for 3 hours. Can anyone help? @klieret
Regarding your initial report: I believe you need to specify the dataset name or path as the second argument (the default is princeton-nlp/SWE-bench, which is probably not what you need here).
Let me ping @carlosejimenez and @john-b-yang for this issue.
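The suggestion about the second argument can be sketched as a small wrapper that always passes the dataset explicitly instead of relying on the default. This is only an illustration: build_eval_command is a hypothetical helper, and using princeton-nlp/SWE-bench_Lite as the dataset is an assumption based on the SWE-bench_Lite run name in the log above.

```python
def build_eval_command(predictions_path, dataset="princeton-nlp/SWE-bench_Lite"):
    """Build the argument list for run_eval.sh with an explicit dataset.

    Assumes run_eval.sh takes the predictions file as its first positional
    argument and the dataset name or path as its second.
    """
    return ["./run_eval.sh", str(predictions_path), dataset]


if __name__ == "__main__":
    # Print the command rather than running it, so it can be inspected first.
    print(" ".join(build_eval_command("path/to/all_preds.jsonl")))
```

The resulting list could be handed to subprocess.run from inside the evaluation directory, or the printed command can simply be copied into the shell.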
Hi @Hk669 Just to be clear, I noticed that you used a "test-repo" in your examples. I'm not sure if that's just a placeholder, but generally the evaluation process will only work with examples already in the SWE-bench dataset. This is because we have the correct tests and behavior logged for those instances only.
However, the other command you mentioned: ./run_eval.sh ../trajectories/hrushi66/azure-gpt-3.5-turbo-1106__SWE-bench_Lite__default__t-0.00__p-0.95__c-3.00__install-1/all_preds.jsonl
does seem like an issue. Can you confirm the version of swebench that you're using? Can you make sure to use the latest version of the repository/package?
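Since evaluation only works for instances that already exist in the SWE-bench dataset, it can help to compare the instance IDs in the predictions against the dataset before running anything. A minimal sketch, where split_by_dataset_membership is a hypothetical helper and the IDs in the example are illustrative placeholders rather than values taken from this run:

```python
def split_by_dataset_membership(prediction_ids, dataset_ids):
    """Separate prediction IDs into those found in the dataset and those missing."""
    known = set(dataset_ids)
    in_dataset = [pid for pid in prediction_ids if pid in known]
    missing = [pid for pid in prediction_ids if pid not in known]
    return in_dataset, missing


if __name__ == "__main__":
    # Placeholder IDs for illustration only.
    dataset_ids = ["repo-a__issue-1", "repo-a__issue-2", "repo-b__issue-7"]
    prediction_ids = ["repo-a__issue-1", "test-repo__issue-1"]
    found, missing = split_by_dataset_membership(prediction_ids, dataset_ids)
    print("can be evaluated:", found)
    print("not in dataset:", missing)
```

Any ID that ends up in the missing list has no reference tests to evaluate against. In practice the dataset IDs would come from wherever the dataset is loaded, for example the instance_id column of princeton-nlp/SWE-bench_Lite on the Hugging Face Hub (an assumption about the setup here).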
Hi, I am having a similar issue: I do not know where to get the SWE-bench Lite dataset. Here is what I'm running:
python run_evaluation.py \
    --predictions_path "../../../SWE-agent/trajectories/vscode/gpt4__SWE-bench_Litedefaultt-0.00p-0.95c-2.00__install-1/all_preds.jsonl" \
    --swe_bench_tasks "SWE-bench_Lite/data/dev-00000-of-00001.parquet" \
    --log_dir "logs" \
    --testbed "testbed" \
    --skip_existing \
    --timeout 900 \
    --verbose
I also tried to convert the data to a .json file with a Python script, but that did not work either. Could you help direct me to the dataset? I ran SWE-agent with the regular command:
python run.py --model_name gpt4 \
    --per_instance_cost_limit 2.00 \
    --config_file ./config/default.yaml
Describe the bug
When I tried to reproduce the logs as described in the Benchmarking documentation, SWE-agent created a patch, but evaluating it produced the error below.
results.json
Steps/commands/code to Reproduce
As described in the Benchmarking documentation, but with the model azure:gpt-3.5-turbo-1106.
Error message/results
System Information
Ubuntu