How to run evaluation with swe-bench_lite?

pi-null-mezon commented 7 months ago

I have managed to run the evaluation according to the tutorial with swe-bench.json. With the lite version of SWE-bench, however, it does not work, because the benchmark files for the lite version are stored in *.arrow format instead of *.json. So how do I run the evaluation for swe-bench_lite?

pi-null-mezon commented 7 months ago

I have found the solution:

Step 1

cd ./inference
python run_api.py --dataset_name_or_path princeton-nlp/SWE-bench_Lite_oracle --model_name_or_path claude-3-haiku-20240307 --output_dir ../_datasets_/output/lite/haiku

Step 2

cd ./swebench/harness
python run_evaluation.py --predictions_path ../../_datasets_/output/lite/haiku/claude-3-haiku-20240307__SWE-bench_Lite_oracle__test.jsonl --swe_bench_tasks ../../_datasets_/swe-bench.json --log_dir ../../_datasets_/logs --testbed ../../_datasets_/workdir --verbose

However, at Step 2, errors related to miniconda appear:

2024-04-04 15:38:09,122 - testbed_context_manager - INFO - [Testbed] Using conda path /home/alex/Programming//SWE-bench/_datasets_/workdir/claude-3-haiku-20240307/psf__requests/2.3/tmpcmt6e7pt/miniconda3
Error: Command '['/home/alex/Programming/SWE-bench/_datasets_/workdir/claude-3-haiku-20240307/psf__requests/2.3/tmpcmt6e7pt/miniconda3/bin/conda', 'env', 'list']' returned non-zero exit status 1.
Error stdout: 
Error stderr: Traceback (most recent call last):
  File "/home/alex/Programming//SWE-bench/_datasets_/workdir/claude-3-haiku-20240307/psf__requests/2.3/tmpcmt6e7pt/miniconda3/bin/conda", line 12, in <module>
    from conda.cli import main
ModuleNotFoundError: No module named 'conda'

vishwa27yvs commented 7 months ago

Hi, I had the same question, @pi-null-mezon , how did you find the swe-bench.json file for swe-bench-lite?

pi-null-mezon commented 7 months ago

You do not need this JSON file. Just run the evaluation with princeton-nlp/SWE-bench_Lite_oracle; it will be downloaded from Hugging Face.
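
A minimal sketch of what that could look like for Step 2, assuming the dataset name goes to --swe_bench_tasks in place of the local JSON path and that every other flag stays as in Step 2. The variable names (DATASET, OUT, PREDICTIONS, EVAL_CMD) are only illustrative. The snippet prints the command rather than running it, so it can be checked first:

```shell
# Dry-run sketch: build and print the Step 2 command with the Hugging Face
# dataset name in place of the local swe-bench.json path.
# Assumption: --swe_bench_tasks is the flag that accepts the dataset name,
# and all other flags are unchanged from Step 2.
DATASET="princeton-nlp/SWE-bench_Lite_oracle"
OUT="../../_datasets_"
PREDICTIONS="$OUT/output/lite/haiku/claude-3-haiku-20240307__SWE-bench_Lite_oracle__test.jsonl"

EVAL_CMD="python run_evaluation.py --predictions_path $PREDICTIONS --swe_bench_tasks $DATASET --log_dir $OUT/logs --testbed $OUT/workdir --verbose"

# Print the command for review; run it from ./swebench/harness once it looks right.
echo "$EVAL_CMD"
```

This does not address the miniconda error from Step 2; it only shows where the dataset name would go.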

vishwa27yvs commented 7 months ago

Okay got it, thank you!

john-b-yang commented 6 months ago

@pi-null-mezon apologies that this wasn't clearer in the documentation, and thanks so much for providing the solution.

I will update the documentation now to indicate that Hugging Face SWE-bench datasets can be provided as an argument.

Feel free to re-open if there are any follow-up questions!
