Closed: yakami129 closed this issue 4 months ago
@john-b-yang
You can see the instructions on the website and at this GitHub repository!
https://github.com/swe-bench/experiments
The Lite Oracle dataset refers to the setting where the "oracle" files, or the files that are edited by the gold patch, are provided to the model.
The Lite dataset does not have this assist. If you are running evaluation, I would encourage using the SWE-bench_Lite dataset, which contains 300 task instances.
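To make the "oracle" setting concrete: the oracle file set is simply the set of files touched by the gold patch, which can be read off the unified-diff headers. The sketch below is illustrative (the function name and example patch are made up, not part of SWE-bench's tooling):

```python
import re

def oracle_files(gold_patch: str) -> list[str]:
    """Extract the paths of files edited by a patch (the 'oracle' file set)."""
    # Unified diffs produced by git begin each file's section with a
    # "diff --git a/<path> b/<path>" header line.
    return re.findall(r"^diff --git a/(\S+) b/\S+", gold_patch, flags=re.MULTILINE)

example_patch = """\
diff --git a/src/utils.py b/src/utils.py
--- a/src/utils.py
+++ b/src/utils.py
@@ -1 +1 @@
-x = 1
+x = 2
"""

print(oracle_files(example_patch))  # → ['src/utils.py']
```

In the oracle setting the model is handed these files as context; in the plain Lite setting it must locate the relevant files itself.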
For the submission, the instructions should be complete on the GitHub repository, but a tl;dr is we ask you to submit a folder (evaluation/lite/<date>_) containing:

- logs/ folder, containing the execution logs from running SWE-bench evaluation.
- all_preds.jsonl file, containing the patch generations. This should just be the file that you ran evaluation on.
- results/ folder, which you can autogenerate from logs/ by running the script described here.

You can use this PR as a reference.
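The layout above can be sketched as follows. This is only a skeleton to visualize the expected structure; the folder name "20240101_mytool" and the JSON fields are illustrative placeholders, not the official naming scheme:

```python
import json
import tempfile
from pathlib import Path

# Build a skeleton of the submission layout in a temporary directory.
root = Path(tempfile.mkdtemp()) / "evaluation" / "lite" / "20240101_mytool"

(root / "logs").mkdir(parents=True)   # execution logs from SWE-bench evaluation
(root / "results").mkdir()            # autogenerated from logs/ by the script

# all_preds.jsonl: one JSON object per task instance with the generated patch.
pred = {"instance_id": "repo__pkg-123", "model_patch": "diff --git ..."}
(root / "all_preds.jsonl").write_text(json.dumps(pred) + "\n")

print(sorted(p.name for p in root.iterdir()))
# → ['all_preds.jsonl', 'logs', 'results']
```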
Hope this helps! Really excited by your interest in this benchmark, and definitely let me know if you need any help.
I will close this issue for now. If you need help with submission-related matters, can you make an issue in the swe-bench/experiments repo? If you have SWE-bench-oriented questions, feel free to re-open this or create another issue.
Thanks.
Describe the issue
We have developed a code generation tool. How can we participate in the SWE-bench leaderboard? Additionally, what are the differences between the SWE-bench_Lite_oracle and SWE-bench_Lite datasets, and how should I choose between them?
Suggest an improvement to documentation
No response