princeton-nlp / SWE-bench

[ICLR 2024] SWE-bench: Can Language Models Resolve Real-World GitHub Issues?
https://www.swebench.com
MIT License

How can one participate in the SWE-bench leaderboard? #121

Closed · yakami129 closed this issue 4 months ago

yakami129 commented 4 months ago

Describe the issue

We have developed a code generation tool. How can we participate in the SWE-bench leaderboard? Additionally, what are the differences between the SWE-bench_Lite_oracle and SWE-bench_Lite datasets, and how should I choose between them?

Suggest an improvement to documentation

No response

yakami129 commented 4 months ago

@john-b-yang

john-b-yang commented 4 months ago

You can see the instructions on the website and at this GitHub repository!

https://github.com/swe-bench/experiments

The Lite Oracle dataset refers to the "oracle" retrieval setting, where the files edited by the gold patch are provided to the model as part of its input.

The Lite dataset does not include this assist. If you are running evaluation, I would encourage using the SWE-bench_Lite dataset, which contains 300 task instances.
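For concreteness, here is a minimal sketch of loading both variants from Hugging Face. It assumes the `datasets` package and the public dataset IDs; field names such as `problem_statement` and `text` may vary across dataset versions, so double-check against the dataset cards:

```python
from datasets import load_dataset

# Plain Lite: each instance carries the GitHub issue text
# ("problem_statement") and repo metadata, with no hint about
# which files need to change.
lite = load_dataset("princeton-nlp/SWE-bench_Lite", split="test")

# Oracle Lite: the "text" field is a ready-made prompt that also
# inlines the contents of the files edited by the gold patch.
oracle = load_dataset("princeton-nlp/SWE-bench_Lite_oracle", split="test")

print(len(lite))                            # 300 task instances
print(lite[0]["problem_statement"][:200])   # raw issue text
print(oracle[0]["text"][:200])              # prompt with oracle files inlined
```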

For the submission, the full instructions are on the GitHub repository, but the tl;dr is that we ask you to submit a folder (`evaluation/lite/<date>_...`) containing your predictions and evaluation logs (the exact file list is in the repository instructions).

You can use this PR as a reference.
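As a rough illustration only (the folder name and file names below are assumptions; the authoritative list is in the swe-bench/experiments instructions), you can sanity-check a submission folder with a few lines of Python before opening the PR:

```python
from pathlib import Path

# Hypothetical submission folder; follow the <date>_ naming convention
# from the instructions in swe-bench/experiments.
submission = Path("evaluation/lite/20240601_my-tool")

# Illustrative guesses at the expected contents; verify against the
# repository's README before submitting.
expected = ["all_preds.jsonl", "metadata.yaml", "README.md", "logs"]

for name in expected:
    status = "found" if (submission / name).exists() else "MISSING"
    print(f"{name}: {status}")
```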

Hope this helps! Really excited by your interest in this benchmark, and definitely let me know if you need any help.

I will close this issue for now. If you need help with submission-related matters, could you open an issue in the swe-bench/experiments repo? If you have SWE-bench-oriented questions, feel free to re-open this issue or create another one.

yakami129 commented 4 months ago

Thanks.