princeton-nlp / SWE-bench

[ICLR 2024] SWE-bench: Can Language Models Resolve Real-World GitHub Issues?
https://www.swebench.com
MIT License

How can one participate in the SWE-bench leaderboard? #121

Closed · yakami129 closed this issue 4 months ago

yakami129 commented 4 months ago

Describe the issue

We have developed a code generation tool. How can we participate in the SWE-bench leaderboard? Additionally, what are the differences between the SWE-bench_Lite_oracle and SWE-bench_Lite datasets, and how should I choose between them?

Suggest an improvement to documentation

No response

yakami129 commented 4 months ago

@john-b-yang

john-b-yang commented 4 months ago

You can see the instructions on the website and at this GitHub repository!

https://github.com/swe-bench/experiments

The Lite Oracle dataset refers to the "oracle" retrieval setting, where the files edited by the gold patch are provided to the model as part of its input.

The Lite dataset does not include this assist. If you are running evaluation, I would encourage using the SWE-bench_Lite dataset, which contains 300 task instances.
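For concreteness, here is a minimal sketch of loading both variants from Hugging Face. It assumes the `datasets` package and the public dataset IDs; field names such as `problem_statement` and `text` may vary across dataset versions, so double-check against the dataset cards:

```python
from datasets import load_dataset

# Plain Lite: each instance carries the GitHub issue text
# ("problem_statement") and repo metadata, with no hint about
# which files need to change.
lite = load_dataset("princeton-nlp/SWE-bench_Lite", split="test")

# Oracle Lite: the "text" field is a ready-made prompt that also
# inlines the contents of the files edited by the gold patch.
oracle = load_dataset("princeton-nlp/SWE-bench_Lite_oracle", split="test")

print(len(lite))                            # 300 task instances
print(lite[0]["problem_statement"][:200])   # raw issue text
print(oracle[0]["text"][:200])              # prompt with oracle files inlined
```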

For the submission, the full instructions are on the GitHub repository, but the tl;dr is that we ask you to submit a folder (`evaluation/lite/<date>_...`) containing your predictions and evaluation logs (the exact file list is in the repository instructions).

You can use this PR as a reference.
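As a rough illustration only (the folder name and file names below are assumptions; the authoritative list is in the swe-bench/experiments instructions), you can sanity-check a submission folder with a few lines of Python before opening the PR:

```python
from pathlib import Path

# Hypothetical submission folder; follow the <date>_ naming convention
# from the instructions in swe-bench/experiments.
submission = Path("evaluation/lite/20240601_my-tool")

# Illustrative guesses at the expected contents; verify against the
# repository's README before submitting.
expected = ["all_preds.jsonl", "metadata.yaml", "README.md", "logs"]

for name in expected:
    status = "found" if (submission / name).exists() else "MISSING"
    print(f"{name}: {status}")
```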

Hope this helps! Really excited by your interest in this benchmark, and definitely let me know if you need any help.

I will close this issue for now. If you need help with submission-related matters, could you open an issue in the swe-bench/experiments repo? If you have SWE-bench-oriented questions, feel free to re-open this issue or create another one.

yakami129 commented 4 months ago

Thanks.