princeton-nlp / SWE-bench

[ICLR 2024] SWE-Bench: Can Language Models Resolve Real-world Github Issues?
https://www.swebench.com
MIT License

Sharing my gpt-4-bm25-27k results #75

Closed roywei closed 5 months ago

roywei commented 6 months ago

Hi, I ran inference on half of the problems using gpt-4 (bm25-27k setting) and am sharing my results here in case they are useful. Awesome project! https://drive.google.com/file/d/1Q9AX4zgOKDLMrXlKvBJ4e5qJtbtPW3aP/view?usp=sharing

john-b-yang commented 5 months ago

Hi @roywei, thanks so much for the contribution, we really appreciate it!!

We're about to release execution logs + predictions for all models we've evaluated on SWE-bench so far (ETA tomorrow). I'll close this issue for now, but we will be sure to add your results to the website as well. I'll post here with an update when we do!