princeton-nlp / SWE-bench

[ICLR 2024] SWE-Bench: Can Language Models Resolve Real-world Github Issues?
https://www.swebench.com
MIT License
1.45k stars, 240 forks

Update reporting and skip empty model patch predictions #168

Closed carlosejimenez closed 2 days ago

carlosejimenez commented 2 days ago

Reference Issues/PRs

What does this implement/fix? Explain your changes.

Updates the make_run_report function in run_evaluation to report statistics for submitted instance IDs and for predictions with an empty model patch. Also skips container setup for predictions whose model patch is empty.
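For illustration only, here is a minimal sketch of the bookkeeping described above. It is not the code from this PR: the helper names `has_empty_patch` and `summarize_submission`, the report keys, and the example instance IDs are all hypothetical, and it assumes each prediction is a dict with `instance_id` and `model_patch` fields.

```python
from typing import Any


def has_empty_patch(prediction: dict[str, Any]) -> bool:
    """Return True when the prediction carries no usable patch text.

    Assumption: a missing, None, or whitespace-only "model_patch" counts as empty.
    """
    patch = prediction.get("model_patch")
    return patch is None or not str(patch).strip()


def summarize_submission(predictions: list[dict[str, Any]]) -> dict[str, Any]:
    """Collect submitted IDs, empty-patch IDs, and the IDs still worth running.

    Hypothetical helper: the key names below are illustrative, not the
    actual fields written by make_run_report.
    """
    submitted_ids = sorted(p["instance_id"] for p in predictions)
    empty_patch_ids = sorted(
        p["instance_id"] for p in predictions if has_empty_patch(p)
    )
    empty = set(empty_patch_ids)
    # Predictions with an empty patch are skipped, so no container is set up for them.
    to_run_ids = [i for i in submitted_ids if i not in empty]
    return {
        "submitted_instances": len(submitted_ids),
        "empty_patch_instances": len(empty_patch_ids),
        "submitted_ids": submitted_ids,
        "empty_patch_ids": empty_patch_ids,
        "to_run_ids": to_run_ids,
    }


if __name__ == "__main__":
    # Made-up predictions; the instance IDs are placeholders.
    example = [
        {"instance_id": "demo__repo-2", "model_patch": "diff --git a/x b/x\n"},
        {"instance_id": "demo__repo-1", "model_patch": ""},
        {"instance_id": "demo__repo-3", "model_patch": None},
    ]
    print(summarize_submission(example))
```

Filtering before any container work means an empty prediction costs nothing at evaluation time but still shows up in the report, so a submission's totals remain auditable.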

Any other comments?

john-b-yang commented 2 days ago

LGTM! 😄