princeton-nlp / SWE-bench

[ICLR 2024] SWE-Bench: Can Language Models Resolve Real-world Github Issues?
https://www.swebench.com
MIT License

Update evaluation.md evaluation argument hints #93

Closed: ssh-randy closed this 5 months ago

ssh-randy commented 5 months ago

What does this implement/fix? Explain your changes.

Quality-of-life update to the evaluation README: it now tells users that they can pass the string "test" or "dev" directly as the swe_bench_task argument, so they don't need to upload any files.
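For illustration only, here is a minimal sketch of how an argument like this might be interpreted. It is not taken from the SWE-bench code: `KNOWN_SPLITS` and `classify_task_arg` are hypothetical names, and the assumption is simply that "dev" and "test" are treated as split names while anything else is treated as a path to a task file.

```python
# Hypothetical sketch, not the SWE-bench implementation.
# Assumption: "dev" and "test" name dataset splits; any other value is a file path.
KNOWN_SPLITS = {"dev", "test"}


def classify_task_arg(value: str) -> tuple[str, str]:
    """Return ("split", name) for a known split name, else ("file", path)."""
    if value in KNOWN_SPLITS:
        return ("split", value)
    return ("file", value)


if __name__ == "__main__":
    print(classify_task_arg("test"))
    print(classify_task_arg("my_tasks.json"))
```

Under this assumption, a user who only wants the standard dev or test set passes the split name and never has to supply a task file.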