princeton-nlp / SWE-bench

[ICLR 2024] SWE-Bench: Can Language Models Resolve Real-world Github Issues?
https://www.swebench.com
MIT License

Missing `validation.ipynb`? #163

Closed xingyaoww closed 3 months ago

xingyaoww commented 3 months ago

Describe the issue

Here (https://github.com/princeton-nlp/SWE-bench/blob/68d80593ec19c55ff2f9dcd8e72632dcd8b2cc5e/swebench/collect/collection.md?plain=1#L81) the collection guide describes a `validation.ipynb` notebook that can be used to process crawled PRs into instances. However, I cannot find it anywhere in the repo. Has it been renamed or moved somewhere else? Thanks!!

john-b-yang commented 3 months ago

Thanks for pointing this out @xingyaoww! We focused on improving evaluation with the new harness, and as a result the validation capabilities from our original repository have not yet been ported to it. Since pretty much everyone using SWE-bench has focused on evaluation, we did not make updating validation a top priority.

It shouldn't be too difficult to re-incorporate. An updated validation pipeline + new SWE-bench eval data is something we're actively working on right now, and we'll release both together in the near future (by the end of summer, around August, at the latest).

However, if you'd like to run validation right now, the best option is to check out an older version of SWE-bench and use the validation there. Given all the fixes to SWE-bench, though, I think that form of collection can be prone to reproducibility issues, as we've seen 😅

Closing this for now, but please feel free to re-open with follow-ups (+ we can always discuss more in PM too!).
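
For anyone trying the "check out an older version" route mentioned above, here is a rough sketch of one way to locate such a version with plain `git`. It assumes the notebook was committed under the name `validation.ipynb` at some point and later deleted in a non-merge commit; the thread does not confirm this. The helper names (`deletion_log_cmd`, `parent_of_latest`, `last_revision_with_file`) are made up for illustration and are not part of SWE-bench.

```python
import subprocess
from typing import List, Optional


def deletion_log_cmd(filename: str) -> List[str]:
    """Build a `git log` command that prints the hashes of commits which
    deleted a file whose path ends with `filename`, newest first."""
    # In a git pathspec, `*` also matches `/`, so this finds the file in any directory.
    return ["git", "log", "--diff-filter=D", "--format=%H", "--", f"*{filename}"]


def parent_of_latest(log_output: str) -> Optional[str]:
    """Given the output of the command above, return a revision that still
    contains the file: the parent (`<hash>^`) of the newest deleting commit."""
    hashes = log_output.split()
    if not hashes:
        return None  # the file was never deleted, or never committed under this name
    return hashes[0] + "^"


def last_revision_with_file(repo_dir: str, filename: str) -> Optional[str]:
    """Run the log command inside a local clone located at `repo_dir`."""
    result = subprocess.run(
        deletion_log_cmd(filename),
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return parent_of_latest(result.stdout)
```

Calling `last_revision_with_file("path/to/SWE-bench", "validation.ipynb")` on a local clone returns either `None` or a revision string that can be passed to `git checkout`. As noted above, results collected from an older revision may not be reproducible against the current harness.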