princeton-nlp / SWE-bench

[ICLR 2024] SWE-bench: Can Language Models Resolve Real-world Github Issues?
https://www.swebench.com
MIT License

Pin tox packages for sphinx images #205

Closed aorwall closed 3 months ago

aorwall commented 3 months ago

When I rebuilt the sphinx images to test all SWE-bench Verified instances, all of the newly built images started to fail. The cause seems to be a new version of tox. This PR pins tox to the version used in the previously built images, which seems to work.
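For readers unfamiliar with pinning, here is a minimal sketch of the idea, not the actual diff in this PR. The helper names, the `tox-current-env` entry, and both version numbers are assumptions made for illustration only; the real values would be whatever the last working image had installed.

```python
from __future__ import annotations

# Hypothetical sketch: function names, package list, and version numbers
# are illustrative and are NOT taken from the SWE-bench source or this PR.


def pin_packages(versions: dict[str, str]) -> list[str]:
    """Turn {"name": "version"} into exact-match pip requirement strings."""
    return [f"{name}=={version}" for name, version in versions.items()]


def pip_install_command(versions: dict[str, str]) -> str:
    """Build a pip command that installs exactly the pinned versions."""
    return "python -m pip install " + " ".join(pin_packages(versions))


# Placeholder versions; substitute the ones found in a known-good image.
PINNED = {"tox": "4.0.0", "tox-current-env": "0.0.11"}

if __name__ == "__main__":
    print(pip_install_command(PINNED))
```

With an exact `==` pin, rebuilding an image later installs the same tox release as before instead of whatever is newest on PyPI, which is the failure mode described above.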

Compare this benchmark run with failing sphinx instances: https://eval.moatless.ai/evaluations/fcbb473f957149e7a5a45baeaa11800c

With this fix it looks like this: https://eval.moatless.ai/evaluations/ab7a61b78e334a759fff3b344d30bd70

carlosejimenez commented 3 months ago

I reproduced this and your changes work! Thanks so much!

ofirpress commented 3 months ago

Thanks Albert!