princeton-nlp / SWE-bench

[ICLR 2024] SWE-bench: Can Language Models Resolve Real-World GitHub Issues?
https://www.swebench.com
MIT License

Share conda environment across evals #105

Closed · waterson closed this 3 months ago

waterson commented 5 months ago

This change provides a `path_conda` in the testbed directory that is reused across evaluations, and modifies the context manager's behavior so that a non-existent `path_conda` is initialized and populated in the same way a temporary environment would be.
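Roughly, the behavior I'm after looks like this (a minimal sketch, not the actual harness code; `conda_env`, the package list, and the pinned Python version are all hypothetical placeholders):

```python
import contextlib
import os
import subprocess


@contextlib.contextmanager
def conda_env(path_conda, packages=()):
    """Yield a conda environment rooted at `path_conda`.

    If the prefix already exists, it is reused as-is; otherwise it is
    created and populated, mirroring what a temporary context would do.
    """
    if not os.path.exists(path_conda):
        # First use: create the prefix (Python version is an assumption)
        # and install the eval dependencies.
        subprocess.run(
            ["conda", "create", "--prefix", path_conda, "--yes", "python=3.9"],
            check=True,
        )
        if packages:
            subprocess.run(
                ["conda", "install", "--prefix", path_conda, "--yes", *packages],
                check=True,
            )
    # Subsequent evals find the prefix in place and skip the expensive
    # create/install steps entirely.
    yield path_conda
```

A caller would then wrap each evaluation in `with conda_env(...)` and pay the setup cost only on the first run.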

I realize it might make sense to discuss making this behavior optional, but I wanted to get something out there to start the conversation. :)

Fixes #104.

john-b-yang commented 3 months ago

Hi @waterson thanks for the suggestion + contribution and your patience!

We just published a major release, `swebench==2.0.0`, which should take care of this issue. I totally agree: the motivation behind this fix for the original code makes a lot of sense.

I left a more extensive reply to @aorwall at #109. In a nutshell, with `swebench.harness.run_evaluation`, it is now possible to cache images at the base, env, and instance tiers.

Put simply, if SWE-bench evaluation is run 2+ times and the instance tier of images is cached, the environments don't need to be rebuilt at all.
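For example, a run that reuses cached instance images might look like this (a sketch; the flag names and values here are assumptions based on the description above, so check `run_evaluation --help` for the actual interface):

```python
import subprocess
import sys

# Invoke the 2.0 harness as a module. --cache_level is assumed to select
# which image tier is kept between runs; predictions.jsonl and the run_id
# are hypothetical values.
subprocess.run(
    [
        sys.executable, "-m", "swebench.harness.run_evaluation",
        "--predictions_path", "predictions.jsonl",
        "--run_id", "my-eval",
        "--cache_level", "instance",  # reuse instance-tier images
    ],
    check=True,
)
```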

Also, the env layer of images represents the conda environments that multiple instances share, and this intermediate layer is what makes building instance-tier images a lot more efficient.

Thanks again for the issue. It provided the confirmation we needed to move forward with incorporating this feature, and I really appreciate it!