princeton-nlp / SWE-bench

[ICLR 2024] SWE-Bench: Can Language Models Resolve Real-world Github Issues?
https://www.swebench.com
MIT License

improve eval performance by caching per-repo/version conda environments #104

Closed: waterson closed this issue 1 week ago

waterson commented 2 months ago

Describe the feature

Right now, running an eval (e.g., with the SWE-agent evaluation/evaluation.py script) creates a temporary conda environment from scratch on every run. Since the environment depends only on the repo and version, it could instead be created once per repo/version pair and then reused across evaluations.

Potential Solutions

One way to do this (for which I'll attach a PR) is to simply configure a reasonable path_conda in the eval args; e.g.,

args.path_conda = os.path.join(testbed, "conda", repo.rsplit('__', 1)[-1], version)
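For reference, a minimal sketch of how that path could be used to build the environment once and reuse it on later runs (the cached_conda_env helper, the conda create invocation, and the pinned Python version are illustrative assumptions, not the actual evaluation code):

import os
import subprocess

def cached_conda_env(testbed: str, repo: str, version: str) -> str:
    """Return a per-repo/version conda env prefix, creating it only if missing."""
    env_path = os.path.join(testbed, "conda", repo.rsplit("__", 1)[-1], version)
    if not os.path.isdir(env_path):
        # First eval for this repo/version: build the environment once.
        subprocess.run(
            ["conda", "create", "-y", "--prefix", env_path, "python=3.9"],
            check=True,
        )
    # Later evals reuse the same prefix instead of rebuilding a temp env.
    return env_path

# e.g. in the eval setup:
# args.path_conda = cached_conda_env(testbed, repo, version)
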
thisdotmatt commented 2 weeks ago

I believe Auto Code Rover has an implementation of this in which they group non-redundant conda environments and cache them.
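A rough sketch of that grouping idea, assuming it means keying cached environments by their dependency spec so identical specs collapse to a single environment (the helper names and spec shape below are hypothetical, not Auto Code Rover's actual code):

import hashlib
import json
import os

def env_cache_key(install_spec: dict) -> str:
    # Hash the canonicalized spec (python version, pip/conda packages) so that
    # repo/version pairs with identical requirements map to the same key.
    canonical = json.dumps(install_spec, sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()[:12]

def shared_env_path(testbed: str, install_spec: dict) -> str:
    # Redundant specs collapse to a single directory under <testbed>/conda/
    # that every matching eval can reuse.
    return os.path.join(testbed, "conda", env_cache_key(install_spec))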