princeton-nlp / SWE-bench

[ICLR 2024] SWE-Bench: Can Language Models Resolve Real-world Github Issues?
https://www.swebench.com
MIT License
1.47k stars, 241 forks

Fixes to not have to reinstall testbeds and conda envs #109

Closed: aorwall closed this 1 week ago

aorwall commented 2 months ago

Reference Issues/PRs

Partly solves #104

What does this implement/fix? Explain your changes.

This change makes it possible to reuse existing testbeds and conda environments when running evaluation, so they do not have to be reinstalled on every run.

john-b-yang commented 1 week ago

@aorwall thanks so much for the original contribution and patience.

While we didn't merge the original code, we accounted for this feature in the new swebench=2.0.0 release.

The Docker image caching mechanism takes care of this.

The report has more advice on how to appropriately set the cache level.

In a nutshell, this feature has been incorporated in swebench>=2.0.0. With enough storage, instance-specific images can now be cached, so the second and any later evaluation runs of SWE-bench complete very quickly.
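
For anyone trying to reason about the storage vs. speed trade-off, here is a rough sketch of how a layered cache level could work. This is not the swebench implementation. The layer names (`base`, `env`, `instance`), the `none` level, and both helper functions are assumptions made up for illustration; check the report for the actual levels and their defaults.

```python
# Hypothetical sketch of a layered image cache; not swebench's actual code.
# Assumed layers, ordered from most shared to most specific.
LAYERS = ["base", "env", "instance"]


def layers_kept(cache_level: str) -> list[str]:
    """Return the image layers retained after a run for a given cache level."""
    if cache_level == "none":
        return []
    if cache_level not in LAYERS:
        raise ValueError(f"unknown cache level: {cache_level!r}")
    # Keeping a layer implies keeping every layer it is built on.
    return LAYERS[: LAYERS.index(cache_level) + 1]


def layers_to_rebuild(cache_level: str) -> list[str]:
    """Return the layers a later run would have to build again."""
    kept = layers_kept(cache_level)
    return [layer for layer in LAYERS if layer not in kept]


if __name__ == "__main__":
    for level in ["none", "base", "env", "instance"]:
        print(level, "keeps", layers_kept(level), "rebuilds", layers_to_rebuild(level))
```

Under these assumptions, caching at the instance level leaves nothing to rebuild on a repeat run, at the cost of keeping one image per task instance on disk. Coarser levels save storage but rebuild the finer layers each time.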