ramsey-coding closed this issue 1 month ago
From my understanding, environment_setup_commit is the commit the testbeds check out to install dependencies, and base_commit is the commit the patch is applied to.
(Not an expert on SWE-bench, so take this with a grain of salt), but I think the idea with installing the deps was that you mostly default to the latest release. So environment_setup_commit would probably point to the latest release commit before the gold patch merge, and base_commit to the main-branch parent commit of the gold patch merge commit.
@klieret this doesn't seem to be true?
There are many instances in the dataset where the environment_setup_commit is dated after the gold patch merge.
For example, pydicom__pydicom-897:
a96d8ba5822e1df3c9de1dbba6e3e4f1464e8605
7241f5d9db0de589b230bb84212fbb643a7c86c3
How is the environment_setup_commit determined for each task?
Interesting. But the only point of the environment_setup_commit is to make sure the package installs, so in principle it could be an arbitrary commit, since it is usually unrelated to the task itself. Because the idea was mostly to use releases for the setup commits, and any changes to the installation instructions happen at some point between releases, it seems reasonable that some tasks pinned the following release rather than the previous one.
But let me ping @john-b-yang and @carlosejimenez, who will know for sure.
@klieret did you get an answer?
Ah ok, so the environment_setup_commit serves exactly the purpose described by @PandelisZ and @klieret.
The question that seems to remain is how this commit was actually selected, as pointed out by @nora-doe. I'll preface this by saying this is a strategy that worked for us empirically, and there's some rationale behind it, but there certainly may be better strategies.
The environment_setup_commit corresponds to the base_commit of the latest (a.k.a. most recent) task instance from that repo/version combination.
So, as an example, if there are 10 (just an example, not necessarily the actual number) instances that fall under astropy/astropy version 1.5, the environment_setup_commit corresponds to the base_commit of the most recent task instance. Empirically, we found that the last instance of a repo/version tends to be a good reference for the installation requirements of all instances from that repo/version.
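The selection rule can be sketched as follows. This is a minimal illustration under stated assumptions, not the code the SWE-bench authors used: it assumes "most recent" means the latest created_at value (ISO-8601 strings, which compare correctly as text), pick_environment_setup_commits is a hypothetical helper, and the repo name, versions, dates, and commits are made-up toy data.

```python
def pick_environment_setup_commits(instances):
    """Hypothetical helper: map each (repo, version) to the base_commit of
    its most recent instance, assuming recency is given by created_at."""
    latest = {}
    for inst in instances:
        key = (inst["repo"], inst["version"])
        # Keep whichever instance in this group has the latest created_at
        if key not in latest or inst["created_at"] > latest[key]["created_at"]:
            latest[key] = inst
    return {key: inst["base_commit"] for key, inst in latest.items()}


# Toy instances (made-up repo, dates, and commits, for illustration only)
instances = [
    {"repo": "example/repo", "version": "1.5",
     "created_at": "2016-03-01T00:00:00Z", "base_commit": "aaa111"},
    {"repo": "example/repo", "version": "1.5",
     "created_at": "2016-07-15T00:00:00Z", "base_commit": "bbb222"},
    {"repo": "example/repo", "version": "1.4",
     "created_at": "2015-11-20T00:00:00Z", "base_commit": "ccc333"},
]

env_commits = pick_environment_setup_commits(instances)
print(env_commits)
```

Every instance in a repo/version group then shares that single commit for installation, which is why an earlier instance in the group can end up with an environment_setup_commit dated after its own gold patch merge.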
As a result of this, the commit referenced by environment_setup_commit is at least as recent as the base_commit of every other instance from that repo/version.
Code to show this:
from datasets import load_dataset

swebench = load_dataset('princeton-nlp/SWE-bench', split='test')

# Map each (repo, version) to the `created_at` date of the instance whose
# base_commit is that repo/version's environment_setup_commit
map_rv_to_date = {}
for inst in swebench:
    if inst['base_commit'] == inst['environment_setup_commit']:
        map_rv_to_date[(inst['repo'], inst['version'])] = inst['created_at']

# Check that every instance's `created_at` is earlier than (or equal to)
# the date recorded for its repo/version's environment_setup_commit
print(all(
    inst['created_at'] <= map_rv_to_date[(inst['repo'], inst['version'])]
    for inst in swebench
    if (inst['repo'], inst['version']) in map_rv_to_date
))
Running this should output True.
Describe the issue
I see this:
What's the difference between environment_setup_commit and base_commit?
Suggest an improvement to documentation
No response