princeton-nlp / SWE-bench

[ICLR 2024] SWE-bench: Can Language Models Resolve Real-World GitHub Issues?
https://www.swebench.com
MIT License

`/bin/sh: pytest: command not found` when running evaluations #96

Closed (psykhi closed this issue 2 weeks ago)

psykhi commented 2 months ago

Describe the bug

Running evaluations on one instance from the SWE-bench Lite dev set, pydicom__pydicom-1256, and looking at the logs, this is what I see:

>>>>> Applied Patch (test)
Test Script: . /tmp/sweet/testbed/sweet/pydicom/2.1/tmpydsw8v9j/miniconda3/bin/activate pydicom__pydicom__2.1 && echo 'activate successful' && pytest --no-header -rA --tb=no -p no:cacheprovider pydicom/tests/test_json.py;
[pydicom__pydicom__2.1] [pydicom__pydicom-1256] Command: . /tmp/sweet/testbed/sweet/pydicom/2.1/tmpydsw8v9j/miniconda3/bin/activate pydicom__pydicom__2.1 && echo 'activate successful' && pytest --no-header -rA --tb=no -p no:cacheprovider pydicom/tests/test_json.py
[pydicom__pydicom__2.1] [pydicom__pydicom-1256] Subprocess args: {"check": false, "shell": true, "capture_output": false, "text": true, "env": {"CONDA_PKGS_DIRS": "/tmp/sweet/testbed/sweet/pydicom/2.1/tmpydsw8v9j/miniconda3/cache"}, "stdout": -1, "stderr": -2, "timeout": 900}
[pydicom__pydicom__2.1] [pydicom__pydicom-1256] Std. Output:
activate successful
/bin/sh: pytest: command not found

[pydicom__pydicom__2.1] [pydicom__pydicom-1256] Return Code: 127

>>>>> Some Tests Failed
[pydicom__pydicom__2.1] [pydicom__pydicom-1256] Test script run successful

Shouldn't pytest be present in the repo environment?
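
For reference, exit status 127 is how a POSIX shell reports that a command was not found, and the `-1` and `-2` values in the logged subprocess args correspond to `subprocess.PIPE` and `subprocess.STDOUT`. Because `check` is false, a non-zero exit does not raise, which is presumably why the log still ends with "Test script run successful". A minimal sketch for probing this outside the harness, assuming a POSIX `sh`; `command_missing` is a hypothetical helper, not part of SWE-bench:

```python
import subprocess


def command_missing(command: str) -> bool:
    """Return True if the shell reports `command` as not found (exit status 127)."""
    result = subprocess.run(
        command,
        shell=True,                # same mode as the logged subprocess args
        check=False,               # do not raise on a non-zero exit status
        stdout=subprocess.PIPE,    # logged as "stdout": -1
        stderr=subprocess.STDOUT,  # logged as "stderr": -2
        text=True,
    )
    # Caveat: a program that itself exits with 127 would also be reported here.
    return result.returncode == 127
```

Calling `command_missing("pytest --version")` after prefixing the command with the same activate step as in the log would show whether the activated environment can resolve `pytest` at all.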

Steps/Code to Reproduce

PYTHONPATH=/path/SWE-bench:$PYTHONPATH ./run_evaluation.sh

run_evaluation.sh:

#!/bin/bash
python run_evaluation.py \
    --predictions_path "/path/predictions.json" \
    --swe_bench_tasks "princeton-nlp/SWE-bench_Lite" \
    --log_dir "/path/logs" \
    --testbed "/path/testbed" \
    --skip_existing \
    --timeout 900 \
    --verbose

Using pydicom__pydicom-1256. Fresh clone of the repo and fresh conda install.

Expected Results

I would expect pytest to be in the repo venv.

Actual Results

>>>>> Applied Patch (test)
Test Script: . /tmp/sweet/testbed/sweet/pydicom/2.1/tmpydsw8v9j/miniconda3/bin/activate pydicom__pydicom__2.1 && echo 'activate successful' && pytest --no-header -rA --tb=no -p no:cacheprovider pydicom/tests/test_json.py;
[pydicom__pydicom__2.1] [pydicom__pydicom-1256] Command: . /tmp/sweet/testbed/sweet/pydicom/2.1/tmpydsw8v9j/miniconda3/bin/activate pydicom__pydicom__2.1 && echo 'activate successful' && pytest --no-header -rA --tb=no -p no:cacheprovider pydicom/tests/test_json.py
[pydicom__pydicom__2.1] [pydicom__pydicom-1256] Subprocess args: {"check": false, "shell": true, "capture_output": false, "text": true, "env": {"CONDA_PKGS_DIRS": "/tmp/sweet/testbed/sweet/pydicom/2.1/tmpydsw8v9j/miniconda3/cache"}, "stdout": -1, "stderr": -2, "timeout": 900}
[pydicom__pydicom__2.1] [pydicom__pydicom-1256] Std. Output:
activate successful
/bin/sh: pytest: command not found

[pydicom__pydicom__2.1] [pydicom__pydicom-1256] Return Code: 127

>>>>> Some Tests Failed
[pydicom__pydicom__2.1] [pydicom__pydicom-1256] Test script run successful

System Information

macOS (ARM). Fresh conda install, fresh clone.

psykhi commented 2 months ago

My dirty workaround to get further has been to edit context_manager.py:

# Workaround: append `pip install pytest` so it runs after every activation
self.cmd_activate = (
    f". {os.path.join(self.conda_path, 'bin', 'activate')} "
    + f"{self.venv} && echo 'activate successful' && pip install pytest"
)
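
A slightly gentler variant of this workaround would install pytest only when the activated environment cannot already resolve it. A minimal sketch, assuming a POSIX shell; `build_activate_cmd` is a hypothetical standalone function written for illustration, not the harness's actual code:

```python
import os


def build_activate_cmd(conda_path: str, venv: str) -> str:
    """Build an activation command that installs pytest only if it is missing."""
    activate = os.path.join(conda_path, "bin", "activate")
    # The parentheses keep the `||` from interfering with any `&& ...`
    # that is appended after this command later on.
    ensure_pytest = "(command -v pytest > /dev/null 2>&1 || pip install pytest)"
    return f". {activate} {venv} && echo 'activate successful' && {ensure_pytest}"
```

The grouping matters because `&&` and `||` have equal precedence in the shell, so an ungrouped `|| pip install pytest` followed by `&& pytest ...` would not chain the way it reads.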

john-b-yang commented 2 months ago

The installation specifications for Pydicom are fully included here.

Based on our validation of the pydicom__pydicom-1256 instance, as shown here, it didn't seem like it was necessary for pytest to be explicitly installed.

The base commit for the instance is here. I believe that the pip install -e . instruction specified here should have taken care of installing pytest automatically, but I will double check this.

john-b-yang commented 2 weeks ago

Marking this as completed, hope the original comment helped with this!

I checked the pydicom installations; the validation logs are under the validation/ folder in SWE-bench/experiments. I was able to confirm that the explicit pytest installation should not be necessary. That said, it doesn't hurt to install it again, so I merged #134, which should take care of this.

We are also going to come out with a new SWE-bench evaluation harness within the next 2 weeks, which incorporates Docker containers into the evaluation process. It should resolve a lot of the inconsistencies that are arising from running SWE-bench evaluation on different machines. If you're still interested in working on SWE-bench, definitely look out for the release! 😄