princeton-nlp / SWE-bench

[ICLR 2024] SWE-bench: Can Language Models Resolve Real-World GitHub Issues?
https://www.swebench.com
MIT License

Reproducer Docker image #113

Closed zygi closed 1 week ago

zygi commented 2 months ago

Describe the feature

Hi! Thanks for all the work. After the 04/15 patch I can now reproduce most of the SWE-bench instances using the default harness. However, I'm still having trouble with (at least) Flask and Scikit-Learn, where environment setup fails because of what I suspect is a Cython version mismatch. This fails even in a clean-slate Docker environment (example attached).

However, in your Repair Report (https://github.com/princeton-nlp/SWE-bench/blob/main/docs/20240415_eval_bug/README.md) you mention that you successfully reproduced evaluation over the whole dataset. So either I'm doing something uniquely wrong, or the process still depends on the host environment and the environment you're using is special in some way. I'd like to figure out which of these is the case :) It would be great if you could share more operational details about your test-running process - the environment, the exact scripts, or ideally even a Docker image that runs it.

Thanks!
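In case it helps narrow down whether the host matters, here is a minimal diagnostic sketch I could run on both machines and diff the output (all of this is my own code and naming, not part of the harness):

```python
# Minimal host-report sketch (not part of SWE-bench): capture the details most
# likely to explain environment-dependent build failures, so the output can be
# diffed between my Docker container and your evaluation machine.
import platform
import sys


def host_report():
    """Collect interpreter and OS facts relevant to building old C extensions."""
    return {
        "python": sys.version.split()[0],                 # interpreter driving the harness
        "implementation": platform.python_implementation(),
        "platform": platform.platform(),                  # OS + kernel version
        "machine": platform.machine(),                    # e.g. x86_64 vs arm64
    }


if __name__ == "__main__":
    for key, value in host_report().items():
        print(f"{key}: {value}")
```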


Repro of my failing attempt to set up the harness for scikit-learn:

- test_script.py:
```python
from swebench.harness.context_manager import TestbedContextManager
from swebench.metrics.getters import get_eval_refs

if __name__ == "__main__":
    insts = get_eval_refs("princeton-nlp/SWE-bench")

    # only take scikit-learn for repro
    insts = {k: v for (k, v) in insts.items() if v["repo"].endswith("scikit-learn")}

    # simply create the context manager
    # Note: leaving both `conda_link` and `path_conda` empty to use the default logic, whatever it is
    tcm = TestbedContextManager(
        list(insts.values()),
        "/tmp/swebench_logs",
        testbed="/tmp/swebench_eval_dir/testbed",
    )

    # just enter it and print all tasks
    with tcm:
        distributed_task_list = tcm.get_distributed_tasks()
        for task_list in distributed_task_list:
            print(
                f"{task_list['testbed']}: {len(task_list['task_instances'])} instances"
            )
```

- Dockerfile:
```docker
FROM continuumio/miniconda3
WORKDIR /workdir

RUN git clone https://github.com/princeton-nlp/SWE-bench /workdir
RUN conda env create -f environment.yml
RUN echo "conda activate swe-bench" >> ~/.bashrc

# pre-cache the SWE-bench HF dataset to avoid re-downloading it every time
RUN conda run -n swe-bench python -c 'from swebench.metrics.getters import get_eval_refs; get_eval_refs("princeton-nlp/SWE-bench")'

COPY test_script.py test_script.py
```

- Error log:
```
× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

2024-05-01 23:48:09,346 - testbed - ERROR - Error traceback: Traceback (most recent call last):
  File "/workdir/swebench/harness/context_manager.py", line 82, in __call__
    output = subprocess.run(cmd, **combined_args)
  File "/opt/conda/envs/swe-bench/lib/python3.9/subprocess.py", line 528, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '. /tmp/tmpy54mprlj/miniconda3/bin/activate scikit-learn__scikit-learn__0.20 && pip install numpy==1.19.2 scipy==1.5.2' returned non-zero exit status 1.

Traceback (most recent call last):
  File "/workdir/test_script.py", line 18, in <module>
    with tcm:
  File "/workdir/swebench/harness/context_manager.py", line 403, in __enter__
    self.exec(cmd, shell=True)
  File "/workdir/swebench/harness/context_manager.py", line 95, in __call__
    raise e
  File "/workdir/swebench/harness/context_manager.py", line 82, in __call__
    output = subprocess.run(cmd, **combined_args)
  File "/opt/conda/envs/swe-bench/lib/python3.9/subprocess.py", line 528, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '. /tmp/tmpy54mprlj/miniconda3/bin/activate scikit-learn__scikit-learn__0.20 && pip install numpy==1.19.2 scipy==1.5.2' returned non-zero exit status 1.
```



### Potential Solutions

Would it be possible for you to include a full command/script that, when run in a clean environment, sets up each instance and confirms that the golden solution correctly solves it?
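For example, a script along these lines could turn the dataset itself into a "gold" predictions file for the evaluation harness. The field names below follow the SWE-bench predictions format (`instance_id`, `model_name_or_path`, `model_patch`); the demo instance id is a stand-in, and the exact harness invocation that consumes the file is left out:

```python
# Sketch: build a predictions file where every "model" patch is the instance's
# own gold patch, so a healthy setup should report every instance resolved.
# The dict keys follow the SWE-bench predictions format; the demo instance
# below is a placeholder, not a real dataset row.
import json


def gold_predictions(instances):
    """Map each task instance to a prediction that replays its gold patch."""
    return [
        {
            "instance_id": inst["instance_id"],
            "model_name_or_path": "gold",
            "model_patch": inst["patch"],
        }
        for inst in instances
    ]


if __name__ == "__main__":
    # Stand-in instance; real ones come from get_eval_refs("princeton-nlp/SWE-bench").
    demo = [{"instance_id": "example__example-1", "patch": "diff --git a/... b/..."}]
    with open("gold_preds.json", "w") as f:
        json.dump(gold_predictions(demo), f, indent=2)
```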
gnohgnailoug commented 1 week ago

I hit the same problem when reproducing the validation step. How can it be solved, and why does it happen?