Hi! Thanks for all the work. After the 04/15 patch I can now reproduce most of the SWE-bench instances using the default harness. However, I'm still having trouble with (at least) Flask and scikit-learn, where environments fail to initialize because of what I suspect is a Cython version mismatch. This fails even in a clean-slate Docker environment (example attached).
That said, in your Repair Report (https://github.com/princeton-nlp/SWE-bench/blob/main/docs/20240415_eval_bug/README.md) you mention that you successfully reproduced the evaluation of the whole dataset. So either I'm doing something uniquely wrong, or the process still depends on the host environment and yours is unique in some way. I'd like to figure out which of these is the case :) It would be great if you could share more operational details about your test-running process: the environment, the exact scripts, or ideally even a Docker image that does it.
Thanks!
Repro of my failing attempt to set up the harness for scikit-learn:

- Test script (`test_script.py`):

```python
from swebench.harness.context_manager import TestbedContextManager
from swebench.metrics.getters import get_eval_refs

if __name__ == "__main__":
    insts = get_eval_refs("princeton-nlp/SWE-bench")
    # only take scikit-learn for repro
    insts = {k: v for (k, v) in insts.items() if v["repo"].endswith("scikit-learn")}

    # simply create the context manager
    # Note: leaving both `conda_link` and `path_conda` empty to use the default logic, whatever it is
    tcm = TestbedContextManager(
        list(insts.values()),
        "/tmp/swebench_logs",
        testbed="/tmp/swebench_eval_dir/testbed",
    )

    # just enter it and print all tasks
    with tcm:
        distributed_task_list = tcm.get_distributed_tasks()
        for task_list in distributed_task_list:
            print(
                f"{task_list['testbed']}: {len(task_list['task_instances'])} instances"
            )
```

- Dockerfile:
```docker
FROM continuumio/miniconda3
WORKDIR /workdir
RUN git clone https://github.com/princeton-nlp/SWE-bench /workdir
RUN conda env create -f environment.yml
RUN echo "conda activate swe-bench" >> ~/.bashrc
# pre-cache the SWE-bench HF dataset to avoid re-downloading it every time
RUN conda run -n swe-bench python -c 'from swebench.metrics.getters import get_eval_refs; get_eval_refs("princeton-nlp/SWE-bench")'
COPY test_script.py test_script.py
```

- Output with error:

<details>
<summary>Full log</summary>

```
2024-05-01 23:47:04,375 - testbed - INFO - [Testbed] Creating log directory /tmp/swebench_logs
2024-05-01 23:47:04,377 - testbed - INFO - Created log file /tmp/swebench_logs/testbed_7b951baf60cfc66ce769aded8436917c3debe1c42faefbe9.log
2024-05-01 23:47:04,377 - testbed - INFO - Repo scikit-learn/scikit-learn: 5 versions
2024-05-01 23:47:04,377 - testbed - INFO - Version 1.4: 2 instances
2024-05-01 23:47:04,377 - testbed - INFO - Version 1.3: 35 instances
2024-05-01 23:47:04,377 - testbed - INFO - Version 0.22: 72 instances
2024-05-01 23:47:04,377 - testbed - INFO - Version 0.21: 60 instances
2024-05-01 23:47:04,377 - testbed - INFO - Version 0.20: 60 instances
2024-05-01 23:47:04,377 - testbed - INFO - Using conda path /tmp/tmpy54mprlj
2024-05-01 23:47:04,377 - testbed - INFO - Using working directory /tmp/swebench_eval_dir/testbed for testbed
2024-05-01 23:47:04,378 - testbed - INFO - No conda path provided, creating temporary install in /tmp/tmpy54mprlj/miniconda3...
2024-05-01 23:47:04,378 - testbed - INFO - Multiple repos/versions; using Miniconda link: https://repo.anaconda.com/miniconda/Miniconda3-py39_23.10.0-1
2024-05-01 23:47:11,464 - testbed - INFO - Using conda path /tmp/tmpy54mprlj/miniconda3
2024-05-01 23:47:11,856 - testbed - INFO - Setting up testbed for scikit-learn__scikit-learn__0.20
2024-05-01 23:47:41,034 - testbed - INFO - Cloned scikit-learn/scikit-learn to /tmp/swebench_eval_dir/testbed/scikit-learn__scikit-learn__0.20
2024-05-01 23:47:41,034 - testbed - INFO - Creating environment scikit-learn__scikit-learn__0.20
2024-05-01 23:48:05,490 - testbed - INFO - Installing pip packages for scikit-learn__scikit-learn__0.20; Command: . /tmp/tmpy54mprlj/miniconda3/bin/activate scikit-learn__scikit-learn__0.20 && pip install numpy==1.19.2 scipy==1.5.2
2024-05-01 23:48:09,345 - testbed - ERROR - Error: Command '. /tmp/tmpy54mprlj/miniconda3/bin/activate scikit-learn__scikit-learn__0.20 && pip install numpy==1.19.2 scipy==1.5.2' returned non-zero exit status 1.
2024-05-01 23:48:09,345 - testbed - ERROR - Error stdout: Collecting numpy==1.19.2
Downloading numpy-1.19.2.zip (7.3 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 7.3/7.3 MB 39.7 MB/s eta 0:00:00
Installing build dependencies: started
Installing build dependencies: finished with status 'done'
Getting requirements to build wheel: started
Getting requirements to build wheel: finished with status 'done'
Preparing metadata (pyproject.toml): started
Preparing metadata (pyproject.toml): finished with status 'error'
error: subprocess-exited-with-error
× Preparing metadata (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> [54 lines of output]
Running from numpy source directory.
setup.py:470: UserWarning: Unrecognized setuptools command, proceeding with generating Cython sources and expanding templates
run_build = parse_setuppy_commands()
Error compiling Cython file:
------------------------------------------------------------
...
cdef sfc64_state rng_state
def __init__(self, seed=None):
BitGenerator.__init__(self, seed)
self._bitgen.state = &self.rng_state
self._bitgen.next_uint64 = &sfc64_uint64
^
------------------------------------------------------------
_sfc64.pyx:90:35: Cannot assign type 'uint64_t (*)(void *) except? -1 nogil' to 'uint64_t (*)(void *) noexcept nogil'. Exception values are incompatible. Suggest adding 'noexcept' to the type of the value being assigned.
Processing numpy/random/_bounded_integers.pxd.in
Processing numpy/random/_sfc64.pyx
Traceback (most recent call last):
File "/tmp/pip-install-xgdcfda2/numpy_aa0333d333154dcb80e15351c222fe81/tools/cythonize.py", line 235, in <module>
main()
File "/tmp/pip-install-xgdcfda2/numpy_aa0333d333154dcb80e15351c222fe81/tools/cythonize.py", line 231, in main
find_process_files(root_dir)
File "/tmp/pip-install-xgdcfda2/numpy_aa0333d333154dcb80e15351c222fe81/tools/cythonize.py", line 222, in find_process_files
process(root_dir, fromfile, tofile, function, hash_db)
File "/tmp/pip-install-xgdcfda2/numpy_aa0333d333154dcb80e15351c222fe81/tools/cythonize.py", line 188, in process
processor_function(fromfile, tofile)
File "/tmp/pip-install-xgdcfda2/numpy_aa0333d333154dcb80e15351c222fe81/tools/cythonize.py", line 77, in process_pyx
subprocess.check_call(
File "/tmp/tmpy54mprlj/miniconda3/lib/python3.9/subprocess.py", line 373, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['/tmp/tmpy54mprlj/miniconda3/bin/python', '-m', 'cython', '-3', '--fast-fail', '-o', '_sfc64.c', '_sfc64.pyx']' returned non-zero exit status 1.
Cythonizing sources
Traceback (most recent call last):
File "/tmp/tmpy54mprlj/miniconda3/lib/python3.9/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
main()
File "/tmp/tmpy54mprlj/miniconda3/lib/python3.9/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
json_out['return_val'] = hook(**hook_input['kwargs'])
File "/tmp/tmpy54mprlj/miniconda3/lib/python3.9/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 149, in prepare_metadata_for_build_wheel
return hook(metadata_directory, config_settings)
File "/tmp/pip-build-env-s9yvqxy1/overlay/lib/python3.9/site-packages/setuptools/build_meta.py", line 157, in prepare_metadata_for_build_wheel
self.run_setup()
File "/tmp/pip-build-env-s9yvqxy1/overlay/lib/python3.9/site-packages/setuptools/build_meta.py", line 248, in run_setup
super(_BuildMetaLegacyBackend,
File "/tmp/pip-build-env-s9yvqxy1/overlay/lib/python3.9/site-packages/setuptools/build_meta.py", line 142, in run_setup
exec(compile(code, __file__, 'exec'), locals())
File "setup.py", line 499, in <module>
setup_package()
File "setup.py", line 479, in setup_package
generate_cython()
File "setup.py", line 274, in generate_cython
raise RuntimeError("Running cythonize failed!")
RuntimeError: Running cythonize failed!
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed
× Encountered error while generating package metadata.
╰─> See above for output.
note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
2024-05-01 23:48:09,346 - testbed - ERROR - Error traceback: Traceback (most recent call last):
File "/workdir/swebench/harness/context_manager.py", line 82, in __call__
output = subprocess.run(cmd, **combined_args)
File "/opt/conda/envs/swe-bench/lib/python3.9/subprocess.py", line 528, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '. /tmp/tmpy54mprlj/miniconda3/bin/activate scikit-learn__scikit-learn__0.20 && pip install numpy==1.19.2 scipy==1.5.2' returned non-zero exit status 1.
Traceback (most recent call last):
File "/workdir/test_script.py", line 18, in <module>
with tcm:
File "/workdir/swebench/harness/context_manager.py", line 403, in __enter__
self.exec(cmd, shell=True)
File "/workdir/swebench/harness/context_manager.py", line 95, in __call__
raise e
File "/workdir/swebench/harness/context_manager.py", line 82, in __call__
output = subprocess.run(cmd, **combined_args)
File "/opt/conda/envs/swe-bench/lib/python3.9/subprocess.py", line 528, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '. /tmp/tmpy54mprlj/miniconda3/bin/activate scikit-learn__scikit-learn__0.20 && pip install numpy==1.19.2 scipy==1.5.2' returned non-zero exit status 1.
```
</details>
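One way to check the Cython-mismatch suspicion directly is to ask each relevant interpreter which Cython it would import. Below is a minimal diagnostic sketch, not part of the SWE-bench harness. The `noexcept` error in the log looks like Cython 3.x behavior (Cython 3.0 changed how exception specifications on `cdef` functions are handled), and the pip frames in the log come from the temporary base install (`/tmp/tmpy54mprlj/miniconda3/lib/python3.9/...`) rather than from the `scikit-learn__scikit-learn__0.20` env, so it may be worth inspecting both interpreters. Caveat: pip builds sdists in an isolated build environment, so this only shows what the interpreter itself can import, not necessarily what pip installs into the build env.

```python
"""Diagnostic sketch: report which Cython a given Python interpreter imports."""
import subprocess
import sys
from typing import Optional


def cython_version(python_exe: str) -> Optional[str]:
    """Return Cython.__version__ as seen by `python_exe`, or None if unavailable."""
    try:
        result = subprocess.run(
            [python_exe, "-c", "import Cython; print(Cython.__version__)"],
            capture_output=True,
            text=True,
        )
    except FileNotFoundError:
        # The interpreter path does not exist (e.g. env was never created).
        return None
    if result.returncode != 0:
        # Cython is not importable from this interpreter.
        return None
    return result.stdout.strip()


def is_cython3(version: Optional[str]) -> bool:
    """True if `version` looks like a Cython 3.x (or later) release."""
    if not version:
        return False
    major = version.split(".", 1)[0]
    return major.isdigit() and int(major) >= 3


if __name__ == "__main__":
    # Interpreters to inspect are passed on the command line; defaults to this one.
    for exe in sys.argv[1:] or [sys.executable]:
        v = cython_version(exe)
        note = " (3.x)" if is_cython3(v) else ""
        print(f"{exe}: Cython {v or 'not importable'}{note}")
```

For example, saved under a hypothetical name `check_cython.py`, it could be run inside the container as `python check_cython.py /tmp/tmpy54mprlj/miniconda3/bin/python`, adding the testbed env's own `python` if it exists.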
### Potential Solutions
Would it be possible for you to include a full command/script that, when run on a clean environment, will set up each instance and confirm that the golden solution correctly solves it?
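As a possible first step for such a script, here is a minimal sketch that enumerates the testbeds to set up, i.e. one environment per (repo, version) pair, mirroring the `Repo ...: 5 versions` / `Version 0.20: 60 instances` lines in the log. It assumes each task instance is a dict carrying SWE-bench's `repo`, `version`, and `instance_id` fields. `group_by_testbed` and `summarize` are hypothetical helpers, not harness APIs.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def group_by_testbed(instances: Iterable[dict]) -> Dict[str, Dict[str, List[str]]]:
    """Group task instances as repo -> version -> [instance_id, ...].

    Assumes each instance has "repo", "version" and "instance_id" keys.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for inst in instances:
        grouped[inst["repo"]][inst["version"]].append(inst["instance_id"])
    # Convert nested defaultdicts to plain dicts so lookups of unknown keys fail loudly.
    return {repo: dict(versions) for repo, versions in grouped.items()}


def summarize(grouped: Dict[str, Dict[str, List[str]]]) -> List[str]:
    """Render summary lines in the same shape as the harness log."""
    lines = []
    for repo, versions in grouped.items():
        lines.append(f"Repo {repo}: {len(versions)} versions")
        for version, ids in versions.items():
            lines.append(f"Version {version}: {len(ids)} instances")
    return lines


if __name__ == "__main__":
    # Toy data for illustration; real input could be get_eval_refs(...).values().
    demo = [
        {"repo": "scikit-learn/scikit-learn", "version": "0.20", "instance_id": "demo-1"},
        {"repo": "scikit-learn/scikit-learn", "version": "0.20", "instance_id": "demo-2"},
        {"repo": "scikit-learn/scikit-learn", "version": "1.3", "instance_id": "demo-3"},
    ]
    for line in summarize(group_by_testbed(demo)):
        print(line)
```

A full script would then create each environment, apply the gold patch per instance, and run its tests. Those steps depend on harness internals and are not shown here.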
For reference, the scikit-learn repro above was run with:

```shell
docker run $(docker build --quiet .) bash -c ". activate swe-bench && python test_script.py"
```