princeton-nlp / SWE-bench

[ICLR 2024] SWE-bench: Can Language Models Resolve Real-world Github Issues?
https://www.swebench.com
MIT License

Errors building Matplotlib env instances #239

Closed: martinbel closed this issue 3 weeks ago

martinbel commented 3 weeks ago

Describe the bug

I'm getting an error when building the Matplotlib env images for the instance IDs listed here. The build fails with an out-of-space error.

matplotlib__matplotlib-22835 matplotlib__matplotlib-24265 matplotlib__matplotlib-24970 matplotlib__matplotlib-25079 matplotlib__matplotlib-25311 matplotlib__matplotlib-25332

Error Traceback:
Building environment images: 0%| | 0/7 [00:00<?, ?it/s]
Building environment images: 14%|█▍ | 1/7 [01:33<09:20, 93.46s/it]
Traceback (most recent call last):
  File "/home/runner/.cache/pypoetry/virtualenvs/cf-bench-rQ7V5b4d-py3.12/lib/python3.12/site-packages/swebench/harness/docker_build.py", line 145, in build_image
    raise docker.errors.BuildError(
docker.errors.BuildError: write /opt/miniconda3/pkgs/cache/497deca9.json: no space left on device

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/runner/.cache/pypoetry/virtualenvs/cf-bench-rQ7V5b4d-py3.12/lib/python3.12/site-packages/swebench/harness/docker_build.py", line 314, in build_env_images
    future.result()
  File "/opt/hostedtoolcache/Python/3.12.7/x64/lib/python3.12/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/opt/hostedtoolcache/Python/3.12.7/x64/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/opt/hostedtoolcache/Python/3.12.7/x64/lib/python3.12/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/runner/.cache/pypoetry/virtualenvs/cf-bench-rQ7V5b4d-py3.12/lib/python3.12/site-packages/swebench/harness/docker_build.py", line 151, in build_image
    raise BuildImageError(image_name, str(e), logger) from e
swebench.harness.docker_build.BuildImageError: Error building image sweb.env.x86_64.574160a64a279afa47450f:latest: write /opt/miniconda3/pkgs/cache/497deca9.json: no space left on device
Check (logs/build_images/env/sweb.env.x86_64.574160a64a279afa47450f__latest/build_image.log) for more information.
--- Logging error ---
Traceback (most recent call last):
  File "/opt/hostedtoolcache/Python/3.12.7/x64/lib/python3.12/logging/__init__.py", line 1163, in emit
    stream.write(msg + self.terminator)
OSError: [Errno 28] No space left on device
Call stack:
  File "/opt/hostedtoolcache/Python/3.12.7/x64/lib/python3.12/threading.py", line 1032, in _bootstrap
    self._bootstrap_inner()
  File "/opt/hostedtoolcache/Python/3.12.7/x64/lib/python3.12/threading.py", line 1075, in _bootstrap_inner
    self.run()
  File "/opt/hostedtoolcache/Python/3.12.7/x64/lib/python3.12/threading.py", line 1012, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/hostedtoolcache/Python/3.12.7/x64/lib/python3.12/concurrent/futures/thread.py", line 92, in _worker
    work_item.run()
  File "/opt/hostedtoolcache/Python/3.12.7/x64/lib/python3.12/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/runner/.cache/pypoetry/virtualenvs/cf-bench-rQ7V5b4d-py3.12/lib/python3.12/site-packages/swebench/harness/docker_build.py", line 138, in build_image
    logger.info(chunk_stream.strip())

Steps/Code to Reproduce

Run the harness on these instances: matplotlib__matplotlib-22835 matplotlib__matplotlib-24265 matplotlib__matplotlib-24970 matplotlib__matplotlib-25079 matplotlib__matplotlib-25311 matplotlib__matplotlib-25332

Expected Results

No errors in env instances

Actual Results

Shown above

System Information

macOS on an Apple M1 Pro. I get the same results on an Ubuntu instance.

john-b-yang commented 3 weeks ago

Hi @martinbel. We did not test heavily on Apple Silicon. We don't plan to support it given the existing usable choices (Linux x86).

Are you sure it doesn't work on Ubuntu? I just ran it:

$ python -m swebench.harness.run_evaluation --dataset_name princeton-nlp/SWE-bench --split test --predictions_path gold --run_id check --instance_ids 'matplotlib__matplotlib-22835' 'matplotlib__matplotlib-24265' 'matplotlib__matplotlib-24970' 'matplotlib__matplotlib-25079' 'matplotlib__matplotlib-25311' 'matplotlib__matplotlib-25332'
Using gold predictions - ignoring predictions_path
Running 6 unevaluated instances...
Base image sweb.mm.base.x86_64:latest already exists, skipping build.
Base images built successfully.
Total environment images to build: 3
3 ran successfully, 0 failed: 100%|██████████| 3/3 [00:00<00:00, 53.30it/s]
All environment images built successfully.
Running 6 instances...
6 ran successfully, 0 failed: 100%|██████████| 6/6 [10:37<00:00, 106.18s/it]
All instances run.
Cleaning cached images...
Removed 0 images.
Total instances: 6
Instances submitted: 6
Instances completed: 6
Instances incomplete: 0
Instances resolved: 6
Instances unresolved: 0
Instances with empty patches: 0
Instances with errors: 0
Unstopped containers: 0
Unremoved images: 0
Report written to gold.check.json
(sweb) john-b-yang@bitbop:~/swe-bench/multimodal$ cat /etc/os-release
PRETTY_NAME="Ubuntu 22.04.4 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.4 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy

Everything built and the gold patches resolve.

SmartManoj commented 3 weeks ago

no space left on device

storage issue
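
One way to confirm this diagnosis before rebuilding is to check how much free space is left and to test whether a caught exception is the out-of-space failure from the traceback. The following is a minimal sketch, not part of the SWE-bench harness: the helper names `free_gib` and `is_out_of_space` and the 20 GiB threshold are hypothetical, and the path you check should be the filesystem that actually holds Docker's data (on Linux the default data root is /var/lib/docker; on macOS the images live inside the Docker Desktop VM, so a host-side check may not reflect them).

```python
import errno
import shutil

# Hypothetical threshold; choose a value that fits the images you build.
MIN_FREE_GIB = 20.0


def free_gib(path="/"):
    """Return the free space, in GiB, on the filesystem containing `path`."""
    return shutil.disk_usage(path).free / 2**30


def is_out_of_space(exc):
    """Return True if `exc` looks like a 'no space left on device' failure."""
    # OSError carries errno 28 (ENOSPC), as in the logging error in the report.
    if isinstance(exc, OSError) and exc.errno == errno.ENOSPC:
        return True
    # Docker build errors only carry the message text, so fall back to that.
    return "no space left on device" in str(exc).lower()


if __name__ == "__main__":
    # Assumption: "/" is on the same filesystem as Docker's data root.
    free = free_gib("/")
    if free < MIN_FREE_GIB:
        print(f"Only {free:.1f} GiB free; image builds may fail.")
    else:
        print(f"{free:.1f} GiB free.")
```

If the check reports little free space, removing unused Docker images and build cache before rerunning is the usual first step.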

john-b-yang commented 3 weeks ago

Oh yeah, you're right @SmartManoj. @martinbel I'm closing this issue, since the explanation is apparent now that @SmartManoj has pointed it out.

If you are caching the instance images (--cache_level), perhaps turn that off. Or, if you are running with multiple workers at a time, you can reduce the number. If you don't care about speed, but want correctness, I'd recommend (1) running evaluation serially and (2) running on a Linux machine.
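
The advice above can be sketched as a small command builder that runs the evaluation serially with image caching turned off. This is an illustration under assumptions, not the maintainers' recommended invocation: `build_eval_command` is a hypothetical helper, `--cache_level` is the flag named above, and the `--max_workers` flag and the `none` cache value are assumed from the harness CLI and should be confirmed with `python -m swebench.harness.run_evaluation --help` for your installed version.

```python
import shlex


def build_eval_command(instance_ids, run_id, max_workers=1, cache_level="none"):
    """Build a run_evaluation command line as a list of arguments.

    Defaults are conservative: one worker (serial) and no image caching,
    which trades speed for lower peak disk usage.
    """
    return [
        "python", "-m", "swebench.harness.run_evaluation",
        "--dataset_name", "princeton-nlp/SWE-bench",
        "--split", "test",
        "--predictions_path", "gold",
        "--run_id", run_id,
        # Assumed flag name and value; check --help before relying on them.
        "--max_workers", str(max_workers),
        "--cache_level", cache_level,
        "--instance_ids", *instance_ids,
    ]


if __name__ == "__main__":
    ids = [
        "matplotlib__matplotlib-22835",
        "matplotlib__matplotlib-24265",
    ]
    # Print a shell-quoted command that can be pasted into a terminal.
    print(shlex.join(build_eval_command(ids, run_id="check")))
```

Returning a list rather than a string keeps the arguments safe to pass to `subprocess.run` without shell quoting issues.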